Science.gov

Sample records for reinforcement learning based

  1. Reinforcement Learning Based Artificial Immune Classifier

    PubMed Central

    Karakose, Mehmet

    2013-01-01

    Artificial immune systems are among the widely used methods for classification, which is a decision-making process. Artificial immune systems, based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised artificial immune systems, the negative selection classifier, and the resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method. PMID:23935424

  2. Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

    ERIC Educational Resources Information Center

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-01-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…
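
    The "trial-and-error" value updates referred to above correspond to the standard temporal-difference rule. A minimal sketch is given below; the state names, reward, and step-size are illustrative assumptions, not taken from the study.

    ```python
    # Minimal TD(0) backup: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    # Toy illustration of trial-and-error value learning; not code from the cited work.
    from collections import defaultdict

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
        """One temporal-difference backup of the state-value estimate."""
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        return td_error

    V = defaultdict(float)                       # value table, initialised to zero
    td0_update(V, s="left_lever", r=1.0, s_next="start")
    ```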

  3. A reinforcement learning-based architecture for fuzzy logic control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1992-01-01

    This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
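
    ARIC couples a fuzzy rule base with a neural critic trained by temporal differences. The sketch below shows only the underlying TD-based actor-critic signal flow on a cart-pole-like task; the linear critic, Gaussian actor, and all parameter values are stand-in assumptions rather than the paper's architecture.

    ```python
    import numpy as np

    # TD-based actor-critic loop of the kind underlying ARIC-style learning.
    # A linear critic and a Gaussian actor stand in for the fuzzy/neural components.
    def features(state):
        return np.asarray(state, dtype=float)    # assume the state is already a feature vector

    class ActorCritic:
        def __init__(self, dim, alpha_v=0.05, alpha_pi=0.01, gamma=0.99, sigma=0.5):
            self.w = np.zeros(dim)               # critic weights: V(s) ~ w . phi(s)
            self.theta = np.zeros(dim)           # actor weights: mean action = theta . phi(s)
            self.alpha_v, self.alpha_pi, self.gamma, self.sigma = alpha_v, alpha_pi, gamma, sigma

        def act(self, state):
            return np.random.normal(self.theta @ features(state), self.sigma)

        def update(self, state, action, reward, next_state, done):
            phi, phi_next = features(state), features(next_state)
            v_next = 0.0 if done else self.w @ phi_next
            td_error = reward + self.gamma * v_next - self.w @ phi   # internal reinforcement
            self.w += self.alpha_v * td_error * phi                  # critic update
            mu = self.theta @ phi                                    # actor: Gaussian policy gradient
            self.theta += self.alpha_pi * td_error * (action - mu) / self.sigma**2 * phi
            return td_error

    ac = ActorCritic(dim=4)
    a = ac.act([0.0, 0.0, 0.05, 0.0])            # cart-pole-like state vector
    ac.update([0.0, 0.0, 0.05, 0.0], a, reward=1.0, next_state=[0.0, 0.0, 0.06, 0.0], done=False)
    ```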

  4. Reinforcement Learning Based Web Service Compositions for Mobile Business

    NASA Astrophysics Data System (ADS)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to reactive Web service composition by modeling it with reinforcement learning and introducing modifiable (alterable) QoS variables into the model as elements of the Markov decision process tuple. Moreover, we give an example of reactive-WSC-based mobile banking to demonstrate the solution's ability to obtain an optimized service composition, characterized by alterable target QoS variable sets with optimized values. We conclude that the solution has considerable potential for improving customer experience and quality of service in Web services, and in applications across the wider electronic commerce and business sector.

  5. Model-based hierarchical reinforcement learning and human action control

    PubMed Central

    Botvinick, Matthew; Weinstein, Ari

    2014-01-01

    Recent work has reawakened interest in goal-directed or ‘model-based’ choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour. PMID:25267822

  6. Kernel-based least squares policy iteration for reinforcement learning.

    PubMed

    Xu, Xin; Hu, Dewen; Lu, Xicheng

    2007-07-01

    In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing results. One is a better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection using ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing-up control of a double-link underactuated pendulum (the acrobot). Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating
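
    The policy-evaluation step at the core of LSPI-style methods can be written compactly. The sketch below uses a generic state-action feature map rather than the paper's Mercer kernels and ALD sparsification, so the feature map, data format, and regularizer are assumptions; policy iteration then alternates this evaluation with greedy improvement over the fitted Q-function.

    ```python
    import numpy as np

    def lstdq(samples, phi, policy, gamma=0.99, reg=1e-6):
        """Least-squares temporal-difference Q evaluation (LSTD-Q).

        samples : list of (s, a, r, s_next) transitions
        phi     : feature map, phi(s, a) -> 1-D numpy array of length k
        policy  : current policy, policy(s) -> action (queried at successor states)
        Returns the weight vector w with Q(s, a) ~ w . phi(s, a).
        """
        k = len(phi(*samples[0][:2]))
        A = reg * np.eye(k)                      # small regularizer keeps A invertible
        b = np.zeros(k)
        for s, a, r, s_next in samples:
            f = phi(s, a)
            f_next = phi(s_next, policy(s_next))
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        return np.linalg.solve(A, b)
    ```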

  7. Robot Docking Based on Omnidirectional Vision and Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Muse, David; Weber, Cornelius; Wermter, Stefan

    We present a system for visual robotic docking using an omnidirectional camera coupled with the actor-critic reinforcement learning algorithm. The system enables a PeopleBot robot to locate and approach a table so that it can pick an object from it using the pan-tilt camera mounted on the robot. We use a staged approach to solve this problem, as there are distinct subtasks and different sensors used. The robot starts by wandering randomly until the table is located via a landmark; a network trained via reinforcement then allows the robot to turn to and approach the table. Once at the table, the robot picks the object from it. We argue that our approach has considerable potential for learning robot control for navigation, removing the need for internal maps of the environment. This is achieved by allowing the robot to learn couplings between motor actions and the position of a landmark.

  8. Cognitive control predicts use of model-based reinforcement learning.

    PubMed

    Otto, A Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D

    2015-02-01

    Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggests that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior are a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information--in the service of overcoming habitual, stimulus-driven responses--in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior. PMID:25170791

  9. B-tree search reinforcement learning for model based intelligent agent

    NASA Astrophysics Data System (ADS)

    Bhuvaneswari, S.; Vignashwaran, R.

    2013-03-01

    Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study, data search for information retrieval using B-trees with reinforcement learning is moderated to achieve accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques perform at a satisfactory baseline and act as finite agents based on the predetermined model against competitors from the course.

  10. Knowledge-Based Reinforcement Learning for Data Mining

    NASA Astrophysics Data System (ADS)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human
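
    The abstract breaks off where expert knowledge enters the picture. One standard way to inject such knowledge into an RL agent without changing the optimal policy is potential-based reward shaping; the sketch below illustrates the idea with an invented potential function and is not necessarily the formulation used by the authors.

    ```python
    # Potential-based reward shaping: r'(s, a, s') = r + gamma * Phi(s') - Phi(s).
    # A well-chosen potential encodes expert knowledge and speeds exploration while
    # leaving the optimal policy unchanged. The potential below is a toy assumption.
    def shaped_reward(r, s, s_next, potential, gamma=0.99):
        return r + gamma * potential(s_next) - potential(s)

    def potential(state):
        return -abs(state - 10)                  # expert heuristic: goal assumed near state 10

    print(shaped_reward(r=0.0, s=3, s_next=4, potential=potential))
    ```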

  11. Incorporation of perception-based information in robot learning using fuzzy reinforcement learning agents

    NASA Astrophysics Data System (ADS)

    Changjiu, Zhou; Qingchun, Meng; Zhongwen, Guo; Wiefen, Qu; Bo, Yin

    2002-04-01

    Robot learning in unstructured environments has been proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can also utilize perceptions to guide their learning toward those parts of the perception-action space that are actually relevant to the task. Therefore, we conducted research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. For this reason, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of GAs (genetic algorithms), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find application in ocean exploration, detection or sea rescue activity, as well as military maritime activity.

  12. Reinforcement Learning Trees

    PubMed Central

    Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

    2015-01-01

    In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687

  13. Reinforcement learning in scheduling

    NASA Technical Reports Server (NTRS)

    Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad

    1994-01-01

    The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.

  14. Reinforcement learning in supply chains.

    PubMed

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice. PMID:19885962

  15. Online learning control by association and reinforcement.

    PubMed

    Si, J; Wang, Y T

    2001-01-01

    This paper focuses on a systematic treatment for developing a generic online learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This online learning system improves its performance over time in two aspects: 1) it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance; and 2) system states associated with positive reinforcement are memorized through a network learning process so that, in the future, similar states will be more positively associated with a control action leading to a positive reinforcement. A successful candidate online learning control design is introduced. Real-time learning algorithms are derived for the individual components in the learning system. Some analytical insight is provided to give guidelines on the learning process taking place in each module of the online learning control system. PMID:18244383

  16. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning

    PubMed Central

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics made a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in offline manners. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018

  17. RPS Market Analysis Based on Reinforcement Learning in Power Systems

    NASA Astrophysics Data System (ADS)

    Sugano, Takanori; Kita, Hiroyuki; Tanaka, Eiichi; Hasegawa, Jun

    Deregulation and restructuring of the electric power supply business are proceeding all over the world. In many cases, a competitive environment is introduced in which a market to transact electric power is established, and various attempts are made to decrease the price. On the other hand, environmental problems have been pointed out in recent years, and there is a possibility that cost reduction of electric power leads to environmental deterioration. In this paper, the RPS (Renewable Portfolio Standard) system is considered as a solution to the environmental problem under the deregulation of the electric power supply business. An RPS model is created using multi-agent theory, where Q-learning is used as the decision-making technique of each agent. Using this model, the RPS system is verified for its effectiveness and influence.
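
    The agents in such a multi-agent market model typically rely on a standard tabular Q-learning rule for their bidding or investment decisions. A generic sketch follows; the state names, actions, and reward are placeholders, not the authors' RPS model.

    ```python
    import random
    from collections import defaultdict

    # Generic tabular Q-learning decision-maker of the kind embedded in multi-agent
    # market simulations. States, actions, and rewards are illustrative placeholders.
    class QLearningAgent:
        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.Q = defaultdict(float)
            self.actions, self.alpha, self.gamma, self.epsilon = actions, alpha, gamma, epsilon

        def choose(self, state):
            if random.random() < self.epsilon:                   # explore
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.Q[(state, a)])

        def learn(self, state, action, reward, next_state):
            best_next = max(self.Q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])

    agent = QLearningAgent(actions=["buy_certificate", "invest_renewable", "pay_penalty"])
    s = "high_demand"
    agent.learn(s, agent.choose(s), reward=1.2, next_state="low_demand")
    ```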

  18. Protein interaction network constructing based on text mining and reinforcement learning with application to prostate cancer.

    PubMed

    Zhu, Fei; Liu, Quan; Zhang, Xiaofang; Shen, Bairong

    2015-08-01

    Constructing interaction networks from biomedical texts is very important and interesting work. The authors take advantage of text mining and reinforcement learning approaches to establish a protein interaction network. Considering the high computational efficiency of co-occurrence-based interaction extraction approaches and the high precision of linguistic pattern approaches, the authors propose an interaction extraction algorithm in which frequently used linguistic patterns are applied to extract interactions from texts, further interactions are then found in extended unprocessed texts following the basic idea of the co-occurrence approach, and the interactions extracted from the extended texts are discounted. They put forward a reinforcement learning-based algorithm to establish a protein interaction network, where nodes represent proteins and edges denote interactions. During the evolutionary process, a node selects another node and the attained reward determines which predicted interaction should be reinforced. The topology of the network is updated by the agent until an optimal network is formed. They used texts downloaded from PubMed to construct a prostate cancer protein interaction network with the proposed methods. The results show that the method achieves a good matching rate. Network topology analysis also demonstrates that the curves of node degree distribution, node degree probability and probability distribution of the constructed network accord well with those of a scale-free network. PMID:26243825

  19. Reinforcement Learning of Optimal Supervisor based on the Worst-Case Behavior

    NASA Astrophysics Data System (ADS)

    Kajiwara, Kouji; Yamasaki, Tatsushi

    The supervisory control initiated by Ramadge and Wonham is a framework for logical control of discrete event systems. In the original supervisory control, the costs of occurrence and disabling of events were not considered. Optimal supervisory control based on quantitative measures has therefore also been studied. This paper proposes a synthesis method for the optimal supervisor based on the worst-case behavior of discrete event systems. We introduce new value functions for the assigned control patterns. The new value functions are not based on the expected total rewards, but on the most undesirable event occurrence in the assigned control pattern. In the proposed method, the supervisor learns how to assign the control pattern by reinforcement learning so as to maximize the value functions. We show the effectiveness of the proposed method by computer simulation.

  20. Hidden state and reinforcement learning with instance-based state identification.

    PubMed

    McCallum, R A

    1996-01-01

    Real robots with real sensors are not omniscient. When a robot's next course of action depends on information that is hidden from the sensors because of problems such as occlusion, restricted range, bounded field of view and limited attention, we say the robot suffers from the hidden state problem. State identification techniques use history information to uncover hidden state. Some previous approaches to encoding history include finite state machines, recurrent neural networks and genetic programming with indexed memory. A chief disadvantage of all these techniques is their long training time. This paper presents instance-based state identification, a new approach to reinforcement learning with state identification that learns with far fewer training steps. Noting that learning with history and learning in continuous spaces both share the property that they begin without knowing the granularity of the state space, the approach applies instance-based (or "memory-based") learning to history sequences: instead of recording instances in a continuous geometrical space, we record instances in action-percept-reward sequence space. The first implementation of this approach, called Nearest Sequence Memory, learns with an order of magnitude fewer steps than several previous approaches. PMID:18263047
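
    The key mechanism is to store raw action-percept-reward triples and to identify the current (hidden) state by matching recent history against stored sequences. The simplified sketch below shows only the suffix-matching step; the full Nearest Sequence Memory algorithm additionally keeps Q-values on the stored instances and votes among the k best matches.

    ```python
    # Simplified nearest-sequence matching over (action, percept, reward) histories.
    def suffix_match_length(history, stored):
        """Length of the common suffix between the current history and a stored sequence."""
        n = 0
        while (n < len(history) and n < len(stored)
               and history[-1 - n] == stored[-1 - n]):
            n += 1
        return n

    def nearest_sequences(history, memory, k=4):
        """Return the k stored sequences whose suffixes best match the current history."""
        return sorted(memory, key=lambda seq: suffix_match_length(history, seq), reverse=True)[:k]

    memory = [[("fwd", "wall", 0), ("left", "open", 0), ("fwd", "goal", 1)],
              [("left", "open", 0), ("fwd", "wall", 0)]]
    current = [("noop", "start", 0), ("left", "open", 0), ("fwd", "wall", 0)]
    print(nearest_sequences(current, memory, k=1))
    ```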

  1. Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning

    PubMed Central

    Bath, Kevin G.; Daw, Nathaniel D.; Frank, Michael J.

    2016-01-01

    Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by “model-free” learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by “model-based” learning algorithms. However, differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in striatum. Model-based learning may also be affected by these striatal effects, or by other dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning, but without distinguishing a prefrontal versus striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combine a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, differences of which reflect dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a gene coding for DARPP-32, which is densely expressed in the striatum where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning. SIGNIFICANCE STATEMENT Decisions can stem reflexively from their previously associated outcomes or flexibly from deliberative

  2. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    PubMed

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes. PMID:17412559

  3. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    ERIC Educational Resources Information Center

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…

  4. Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories.

    PubMed

    Fonteneau, Raphael; Murphy, Susan A; Wehenkel, Louis; Ernst, Damien

    2013-09-01

    In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of "artificial trajectories" from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244

  5. Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories

    PubMed Central

    Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien

    2013-01-01

    In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244

  6. Collaborating Fuzzy Reinforcement Learning Agents

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1997-01-01

    Earlier, we introduced GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents which are controlled at the top level by Fuzzy Q-Learning and, at the local level, each agent learns and operates based on GARIC, a technique for fuzzy reinforcement learning. In this paper, we show that it is possible for these agents to compete in order to affect the selected control policy but, at the same time, they can collaborate while investigating the state space. In this model, the evaluator or critic learns by observing all the agents' behaviors, but the control policy changes only based on the behavior of the winning agent, also known as the super agent.

  7. Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning

    PubMed Central

    Konovalov, Arkady; Krajbich, Ian

    2016-01-01

    Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time. PMID:27511383

  8. Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning.

    PubMed

    Konovalov, Arkady; Krajbich, Ian

    2016-01-01

    Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time. PMID:27511383

  9. Hybrid Approach to Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Boulebtateche, Brahim; Fezari, Mourad; Boughazi, Mohamed

    2008-06-01

    Reinforcement Learning (RL) is a general framework in which an autonomous agent tries to learn an optimal policy of actions from direct interaction with the surrounding environment. However, one difficulty in applying RL control is its slow convergence, especially in environments with continuous state spaces. In this paper, a modified structure of RL is proposed to speed up reinforcement learning control. In this approach, a supervision technique is combined with standard Q-learning, a model-free reinforcement learning algorithm. A priori information is provided to the RL agent by an optimal LQ controller, which is used to indicate preferred actions at intermittent times. It is shown that the convergence speed of the supervised RL agent is greatly improved compared to the conventional Q-learning algorithm. Simulation work and results on the cart-pole balancing problem are given to illustrate the efficiency of the proposed method.
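
    The proposed modification can be pictured as ordinary Q-learning whose exploratory choices occasionally defer to a supervisor (here, an LQ controller) that suggests a preferred action. The sketch below makes that structure explicit; the supervisor interface, the consult probability, and the environment callable are assumptions for illustration.

    ```python
    import random
    from collections import defaultdict

    # Q-learning step whose exploration is intermittently overridden by a supervisor
    # (e.g. an optimal LQ controller) that indicates a preferred action.
    def supervised_q_step(Q, state, actions, supervisor, env_step,
                          alpha=0.1, gamma=0.99, epsilon=0.2, p_consult=0.3):
        if random.random() < p_consult:
            action = supervisor(state)               # preferred action from prior knowledge
        elif random.random() < epsilon:
            action = random.choice(actions)          # ordinary exploration
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        reward, next_state = env_step(state, action)
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        return next_state

    Q = defaultdict(float)                           # caller supplies supervisor() and env_step()
    ```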

  10. Cognitive Control Predicts Use of Model-Based Reinforcement-Learning

    PubMed Central

    Otto, A. Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D.

    2015-01-01

    Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggests that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior are a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information—in the service of overcoming habitual, stimulus-driven responses—in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior. PMID:25170791

  11. Using a board game to reinforce learning.

    PubMed

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. PMID:24588236

  12. Auto-exploratory average reward reinforcement learning

    SciTech Connect

    Ok, DoKyeong; Tadepalli, P.

    1996-12-31

    We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration.
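
    H-learning is a model-based average-reward method: it estimates transition and reward models from experience and backs up relative values h(s) against the average reward rho. The sketch below is a simplified update in that spirit; the learning rate, the greedy-step test, and the omission of the auto-exploratory extension are all assumptions rather than the paper's exact algorithm.

    ```python
    from collections import defaultdict

    # Simplified model-based average-reward update in the spirit of H-learning:
    #   h(s) <- max_a [ r_hat(s, a) - rho + sum_s' p_hat(s' | s, a) h(s') ]
    # with rho nudged on greedy steps. The exploration strategy is left to the caller.
    class AvgRewardLearner:
        def __init__(self, actions, rho_lr=0.01):
            self.actions, self.rho, self.rho_lr = actions, 0.0, rho_lr
            self.h = defaultdict(float)
            self.counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': visits}
            self.reward_sum = defaultdict(float)                  # (s, a) -> total reward

        def _q(self, s, a):
            n = sum(self.counts[(s, a)].values())
            if n == 0:
                return 0.0
            r_hat = self.reward_sum[(s, a)] / n
            return r_hat - self.rho + sum(c / n * self.h[s2] for s2, c in self.counts[(s, a)].items())

        def observe(self, s, a, r, s_next):
            self.counts[(s, a)][s_next] += 1
            self.reward_sum[(s, a)] += r
            greedy_value = max(self._q(s, b) for b in self.actions)
            if self._q(s, a) >= greedy_value - 1e-12:             # greedy step: update rho
                self.rho += self.rho_lr * (r + self.h[s_next] - self.h[s] - self.rho)
            self.h[s] = greedy_value

    learner = AvgRewardLearner(actions=["wait", "move"])
    learner.observe("idle", "move", r=1.0, s_next="busy")
    ```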

  13. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning

    PubMed Central

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  14. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    PubMed

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  15. Learning to trade via direct reinforcement.

    PubMed

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills. PMID:18249919
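
    The direct-reinforcement setup can be illustrated with a single-asset recurrent trader whose position F_t = tanh(w . x_t + u * F_{t-1}) is tuned by gradient ascent on realized trading returns. The sketch below uses a crude finite-difference gradient and cumulative return in place of the paper's backpropagated differential Sharpe ratio, and the synthetic price path and cost level are assumptions.

    ```python
    import numpy as np

    # Direct-reinforcement trading sketch: recurrent position F_t in [-1, 1],
    # trained on cumulative return R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|.
    def trading_return(params, prices, cost=0.001):
        w, u = params[:-1], params[-1]
        rets = np.diff(prices) / prices[:-1]
        F_prev, total = 0.0, 0.0
        for t in range(1, len(rets)):
            x = rets[max(0, t - len(w)):t][::-1]             # recent returns as features
            x = np.pad(x, (0, len(w) - len(x)))
            F = np.tanh(w @ x + u * F_prev)
            total += F_prev * rets[t] - cost * abs(F - F_prev)
            F_prev = F
        return total

    def train(prices, n_features=5, lr=0.1, epochs=50, eps=1e-4):
        params = np.zeros(n_features + 1)
        for _ in range(epochs):                              # finite-difference gradient ascent
            grad = np.zeros_like(params)
            for i in range(len(params)):
                bump = np.zeros_like(params); bump[i] = eps
                grad[i] = (trading_return(params + bump, prices)
                           - trading_return(params - bump, prices)) / (2 * eps)
            params += lr * grad
        return params

    prices = 100 * np.cumprod(1 + 0.01 * np.random.randn(300))   # synthetic price path
    print(train(prices))
    ```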

  16. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1992-01-01

    As part of the Research Institute for Computing and Information Systems (RICIS) activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This interim report provides the status of the project and outlines the future plans.

  17. Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond.

    PubMed

    Morita, Kenji; Jitsev, Jenia; Morrison, Abigail

    2016-09-15

    Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize "Winner-Take-All (WTA)" selection of the maximum-valued action (i.e., 'max' operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic "soft-max" selection. The striatal "max" circuit and the cortical "soft-max" circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action. PMID:27173430

  18. Reinforcing Constructivist Teaching in Advanced Level Biochemistry through the Introduction of Case-Based Learning Activities

    ERIC Educational Resources Information Center

    Hartfield, Perry J.

    2010-01-01

    In the process of curriculum development, I have integrated a constructivist teaching strategy into an advanced-level biochemistry teaching unit. Specifically, I have introduced case-based learning activities into the teaching/learning framework. These case-based learning activities were designed to develop problem-solving skills, consolidate…

  19. Fuzzy Q-Learning for Generalization of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1996-01-01

    Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents which are controlled at the top level by Fuzzy Q-Learning and at the local level, each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of input space by using fuzzy rules and bridges the gap between Q-Learning and rule based intelligent systems.

  20. Rational and Mechanistic Perspectives on Reinforcement Learning

    ERIC Educational Resources Information Center

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  1. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Translational controller results

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1992-01-01

    The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in the fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and a summary of the project.

  2. From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning.

    PubMed

    Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A

    2016-06-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852

  3. Design issues for a reinforcement-based self-learning fuzzy controller

    NASA Technical Reports Server (NTRS)

    Yen, John; Wang, Haojin; Daugherity, Walter

    1993-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control: easy implementation, accommodation of natural language, the ability to cover a wider range of operating conditions, and others. One major obstacle that hinders their broader application is the lack of a systematic way to develop and modify their rules; as a result, the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is the self-learning fuzzy logic controller (SFLC), which uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of an SFLC for application to a chemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  4. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    NASA Technical Reports Server (NTRS)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  5. Nearly data-based optimal control for linear discrete model-free systems with delays via reinforcement learning

    NASA Astrophysics Data System (ADS)

    Zhang, Jilie; Zhang, Huaguang; Wang, Binrui; Cai, Tiaoyang

    2016-05-01

    In this paper, a nearly data-based optimal control scheme is proposed for linear discrete model-free systems with delays. The nearly optimal control can be obtained using only measured input/output data from the systems, via reinforcement learning technology, which combines Q-learning with a value iteration algorithm. First, we construct a state estimator using the measured input/output data. Second, a quadratic functional is used to approximate the value function at each point in the state space, and the data-based control is designed by the Q-learning method using the obtained state estimator. The paper then states how to solve for the optimal inner kernel matrix in the least-squares sense by a value iteration algorithm. Finally, numerical examples are given to illustrate the effectiveness of our approach.
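
    The Q-learning-plus-value-iteration idea can be illustrated on a delay-free linear-quadratic problem: fit a quadratic Q-function to sampled transitions by least squares, then improve the feedback gain greedily, and repeat. The paper's state estimator for delayed input/output data is omitted, and the toy system, discount factor, and noise level below are assumptions.

    ```python
    import numpy as np

    # Data-driven policy iteration for x' = A x + B u with quadratic cost, using only
    # sampled (x, u, cost, x') tuples: least-squares Q evaluation + greedy improvement.
    def quad_features(x, u):
        z = np.concatenate([x, u])
        return np.outer(z, z)[np.triu_indices(len(z))]       # upper-triangular terms of z z^T

    def evaluate_policy(data, K, n, m, gamma=0.9):
        """Fit H in Q(x, u) = z^T H z from the Bellman equation Q = cost + gamma * Q'."""
        rows = [quad_features(x, u) - gamma * quad_features(x2, K @ x2) for x, u, c, x2 in data]
        costs = [c for _, _, c, _ in data]
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = np.zeros((n + m, n + m))
        H[np.triu_indices(n + m)] = theta
        return (H + H.T) / 2                                  # symmetrize

    def improve_policy(H, n):
        Hux, Huu = H[n:, :n], H[n:, n:]
        return -np.linalg.solve(Huu, Hux)                     # greedy feedback u = K x

    A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
    Qc, Rc, K = np.eye(2), np.eye(1), np.zeros((1, 2))
    for _ in range(10):                                       # policy iteration on fresh data
        data = []
        for _ in range(300):
            x = np.random.randn(2)
            u = K @ x + 0.1 * np.random.randn(1)              # exploration noise for identifiability
            data.append((x, u, x @ Qc @ x + u @ Rc @ u, A @ x + B @ u))
        K = improve_policy(evaluate_policy(data, K, n=2, m=1), n=2)
    print("learned feedback gain:", K)
    ```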

  6. Autonomous reinforcement learning with experience replay.

    PubMed

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. PMID:23237972
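
    The experience-replay component, independent of the paper's actor-critic and automatic step-size estimation, amounts to a buffer of past transitions that is repeatedly resampled for learning updates. A minimal sketch with illustrative names:

    ```python
    import random
    from collections import deque

    # Minimal experience-replay buffer: store transitions, resample minibatches for
    # repeated learning updates (the surrounding actor-critic is not reproduced here).
    class ReplayBuffer:
        def __init__(self, capacity=10000):
            self.buffer = deque(maxlen=capacity)

        def add(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    buf = ReplayBuffer()
    buf.add(state=[0.1, 0.0], action=1, reward=0.5, next_state=[0.2, 0.1], done=False)
    for s, a, r, s2, d in buf.sample(8):
        pass                                     # apply e.g. an actor-critic update per transition
    ```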

  7. Modular Inverse Reinforcement Learning for Visuomotor Behavior

    PubMed Central

    Rothkopf, Constantin A.; Ballard, Dana H.

    2013-01-01

    In a large variety of situations one would like to have an expressive and accurate model of observed animal or human behavior. While general purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological facts. Because of ample empirical evidence for reward-based learning in visuomotor tasks we use a computational model based on the assumption that the observed agent is balancing the costs and benefits of its behavior to meet its goals. This leads to using the framework of Reinforcement Learning, which additionally provides well-established algorithms for learning of visuomotor task solutions. To quantify the agent’s goals as rewards implicit in the observed behavior, we propose to use inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation, consisting of following a path while avoiding obstacles and approaching targets. It is shown how to recover the component reward weights for individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. It is demonstrated through simulations that good estimates can be obtained already with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations. PMID:23832417

  8. Drive-reinforcement learning system applications

    NASA Astrophysics Data System (ADS)

    Johnson, Daniel W.

    1992-07-01

    The application of Drive-Reinforcement (D-R) to the unsupervised learning of manipulator control functions was investigated. In particular, the ability of a D-R neuronal system to learn servo-level and trajectory-level controls for a robotic mechanism was assessed. Results indicate that D-R based systems can be successful at learning these functions in real-time with actual hardware. Moreover, since the control architectures are generic, the evidence suggests that D-R would be effective in control system applications outside the robotics arena.

  9. Reinforcement learning or active inference?

    PubMed

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-01-01

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain. PMID:19641614

  10. Reinforcement Learning or Active Inference?

    PubMed Central

    Friston, Karl J.; Daunizeau, Jean; Kiebel, Stefan J.

    2009-01-01

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain. PMID:19641614

  11. Ensemble algorithms in reinforcement learning.

    PubMed

    Wiering, Marco A; van Hasselt, Hado

    2008-08-01

    This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four different ensemble methods combining the following five different RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work where ensemble methods have been used in RL for representing and learning a single value function. We show experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms. PMID:18632380
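
    The voting- and Boltzmann-style combination rules can be sketched in a few lines once each constituent algorithm exposes action preferences or probabilities for the current state. The sketch below is a simplified reading of those rules; the original paper derives preference values per method and then applies Boltzmann exploration, and all numbers here are illustrative only.

```python
import numpy as np

# Hypothetical action probabilities from five RL algorithms (e.g. Q-learning,
# Sarsa, actor-critic, QV-learning, ACLA) for one state with 4 actions.
prefs = np.array([
    [0.1, 0.5, 0.3, 0.1],   # each row: one algorithm's action probabilities
    [0.2, 0.4, 0.3, 0.1],
    [0.3, 0.3, 0.2, 0.2],
    [0.1, 0.6, 0.2, 0.1],
    [0.4, 0.2, 0.2, 0.2],
])

def majority_voting(prefs):
    """Each algorithm votes for its most preferred action."""
    votes = np.bincount(prefs.argmax(axis=1), minlength=prefs.shape[1])
    return votes / votes.sum()

def boltzmann_multiplication(prefs, tau=1.0):
    """Multiply the individual action probabilities, then renormalise."""
    p = np.prod(prefs, axis=0) ** (1.0 / tau)
    return p / p.sum()

def boltzmann_addition(prefs, tau=1.0):
    """Add the individual action probabilities, then renormalise."""
    p = np.sum(prefs, axis=0) ** (1.0 / tau)
    return p / p.sum()

print("majority voting        :", majority_voting(prefs))
print("Boltzmann multiplication:", np.round(boltzmann_multiplication(prefs), 3))
print("Boltzmann addition      :", np.round(boltzmann_addition(prefs), 3))
```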

  12. A new computational account of cognitive control over reinforcement-based decision-making: Modeling of a probabilistic learning task.

    PubMed

    Zendehrouh, Sareh

    2015-11-01

    Recent work in the decision-making field offers an account of dual-system theory for the decision-making process. This theory holds that this process is conducted by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, habitual behaviors are connected with model-free methods, in which appropriate actions are learned through trial-and-error experiences. However, goal-directed behaviors are associated with model-based methods of RL, in which actions are selected using a model of the environment. Studies on cognitive control also suggest that during processes like decision-making, some cortical and subcortical structures work in concert to monitor the consequences of decisions and to adjust control according to current task demands. Here a computational model is presented based on dual-system theory and the cognitive control perspective of decision-making. The proposed model is used to simulate human performance on a variant of a probabilistic learning task. The basic proposal is that the brain implements a dual controller, while an accompanying monitoring system detects some kinds of conflict, including a hypothetical cost-conflict one. The simulation results address existing theories about two event-related potentials, namely error-related negativity (ERN) and feedback-related negativity (FRN), and explore the best account of them. Based on the results, some testable predictions are also presented. PMID:26339919

  13. Hierarchical Bayesian inverse reinforcement learning.

    PubMed

    Choi, Jaedeug; Kim, Kee-Eung

    2015-04-01

    Inverse reinforcement learning (IRL) is the problem of inferring the underlying reward function from the expert's behavior data. The difficulty in IRL mainly arises in choosing the best reward function since there are typically an infinite number of reward functions that yield the given behavior data as optimal. Another difficulty comes from the noisy behavior data due to sub-optimal experts. We propose a hierarchical Bayesian framework, which subsumes most of the previous IRL algorithms as well as models the sub-optimality of the expert's behavior. Using a number of experiments on a synthetic problem, we demonstrate the effectiveness of our approach including the robustness of our hierarchical Bayesian framework to the sub-optimal expert behavior data. Using a real dataset from taxi GPS traces, we additionally show that our approach predicts the driving behavior with a high accuracy. PMID:25291805

  14. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks.

    PubMed

    Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal

    2015-01-01

    It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
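
    A much-reduced reading of the cluster-selection idea is a single-state learning problem in which a member node learns the value of joining each neighbouring cluster from noisy observations of energy and sensing costs. The sketch below uses epsilon-greedy Q-learning over that single state; the full MDP with pairwise constraints, the optimal cluster count, and all cost values are illustrative and not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical member node choosing among 4 neighbouring clusters.
# The per-cluster energy and sensing costs are unknown to the node and are
# learned from noisy observations (values below are illustrative only).
energy_cost = np.array([0.8, 0.5, 0.6, 0.9])
sensing_cost = np.array([0.3, 0.6, 0.2, 0.4])
lam = 0.5                                    # trade-off between the two costs

Q = np.zeros(4)                              # value of joining each cluster
alpha, eps = 0.1, 0.2

for episode in range(2000):
    # epsilon-greedy exploration over candidate clusters
    a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q))
    # observed reward: negative weighted cost plus observation noise
    r = -(lam * energy_cost[a] + (1 - lam) * sensing_cost[a]) \
        + 0.05 * rng.standard_normal()
    Q[a] += alpha * (r - Q[a])               # single-state Q-learning update

print("learned values:", np.round(Q, 3), "-> chosen cluster:", int(np.argmax(Q)))
```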

  15. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks

    PubMed Central

    Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal

    2015-01-01

    It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191

  16. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise

    PubMed Central

    Therrien, Amanda S.; Wolpert, Daniel M.

    2016-01-01

    See Miall and Galea (doi: 10.1093/awv343) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368

  17. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

    PubMed

    Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

    2016-01-01

    Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368

  18. Benchmarking for Bayesian Reinforcement Learning

    PubMed Central

    Ernst, Damien; Couëtoux, Adrien

    2016-01-01

    In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the rewards collected while interacting with their environment, using some prior knowledge accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed. PMID:27304891

  19. Reinforcement learning, conditioning, and the brain: Successes and challenges.

    PubMed

    Maia, Tiago V

    2009-12-01

    The field of reinforcement learning has greatly influenced the neuroscientific study of conditioning. This article provides an introduction to reinforcement learning followed by an examination of the successes and challenges using reinforcement learning to understand the neural bases of conditioning. Successes reviewed include (1) the mapping of positive and negative prediction errors to the firing of dopamine neurons and neurons in the lateral habenula, respectively; (2) the mapping of model-based and model-free reinforcement learning to associative and sensorimotor cortico-basal ganglia-thalamo-cortical circuits, respectively; and (3) the mapping of actor and critic to the dorsal and ventral striatum, respectively. Challenges reviewed consist of several behavioral and neural findings that are at odds with standard reinforcement-learning models, including, among others, evidence for hyperbolic discounting and adaptive coding. The article suggests ways of reconciling reinforcement-learning models with many of the challenging findings, and highlights the need for further theoretical developments where necessary. Additional information related to this study may be downloaded from http://cabn.psychonomic-journals.org/content/supplemental. PMID:19897789

  20. Reinforcement learning improves behaviour from evaluative feedback

    NASA Astrophysics Data System (ADS)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  1. Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning

    PubMed Central

    Schad, Daniel J.; Jünger, Elisabeth; Sebold, Miriam; Garbusow, Maria; Bernhardt, Nadine; Javadi, Amir-Homayoun; Zimmermann, Ulrich S.; Smolka, Michael N.; Heinz, Andreas; Rapp, Michael A.; Huys, Quentin J. M.

    2014-01-01

    Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. This suggests shared cognitive and neural processes; provides a bridge between literatures on intelligence and valuation; and may guide the development of process models of different valuation components. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation. PMID:25566131

  2. Vicarious reinforcement learning signals when instructing others.

    PubMed

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. PMID:25698730

  3. Vicarious Reinforcement Learning Signals When Instructing Others

    PubMed Central

    Lesage, Elise; Ramnani, Narender

    2015-01-01

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action–outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. PMID:25698730

  4. Do Hands-On, Technology-Based Activities Enhance Learning by Reinforcing Cognitive Knowledge and Retention?

    ERIC Educational Resources Information Center

    Korwin, Anthony R.; Jones, Ronald E.

    1990-01-01

    The geodesic dome concept was presented to 25 eighth graders through reading and a hands-on group assignment and to 25 via reading and lecture. Pre/posttest results showed that organized hands-on activities increased learning and retention of technological concepts. (SK)

  5. Adaptive Educational Software by Applying Reinforcement Learning

    ERIC Educational Resources Information Center

    Bennane, Abdellah

    2013-01-01

    The introduction of intelligence into teaching software is the subject of this paper. In the software development process, learning techniques are used to adapt the teaching software to the characteristics of the student. Generally, artificial intelligence techniques such as reinforcement learning and Bayesian networks are used to adapt…

  6. Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks.

    PubMed

    Yang, Qinmin; Vance, Jonathan Blake; Jagannathan, S

    2008-08-01

    A nonaffine discrete-time system represented by the nonlinear autoregressive moving average with eXogenous input (NARMAX) representation with unknown nonlinear system dynamics is considered. An equivalent affinelike representation in terms of the tracking error dynamics is first obtained from the original nonaffine nonlinear discrete-time system so that a reinforcement-learning-based near-optimal neural network (NN) controller can be developed. The control scheme consists of two linearly parameterized NNs. One NN is designated as the critic NN, which approximates a predefined long-term cost function, and an action NN is employed to derive a near-optimal control signal for the system to track a desired trajectory while minimizing the cost function simultaneously. The NN weights are tuned online. By using the standard Lyapunov approach, the stability of the closed-loop system is shown. The net result is a supervised actor-critic NN controller scheme which can be applied to a general nonaffine nonlinear discrete-time system without needing the affinelike representation. Simulation results demonstrate satisfactory performance of the controller. PMID:18632390

  7. Can model-free reinforcement learning explain deontological moral judgments?

    PubMed

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account-e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. PMID:26918742

  8. Learning reaching strategies through reinforcement for a sensor-based manipulator.

    PubMed

    Martín, P; Millán, J del R

    1998-03-31

    This paper presents a neural controller that learns goal-oriented obstacle-avoiding reaction strategies for a multilink robot arm. It acquires these strategies on-line from local sensory data. The controller consists of two neural modules: an actor-critic module and a module for differential inverse kinematics (DIV). The input codification for the controller exploits the inherent symmetry of the robot arm kinematics. The actor-critic module generates actions with regard to the Shortest Path Vector (SPV) to the closest goal in the configuration space. However, the computation of the SPV is cumbersome for manipulators with more than two links. The DIV module aims to overcome the SPV calculation. This module provides a goal vector by means of the inversion of a neural network that has been trained previously to approximate the manipulator forward kinematics. Results for a two-link robot arm show that the combination of both modules speeds up the learning process. PMID:12662844

  9. A confidence metric for using neurobiological feedback in actor-critic reinforcement learning based brain-machine interfaces

    PubMed Central

    Prins, Noeline W.; Sanchez, Justin C.; Prasad, Abhishek

    2014-01-01

    Brain-Machine Interfaces (BMIs) can be used to restore function in people living with paralysis. Current BMIs require extensive calibration, which increases set-up times, and external inputs for decoder training, which may be difficult to produce in paralyzed individuals. Both these factors have presented challenges in transitioning the technology from research environments to activities of daily living (ADL). For BMIs to be seamlessly used in ADL, these issues should be handled with minimal external input, thus reducing the need for a technician/caregiver to calibrate the system. Reinforcement Learning (RL) based BMIs are a good tool to be used when there is no external training signal and can provide an adaptive modality to train BMI decoders. However, RL based BMIs are sensitive to the feedback provided to adapt the BMI. In actor-critic BMIs, this feedback is provided by the critic and the overall system performance is limited by the critic accuracy. In this work, we developed an adaptive BMI that could handle inaccuracies in the critic feedback in an effort to produce more accurate RL based BMIs. We developed a confidence measure, which indicated how appropriate the feedback is for updating the decoding parameters of the actor. The results show that with the new update formulation, the critic accuracy is no longer a limiting factor for the overall performance. We tested and validated the system on three different data sets: synthetic data generated by an Izhikevich neural spiking model, synthetic data with a Gaussian noise distribution, and data collected from a non-human primate engaged in a reaching task. All results indicated that the system with the critic confidence built in always outperformed the system without the critic confidence. Results of this study suggest the potential application of the technique in developing an autonomous BMI that does not need an external signal for training or extensive calibration. PMID:24904257

  10. A confidence metric for using neurobiological feedback in actor-critic reinforcement learning based brain-machine interfaces.

    PubMed

    Prins, Noeline W; Sanchez, Justin C; Prasad, Abhishek

    2014-01-01

    Brain-Machine Interfaces (BMIs) can be used to restore function in people living with paralysis. Current BMIs require extensive calibration, which increases set-up times, and external inputs for decoder training, which may be difficult to produce in paralyzed individuals. Both these factors have presented challenges in transitioning the technology from research environments to activities of daily living (ADL). For BMIs to be seamlessly used in ADL, these issues should be handled with minimal external input, thus reducing the need for a technician/caregiver to calibrate the system. Reinforcement Learning (RL) based BMIs are a good tool to be used when there is no external training signal and can provide an adaptive modality to train BMI decoders. However, RL based BMIs are sensitive to the feedback provided to adapt the BMI. In actor-critic BMIs, this feedback is provided by the critic and the overall system performance is limited by the critic accuracy. In this work, we developed an adaptive BMI that could handle inaccuracies in the critic feedback in an effort to produce more accurate RL based BMIs. We developed a confidence measure, which indicated how appropriate the feedback is for updating the decoding parameters of the actor. The results show that with the new update formulation, the critic accuracy is no longer a limiting factor for the overall performance. We tested and validated the system on three different data sets: synthetic data generated by an Izhikevich neural spiking model, synthetic data with a Gaussian noise distribution, and data collected from a non-human primate engaged in a reaching task. All results indicated that the system with the critic confidence built in always outperformed the system without the critic confidence. Results of this study suggest the potential application of the technique in developing an autonomous BMI that does not need an external signal for training or extensive calibration. PMID:24904257
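
    The confidence idea can be sketched as a scalar in [0, 1] that scales how strongly each piece of possibly erroneous critic feedback updates the actor. The sketch below uses a toy linear decoder, a critic that flips its feedback with a fixed probability, and a deliberately simple confidence measure; none of these correspond to the authors' actual decoder, critic, or metric.

```python
import numpy as np

rng = np.random.default_rng(3)

n_features, n_actions = 16, 4
W_true = rng.standard_normal((n_actions, n_features))   # hidden "intent" mapping
W = np.zeros((n_actions, n_features))                    # actor: linear scores per action
alpha, critic_accuracy = 0.1, 0.8
agreement_history = [1.0]

def confidence(history):
    """Toy confidence weight: recent rate of positive critic feedback
    (the authors' actual metric differs)."""
    return float(np.clip(np.mean(history), 0.0, 1.0))

for trial in range(3000):
    x = rng.standard_normal(n_features)                  # neural feature vector
    target = int(np.argmax(W_true @ x))                  # intended action (unknown to decoder)
    a = int(np.argmax(W @ x + 0.1 * rng.standard_normal(n_actions)))

    correct = (a == target)
    # noisy critic: reports correctness, but flips with probability 1 - critic_accuracy
    feedback = 1.0 if correct == (rng.random() < critic_accuracy) else -1.0

    c = confidence(agreement_history[-50:])
    W[a] += alpha * c * feedback * x                     # confidence-weighted actor update
    agreement_history.append(1.0 if feedback > 0 else 0.0)
```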

  11. Evolution with Reinforcement Learning in Negotiation

    PubMed Central

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and yields more robust predictions in the long term than in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using the classic evolutionary algorithm. PMID:25048108

  12. Reinforcement learning for port-hamiltonian systems.

    PubMed

    Sprangers, Olivier; Babuška, Robert; Nageshrao, Subramanya P; Lopes, Gabriel A D

    2015-05-01

    Passivity-based control (PBC) for port-Hamiltonian systems provides an intuitive way of achieving stabilization by rendering a system passive with respect to a desired storage function. However, in most instances the control law is obtained without any performance considerations and it has to be calculated by solving a complex partial differential equation (PDE). In order to address these issues we introduce a reinforcement learning (RL) approach into the energy-balancing passivity-based control (EB-PBC) method, which is a form of PBC in which the closed-loop energy is equal to the difference between the stored and supplied energies. We propose a technique to parameterize EB-PBC that preserves the system's PDE matching conditions, does not require the specification of a global desired Hamiltonian, includes performance criteria, and is robust. The parameters of the control law are found by using actor-critic (AC) RL, enabling the search for near-optimal control policies satisfying a desired closed-loop energy landscape. The advantage is that the solutions learned can be interpreted in terms of energy shaping and damping injection, which makes it possible to numerically assess stability using passivity theory. From the RL perspective, our proposal allows for the class of port-Hamiltonian systems to be incorporated in the AC framework, speeding up the learning thanks to the resulting parameterization of the policy. The method has been successfully applied to the pendulum swing-up problem in simulations and real-life experiments. PMID:25167564

  13. Reinforcement learning in continuous time and space.

    PubMed

    Doya, K

    2000-01-01

    This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(lambda) algorithms are shown. For policy improvement, two methods-a continuous actor-critic method and a value-gradient-based greedy policy-are formulated. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. The advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework. The performance of the proposed algorithms is first tested in a nonlinear control task of swinging a pendulum up with limited torque. It is shown in the simulations that (1) the task is accomplished by the continuous actor-critic method in a number of trials several times fewer than by the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than that based on Euler approximation. The algorithms are then tested in a higher-dimensional task: cart-pole swing-up. This task is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model. PMID:10636940
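
    For reference, one common way of writing the continuous-time value function and the temporal difference error minimised during value estimation in this line of work is shown below; the notation may differ slightly from the article.

```latex
% Value function with discount time-constant \tau and the continuous-time TD error.
\begin{aligned}
V(\mathbf{x}(t)) &= \int_{t}^{\infty} e^{-(s-t)/\tau}\, r\big(\mathbf{x}(s), \mathbf{u}(s)\big)\, ds,\\
\delta(t) &= r(t) \;-\; \frac{1}{\tau}\, V(\mathbf{x}(t)) \;+\; \dot{V}(\mathbf{x}(t)).
\end{aligned}
```

    Value-function estimation then corresponds to driving the squared TD error towards zero, while the value-gradient-based greedy policy described above acts through the gradient of V with respect to the state and the model of the input gain.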

  14. Refining Linear Fuzzy Rules by Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil

    1996-01-01

    Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning which can be applied in domains where supervised input-output data is not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step in closing the gap between the application of reinforcement learning methods in the domains where only some limited input-output data is available.

  15. Reinforcement learning: Solving two case studies

    NASA Astrophysics Data System (ADS)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms are described and applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore the capability of RL to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
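
    A minimal tabular Q-learning loop of the kind used for such case studies is sketched below for a crawler-like robot with two discretised joints; the displacement model standing in for the reward is purely illustrative and not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal tabular Q-learning sketch for a crawler-like robot: two arm joints,
# each discretised into 4 positions; actions move one joint by one step.
n_pos = 4
actions = [(0, +1), (0, -1), (1, +1), (1, -1)]   # (joint index, direction)
Q = np.zeros((n_pos, n_pos, len(actions)))
alpha, gamma, eps = 0.2, 0.9, 0.1

def move(state, action):
    joints = list(state)
    j, d = action
    new = min(max(joints[j] + d, 0), n_pos - 1)
    # toy reward: lowering the "hand" joint while the "arm" joint is extended
    # drags the body forward (a stand-in for measured displacement)
    reward = 1.0 if (j == 1 and d == -1 and joints[0] == n_pos - 1
                     and new != joints[j]) else 0.0
    joints[j] = new
    return tuple(joints), reward

state = (0, 0)
for step in range(20000):
    a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[state]))
    next_state, r = move(state, actions[a])
    Q[state][a] += alpha * (r + gamma * np.max(Q[next_state]) - Q[state][a])
    state = next_state
```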

  16. Assist-as-needed robotic trainer based on reinforcement learning and its application to dart-throwing.

    PubMed

    Obayashi, Chihiro; Tamei, Tomoya; Shibata, Tomohiro

    2014-05-01

    This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy. Most previous studies in the field of the robotic assistance of motor skill learning have used predetermined desired trajectories, and it has not been examined intensively whether these trajectories were optimal for each user. Furthermore, the guidance hypothesis states that humans tend to rely too much on external assistive feedback, resulting in interference with the internal feedback necessary for motor skill learning. A few studies have proposed a system that adjusts its assistive strength according to the user's performance in order to prevent the user from relying too much on the robotic assistance. There are, however, problems in these studies, in that a physical model of the user's motor system is required, which is inherently difficult to construct. In this paper, we propose a framework for a robotic trainer that is user-adaptive and that neither requires a specific desired trajectory nor a physical model of the user's motor system, and we achieve this using model-free reinforcement learning. We chose dart-throwing as an example motor-learning task as it is one of the simplest throwing tasks, and its performance can be easily and quantitatively measured. Training experiments with novices, aiming at maximizing the score with the darts and minimizing the physical robotic assistance, demonstrate the feasibility and plausibility of the proposed framework. PMID:24531040

  17. Reinforcement Learning in Information Searching

    ERIC Educational Resources Information Center

    Cen, Yonghua; Gan, Liren; Bai, Chen

    2013-01-01

    Introduction: The study seeks to answer two questions: How do university students learn to use correct strategies to conduct scholarly information searches without instructions? and, What are the differences in learning mechanisms between users at different cognitive levels? Method: Two groups of users, thirteen first year undergraduate students…

  18. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    NASA Astrophysics Data System (ADS)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to maintain the CO pollutant concentration and VI (visibility index) at an adequate level to provide drivers with a comfortable and safe driving environment. Moreover, it is necessary to minimize the power consumption used to operate the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is a goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is an evaluative feedback from the environment. In constructing the reward for the tunnel ventilation system, the two objectives listed above are included, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with real experimental verification, are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit and the energy consumption performance was improved compared to the conventional control scheme.
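
    The reward construction described above, combining pollutant levels and power consumption, can be sketched as a simple weighted penalty; the thresholds and weights below are illustrative placeholders, not values from the article.

```python
import numpy as np

# Hypothetical reward shaping for tunnel ventilation control: penalise CO and
# visibility-index violations together with the power drawn by the jet fans.
CO_LIMIT, VI_TARGET = 25.0, 0.5       # illustrative CO limit (ppm) and VI target
w_co, w_vi, w_power = 1.0, 1.0, 0.2   # illustrative weights

def reward(co_ppm, vi, fan_power_kw):
    co_penalty = max(0.0, co_ppm - CO_LIMIT)     # only penalise exceedances
    vi_penalty = max(0.0, VI_TARGET - vi)        # only penalise poor visibility
    return -(w_co * co_penalty + w_vi * vi_penalty + w_power * fan_power_kw)

print(reward(co_ppm=20.0, vi=0.6, fan_power_kw=50.0))   # within limits: only power cost
print(reward(co_ppm=30.0, vi=0.4, fan_power_kw=80.0))   # violations add penalties
```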

  19. Online reinforcement learning for dynamic multimedia systems.

    PubMed

    Mastronarde, Nicholas; van der Schaar, Mihaela

    2010-02-01

    In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori, by modeling the system as a layered Markov decision process. In practice, however, these dynamics are unknown a priori and, therefore, must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. The two key challenges in this layered learning setting are: (i) each layer's learning performance is directly impacted by not only its own dynamics, but also by the learning processes of the other layers with which it interacts; and (ii) selecting a learning model that appropriately balances time-complexity (i.e., learning speed) with the multimedia system's limited memory and the multimedia application's real-time delay constraints. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and interlayer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance. In our experiments, we demonstrate that decentralized learning can perform equally as well as centralized learning, while

  20. Fuzzy reinforcement learning control for compliance tasks of robotic manipulators.

    PubMed

    Tzafestas, S G; Rigatos, G G

    2002-01-01

    A fuzzy reinforcement learning (FRL) scheme which is based on the principles of sliding-mode control and fuzzy logic is proposed. The FRL uses only immediate reward. Sufficient conditions for the convergence of the FRL to the optimal task performance are studied. The validity of the method is tested through simulation examples of a robot which deburrs a metal surface. PMID:18238109

  1. Stress Modulates Reinforcement Learning in Younger and Older Adults

    PubMed Central

    Lighthall, Nichole R.; Gorlick, Marissa A.; Schoeke, Andrej; Frank, Michael J.; Mather, Mara

    2012-01-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentive-based learning differ by age. Thus, the present study also examined whether effects of stress on reinforcement learning differed for younger (age 18–34) and older participants (age 65–85). Cold pressor stress was administered to half of the participants in each age group and salivary cortisol levels were used to confirm biophysiological response to cold stress. Following the manipulation, participants completed a probabilistic learning task involving positive and negative feedback. In both younger and older adults, stress enhanced learning about cues that predicted positive outcomes. In addition, during the initial learning phase, stress diminished sensitivity to recent feedback across age groups. These results indicate that stress affects reinforcement learning in both younger and older adults and suggests that stress exerts different effects on specific components of reinforcement learning depending on their neural underpinnings. PMID:22946523

  2. An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning

    PubMed Central

    Balasubramani, Pragathi P.; Chakravarthy, V. Srinivasa; Ravindran, Balaraman; Moustafa, Ahmed A.

    2014-01-01

    Although empirical and neural studies show that serotonin (5HT) plays many functional roles in the brain, prior computational models mostly focus on its role in behavioral inhibition. In this study, we present a model of risk-based decision making in a modified Reinforcement Learning (RL) framework. The model depicts the roles of dopamine (DA) and serotonin (5HT) in the Basal Ganglia (BG). In this model, the DA signal is represented by the temporal difference error (δ), while the 5HT signal is represented by a parameter (α) that controls risk prediction error. This formulation, which accommodates both 5HT and DA, reconciles some of the diverse roles of 5HT, particularly in connection with the BG system. We apply the model to different experimental paradigms used to study the role of 5HT: (1) Risk-sensitive decision making, where 5HT controls risk assessment, (2) Temporal reward prediction, where 5HT controls the time-scale of reward prediction, and (3) Reward/Punishment sensitivity, in which the punishment prediction error depends on 5HT levels. Thus the proposed integrated RL model reconciles several existing theories of 5HT and DA in the BG. PMID:24795614

  3. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Attitude control results

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1992-01-01

    As part of the RICIS activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This report is deliverable D2, Attitude Control Results, and provides the status of the project after four months of activities and outlines the future plans. In section 2 we describe the Fuzzy-Learner system for the attitude control functions. In section 3, we provide the description of test cases and results in chronological order. In section 4, we have summarized our results and conclusions. Our future plans and recommendations are provided in section 5.

  4. A reinforcement learning approach to gait training improves retention

    PubMed Central

    Hasson, Christopher J.; Manczurowsky, Julia; Yen, Sheng-Che

    2015-01-01

    Many gait training programs are based on supervised learning principles: an individual is guided towards a desired gait pattern with directional error feedback. While this results in rapid adaptation, improvements quickly disappear. This study tested the hypothesis that a reinforcement learning approach improves retention and transfer of a new gait pattern. The results of a pilot study and larger experiment are presented. Healthy subjects were randomly assigned to either a supervised group, who received explicit instructions and directional error feedback while they learned a new gait pattern on a treadmill, or a reinforcement group, who was only shown whether they were close to or far from the desired gait. Subjects practiced for 10 min, followed by immediate and overnight retention and over-ground transfer tests. The pilot study showed that subjects could learn a new gait pattern under a reinforcement learning paradigm. The larger experiment, which had twice as many subjects (16 in each group) showed that the reinforcement group had better overnight retention than the supervised group (a 32% vs. 120% error increase, respectively), but there were no differences for over-ground transfer. These results suggest that encouraging participants to find rewarding actions through self-guided exploration is beneficial for retention. PMID:26379524

  5. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    PubMed

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
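
    One way to express the confirmation-bias mechanism suggested by the best-fitting model is a Q-learning rule whose update is amplified when an outcome agrees with the instruction and diminished when it does not. The sketch below is a generic instantiation with illustrative parameters, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two probabilistically reinforced stimuli; participants are (incorrectly) told
# that stimulus 0 has the higher reinforcement probability. Parameters are illustrative.
p_reward = np.array([0.3, 0.7])
instructed_best = 0
alpha, amplify, diminish, beta = 0.1, 1.5, 0.5, 5.0

Q = np.zeros(2)
choices = []
for trial in range(200):
    p = np.exp(beta * Q); p /= p.sum()        # softmax choice between the two stimuli
    a = int(rng.choice(2, p=p))
    r = float(rng.random() < p_reward[a])
    delta = r - Q[a]
    # confirmation bias: scale the update by whether the outcome agrees with the
    # instruction ("instructed stimulus is good / the other stimulus is bad")
    consistent = (a == instructed_best and r == 1.0) or (a != instructed_best and r == 0.0)
    Q[a] += alpha * (amplify if consistent else diminish) * delta
    choices.append(a)

print("final Q:", np.round(Q, 2),
      " instructed choice rate:", np.mean(np.array(choices[-100:]) == instructed_best))
```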

  6. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Special approach/docking testcase results

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1993-01-01

    As part of the RICIS project, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use these two terms interchangeably to imply the same. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS) and programming/testing support from other contractor personnel. This report is the final deliverable D4 in our milestones and project activity. It provides the test results for the special testcase of approach/docking scenario for the shuttle and SMM satellite. Based on our experience and analysis with the attitude and translational controllers, we have modified the basic configuration of the reinforcement learning algorithm in ARIC. The shuttle translational controller and its implementation in ARIC is described in our deliverable D3. In order to simulate the final approach and docking operations, we have set-up this special testcase as described in section 2. The ARIC performance results for these operations are discussed in section 3 and conclusions are provided in section 4 along with the summary for the project.

  7. Reinforcement learning in professional basketball players.

    PubMed

    Neiman, Tal; Loewenstein, Yonatan

    2011-01-01

    Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3 point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388

  8. The Function of Direct and Vicarious Reinforcement in Human Learning.

    ERIC Educational Resources Information Center

    Owens, Carl R.; And Others

    The role of reinforcement has long been an issue in learning theory. The effects of reinforcement in learning were investigated under circumstances which made the information necessary for correct performance equally available to reinforced and nonreinforced subjects. Fourth graders (N=36) were given a pre-test of 20 items from the Peabody Picture…

  9. Structure identification in fuzzy inference using reinforcement learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  10. Convergence of reinforcement learning algorithms and acceleration of learning

    NASA Astrophysics Data System (ADS)

    Potapov, A.; Ali, M. K.

    2003-02-01

    The techniques of reinforcement learning have been gaining increasing popularity recently. However, the question of their convergence rate is still open. We consider the problem of choosing the learning steps αn, and their relation with discount γ and exploration degree ɛ. Appropriate choices of these parameters may drastically influence the convergence rate of the techniques. From analytical examples, we conjecture optimal values of αn and then use numerical examples to verify our conjectures.
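
    To make the role of the learning steps αn, the discount γ, and the exploration degree ɛ concrete, here is a minimal tabular Q-learning sketch on a toy chain task in which the per-state-action step size follows a power-law visit-count schedule. The task, the schedule, and all parameter values are illustrative assumptions, not the analytical examples of the paper.

```python
import random

def q_learning_with_schedule(n_episodes=500, n_states=5, gamma=0.9,
                             epsilon=0.1, alpha_power=0.8, max_steps=500):
    """Tabular Q-learning on a toy chain task where the learning step for each
    state-action pair decays as alpha_n = 1 / n**alpha_power, with n its visit
    count. The chain, the schedule, and all parameter values are illustrative."""
    actions = (0, 1)                                   # 0 = left, 1 = right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    visits = {(s, a): 0 for s in range(n_states) for a in actions}

    for _ in range(n_episodes):
        s, steps = 0, 0
        while s < n_states - 1 and steps < max_steps:  # rightmost state is terminal
            if random.random() < epsilon:              # exploration degree epsilon
                a = random.choice(actions)
            else:                                      # greedy with random tie-breaking
                a = max(actions, key=lambda x: (q[(s, x)], random.random()))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            visits[(s, a)] += 1
            alpha_n = 1.0 / visits[(s, a)] ** alpha_power
            best_next = max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha_n * (r + gamma * best_next - q[(s, a)])
            s, steps = s_next, steps + 1
    return q

if __name__ == "__main__":
    q = q_learning_with_schedule()
    print({k: round(v, 2) for k, v in q.items()})
```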

  11. Learning strategies in table tennis using inverse reinforcement learning.

    PubMed

    Muelling, Katharina; Boularias, Abdeslam; Mohler, Betty; Schölkopf, Bernhard; Peters, Jan

    2014-10-01

    Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent's court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework allows us to identify basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert among players with different skill levels as well as different playing styles. PMID:24756167


  12. Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis

    PubMed Central

    2013-01-01

    Background Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
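
    The distinction drawn here between reward sensitivity and learning rate can be made concrete with a minimal behavioural model in which the two appear as separate parameters of the same update rule. The sketch below is a generic illustration under assumed task parameters, not the meta-analytic model used in the study.

```python
import math
import random

def simulate_reward_sensitivity(n_trials=300, alpha=0.2, rho=1.0,
                                p_reward=(0.6, 0.3), beta=3.0):
    """Minimal behavioural RL model in which reward sensitivity (rho) scales the
    experienced reward while the learning rate (alpha) scales the prediction
    error, so the two parameters are at least partially dissociable. The task
    structure and all parameter values are illustrative assumptions."""
    q = [0.0, 0.0]
    rich_choices = 0
    for _ in range(n_trials):
        # softmax (logistic) choice between the rich (0) and lean (1) option
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        a = 0 if random.random() < p0 else 1
        r = 1.0 if random.random() < p_reward[a] else 0.0
        q[a] += alpha * (rho * r - q[a])
        rich_choices += (a == 0)
    return rich_choices / n_trials

if __name__ == "__main__":
    print("normal sensitivity :", simulate_reward_sensitivity(rho=1.0))
    print("blunted sensitivity:", simulate_reward_sensitivity(rho=0.3))
```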

  13. Reinforcement active learning in the vibrissae system: optimal object localization.

    PubMed

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. PMID:22789551

  14. Reward and reinforcement activity in the nucleus accumbens during learning

    PubMed Central

    Gale, John T.; Shields, Donald C.; Ishizawa, Yumiko; Eskandar, Emad N.

    2014-01-01

    The nucleus accumbens core (NAcc) has been implicated in learning associations between sensory cues and profitable motor responses. However, the precise mechanisms that underlie these functions remain unclear. We recorded single-neuron activity from the NAcc of primates trained to perform a visual-motor associative learning task. During learning, we found two distinct classes of NAcc neurons. The first class demonstrated progressive increases in firing rates at the go-cue, feedback/tone and reward epochs of the task, as novel associations were learned. This suggests that these neurons may play a role in the exploitation of rewarding behaviors. In contrast, the second class exhibited attenuated firing rates, but only at the reward epoch of the task. These findings suggest that some NAcc neurons play a role in reward-based reinforcement during learning. PMID:24765069

  15. Framework for robot skill learning using reinforcement learning

    NASA Astrophysics Data System (ADS)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skills. The reinforcement function is the critical component because it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. The automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal but practical framework for robot skill learning and also illustrate, with an example, the utility of the method for learning skilled robot control on-line.
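
    A hedged sketch of what an augmented, difference-form reward might look like is given below; the distance-based heuristic, the weighting, and all names are illustrative assumptions rather than the authors' specific formulation.

```python
def augmented_reward(env_reward, state, goal, prev_state=None, weight=0.5):
    """Hypothetical augmented reward: the environmental reward is supplemented
    with a heuristic term that rewards progress toward a goal position. The
    difference form uses the change in distance between successive states, so
    the bonus vanishes when no progress is made. Purely illustrative."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    heuristic = -distance(state, goal)
    if prev_state is not None:
        # difference form: reward only the *improvement* in the heuristic
        heuristic = distance(prev_state, goal) - distance(state, goal)
    return env_reward + weight * heuristic

if __name__ == "__main__":
    # moving from (0, 0) toward the goal at (1, 1) yields a positive bonus
    print(augmented_reward(0.0, state=(0.5, 0.5), goal=(1.0, 1.0), prev_state=(0.0, 0.0)))
```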

  16. Connectionist reinforcement learning of robot control skills

    NASA Astrophysics Data System (ADS)

    Araújo, Rui; Nunes, Urbano; de Almeida, A. T.

    1998-07-01

    Many robot manipulator tasks are difficult to model explicitly and it is difficult to design and program automatic control algorithms for them. The development, improvement, and application of learning techniques taking advantage of sensory information would enable the acquisition of new robot skills and avoid some of the difficulties of explicit programming. In this paper we use a reinforcement learning approach for on-line generation of skills for control of robot manipulator systems. Instead of generating skills by explicit programming of a perception-to-action mapping, they are generated by trial-and-error learning, guided by a performance evaluation feedback function. The resulting system may be seen as an anticipatory system that constructs an internal representation model of itself and of its environment. This enables it to identify its current situation and to generate corresponding appropriate commands to the system in order to perform the required skill. The method was applied to the problem of learning a force control skill in which the tool-tip of a robot manipulator must be moved from free space to a contact state with a compliant surface while maintaining a constant interaction force.

  17. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    ERIC Educational Resources Information Center

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  18. Reinforcement of Science Learning through Local Culture: A Delphi Study

    ERIC Educational Resources Information Center

    Nuangchalerm, Prasart

    2008-01-01

    This study aims to explore the ways to reinforce science learning through local culture by using Delphi technique. Twenty four participants in various fields of study were selected. The result of study provides a framework for reinforcement of science learning through local culture on the theme life and environment. (Contains 1 table.)

  19. Learning and tuning fuzzy logic controllers through reinforcements

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  20. Learning and tuning fuzzy logic controllers through reinforcements

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
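
    The two GARIC records above both emphasize learning from a weak, binary failure signal. The sketch below is a generic adaptive-heuristic-critic style actor-critic on a toy balancing task driven only by such a signal; it illustrates the general idea under assumed parameters and is not the GARIC architecture itself, which combines fuzzy inference with neural networks.

```python
import math
import random

def actor_critic_balance(n_episodes=300, alpha_v=0.1, alpha_p=0.1,
                         gamma=0.95, bound=3, max_steps=200):
    """Minimal adaptive-heuristic-critic style learner: the only feedback is a
    binary failure signal (-1 when the state leaves the allowed band, and 0
    otherwise). The critic learns state values by temporal differences and the
    actor adjusts action preferences using the same TD error. The toy task and
    all parameters are illustrative; this is not the GARIC controller itself."""
    states = list(range(-bound, bound + 1))
    v = {s: 0.0 for s in states}               # critic: state values
    pref = {s: [0.0, 0.0] for s in states}     # actor: push-left / push-right preferences
    lengths = []
    for _ in range(n_episodes):
        s, steps = 0, 0
        while abs(s) < bound and steps < max_steps:
            p_right = 1.0 / (1.0 + math.exp(pref[s][0] - pref[s][1]))
            a = 1 if random.random() < p_right else 0
            s_next = s + (1 if a == 1 else -1) + random.choice((-1, 1))
            s_next = max(-bound, min(bound, s_next))
            failed = abs(s_next) >= bound
            r = -1.0 if failed else 0.0        # weak, binary failure signal
            td_error = r + (0.0 if failed else gamma * v[s_next]) - v[s]
            v[s] += alpha_v * td_error
            pref[s][a] += alpha_p * td_error   # reinforce actions that raise value
            s, steps = s_next, steps + 1
        lengths.append(steps)
    return sum(lengths[-50:]) / 50.0

if __name__ == "__main__":
    print("average steps survived over the last 50 episodes:", actor_critic_balance())
```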

  1. A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance.

    PubMed

    Ye, Cang; Yung, N C; Wang, Danwei

    2003-01-01

    Fuzzy logic systems are promising for efficient obstacle avoidance. However, it is difficult to maintain the correctness, consistency, and completeness of a fuzzy rule base constructed and tuned by a human expert. A reinforcement learning method is capable of learning the fuzzy rules automatically. However, it incurs a heavy learning phase and may result in an insufficiently learned rule base due to the curse of dimensionality. In this paper, we propose a neural fuzzy system with mixed coarse learning and fine learning phases. In the first phase, a supervised learning method is used to determine the membership functions for input and output variables simultaneously. After sufficient training, fine learning is applied, which employs a reinforcement learning algorithm to fine-tune the membership functions for output variables. For sufficient learning, a new learning method using a modification of Sutton and Barto's model is proposed to strengthen the exploration. Through this two-step tuning approach, the mobile robot is able to perform collision-free navigation. To deal with the difficulty of acquiring a large amount of training data with high consistency for supervised learning, we develop a virtual environment (VE) simulator, which is able to provide desktop virtual environment (DVE) and immersive virtual environment (IVE) visualization. Through operating a mobile robot in the virtual environment (DVE/IVE) by a skilled human operator, training data are readily obtained and used to train the neural fuzzy system. PMID:18238153

  2. Use of Inverse Reinforcement Learning for Identity Prediction

    NASA Technical Reports Server (NTRS)

    Hayes, Roy; Bao, Jonathan; Beling, Peter; Horowitz, Barry

    2011-01-01

    We adopt Markov Decision Processes (MDP) to model sequential decision problems, which have the characteristic that the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual's reward function based on limited observation of his or her decision choices. This work serves as an initial investigation of using IRL to analyze decision making, conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic identity of users is assessed through prediction analysis and supervised learning. The results show that IRL can be used to correctly identify participants, at a rate of 68% for gender and 66% for one of three college major categories.

  3. Optimal Reward Functions in Distributed Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Wolpert, David H.; Tumer, Kagan

    2000-01-01

    We consider the design of multi-agent systems so as to optimize an overall world utility function when (1) those systems lack centralized communication and control, and (2) each agent runs a distinct Reinforcement Learning (RL) algorithm. A crucial issue in such design problems is to initialize/update each agent's private utility function, so as to induce the best possible world utility. Traditional 'team game' solutions to this problem sidestep this issue and simply assign to each agent the world utility as its private utility function. In previous work we used the 'Collective Intelligence' framework to derive a better choice of private utility functions, one that results in world utility performance up to orders of magnitude superior to that ensuing from use of the team game utility. In this paper we extend these results. We derive the general class of private utility functions that are both easy for the individual agents to learn and that, if learned well, result in high world utility. We demonstrate experimentally that using these new utility functions can result in significantly improved performance over that of our previously proposed utility, over and above that previous utility's superiority to the conventional team game utility.

  4. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.

    PubMed

    Sandholm, T W; Crites, R H

    1996-01-01

    Reinforcement learning (RL) is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable results. Q-learning is a recent RL algorithm that does not need a model of its environment and can be used on-line. Therefore, it is well suited for use in repeated games against an unknown opponent. Most RL research has been confined to single-agent settings or to multiagent settings where the agents have totally positively correlated payoffs (team problems) or totally negatively correlated payoffs (zero-sum games). This paper is an empirical study of reinforcement learning in the Iterated Prisoner's Dilemma (IPD), where the agents' payoffs are neither totally positively nor totally negatively correlated. RL is considerably more difficult in such a domain. This paper investigates the ability of a variety of Q-learning agents to play the IPD game against an unknown opponent. In some experiments, the opponent is the fixed strategy Tit-For-Tat, while in others it is another Q-learner. All the Q-learners learned to play optimally against Tit-For-Tat. Playing against another learner was more difficult because the adaptation of the other learner created a non-stationary environment, and because the other learner was not endowed with any a priori knowledge about the IPD game such as a policy designed to encourage cooperation. The learners that were studied varied along three dimensions: the length of history they received as context, the type of memory they employed (lookup tables based on restricted history windows or recurrent neural networks that can theoretically store features from arbitrarily deep in the past), and the exploration schedule they followed. Although all the learners faced difficulties when playing against other learners, agents with longer history windows, lookup table memories, and longer exploration schedules fared best in the IPD games. PMID:8924633
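
    A minimal version of the single-opponent setting described above, a history-1 Q-learner playing against the fixed Tit-For-Tat strategy, can be sketched as follows; the payoffs and hyperparameters are conventional illustrative choices, not those of the study.

```python
import random

# Payoffs for the row player in the Prisoner's Dilemma: C = cooperate, D = defect
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def q_learner_vs_tit_for_tat(n_rounds=5000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """A history-1 Q-learner playing the IPD against Tit-For-Tat. The state is
    the opponent's previous move; the learner should settle into mutual
    cooperation. All hyperparameters are illustrative assumptions."""
    actions = ("C", "D")
    q = {(s, a): 0.0 for s in actions for a in actions}
    opp_last, my_last = "C", "C"
    total = 0
    for _ in range(n_rounds):
        s = opp_last
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: q[(s, x)])
        opp_a = my_last                    # Tit-For-Tat repeats our previous move
        r = PAYOFF[(a, opp_a)]
        s_next = opp_a
        q[(s, a)] += alpha * (r + gamma * max(q[(s_next, b)] for b in actions) - q[(s, a)])
        my_last, opp_last = a, opp_a
        total += r
    return q, total / n_rounds

if __name__ == "__main__":
    q, avg = q_learner_vs_tit_for_tat()
    print("average payoff per round:", round(avg, 2))   # approaches 3 under mutual cooperation
```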

  5. Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.

    1992-01-01

    Current reinforcement learning algorithms require long training periods which generally limit their applicability to small-size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to the control of dynamic systems and it is demonstrated that it is possible to start with approximate prior knowledge and learn to refine it through experiments using reinforcement learning.

  6. Context transfer in reinforcement learning using action-value functions.

    PubMed

    Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid

    2014-01-01

    This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task. PMID:25610457
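
    The core operation described here, mapping Q-values learned in a source task onto the target task's state-action space and merging them into an initialization, can be sketched roughly as follows; the averaging merge and all names are simplifying assumptions (the paper itself uses an interval-based representation).

```python
def transfer_q_values(source_q, mapping):
    """Sketch of context transfer: Q-values learned in a source task are mapped
    through a partial state-action correspondence and used to initialize the
    target task's Q-table. When several source entries map to the same target
    entry, their values are merged by averaging (a simplification of the
    interval-based merging described in the abstract)."""
    merged = {}
    for src_key, tgt_key in mapping.items():
        if src_key in source_q:
            merged.setdefault(tgt_key, []).append(source_q[src_key])
    return {k: sum(v) / len(v) for k, v in merged.items()}

if __name__ == "__main__":
    # Hypothetical source task with two states and two actions
    source_q = {("s0", "left"): 0.8, ("s0", "right"): 0.1, ("s1", "left"): 0.4}
    # Partial mapping between source and target state-action pairs
    mapping = {("s0", "left"): ("t0", "a0"), ("s1", "left"): ("t0", "a0"),
               ("s0", "right"): ("t1", "a1")}
    print(transfer_q_values(source_q, mapping))   # initial Q-table for the target task
```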

  7. Changes in corticostriatal connectivity during reinforcement learning in humans

    PubMed Central

    Horga, Guillermo; Maia, Tiago V.; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z.; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G.; Peterson, Bradley S.

    2015-01-01

    Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants’ choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. PMID:25393839

  8. Switching Reinforcement Learning for Continuous Action Space

    NASA Astrophysics Data System (ADS)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence, such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One such difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements that trade off against each other: (i) keeping the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) reducing the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to an “entropy” measure. Further, through computational experiments on robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method is confirmed.
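
    One plausible reading of the entropy-based switching criterion is sketched below: the entropy of a softmax policy over the coarse controller's action values is monitored, and control is handed to a finer action space once that entropy falls below a threshold. The threshold rule, the temperature, and the function names are assumptions for illustration, not the paper's definition.

```python
import math

def policy_entropy(q_values, temperature=1.0):
    """Entropy of a Boltzmann (softmax) policy over a controller's Q-values.
    Low entropy means the controller has become decisive about its actions."""
    exps = [math.exp(q / temperature) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_controller(coarse_q, entropy_threshold=0.5):
    """Hypothetical switching rule: move from the coarse to the fine controller
    once the coarse policy's entropy falls below a threshold."""
    return "fine" if policy_entropy(coarse_q) < entropy_threshold else "coarse"

if __name__ == "__main__":
    print(choose_controller([0.1, 0.1, 0.1]))   # undecided -> stay with coarse controller
    print(choose_controller([3.0, 0.0, 0.0]))   # decisive  -> switch to fine controller
```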

  9. Coevolutionary networks of reinforcement-learning agents

    NASA Astrophysics Data System (ADS)

    Kianercy, Ardeshir; Galstyan, Aram

    2013-07-01

    This paper presents a model of network formation in repeated games where the players adapt their strategies and network ties simultaneously using a simple reinforcement-learning scheme. It is demonstrated that the coevolutionary dynamics of such systems can be described via coupled replicator equations. We provide a comprehensive analysis for three-player two-action games, which is the minimum system size with nontrivial structural dynamics. In particular, we characterize the Nash equilibria (NE) in such games and examine the local stability of the rest points corresponding to those equilibria. We also study general n-player networks via both simulations and analytical methods and find that, in the absence of exploration, the stable equilibria consist of star motifs as the main building blocks of the network. Furthermore, in all stable equilibria the agents play pure strategies, even when the game allows mixed NE. Finally, we study the impact of exploration on learning outcomes and observe that there is a critical exploration rate above which the symmetric and uniformly connected network topology becomes stable.

  10. Preliminary Work for Examining the Scalability of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Clouse, Jeff

    1998-01-01

    Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsson, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state-spaces. There has been a little theoretical work that proves that convergence to optimal solutions can be obtained when using generalization structures, but the structures are quite simple. The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising.
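
    The contrast drawn here between table-lookup and generalizing representations can be illustrated with two tiny value-function containers; the feature function, class names, and parameters below are assumptions for illustration, not part of the study.

```python
class TabularQ:
    """Table-lookup representation: one independent entry per (state, action) pair."""
    def __init__(self, default=0.0):
        self.table = {}
        self.default = default

    def value(self, state, action):
        return self.table.get((state, action), self.default)

    def update(self, state, action, target, alpha=0.1):
        q = self.value(state, action)
        self.table[(state, action)] = q + alpha * (target - q)


class LinearQ:
    """Generalizing representation: Q is a linear function of state-action
    features, so experience in one state informs estimates for similar states.
    The feature function is supplied by the user and is an assumption here."""
    def __init__(self, n_features, features):
        self.w = [0.0] * n_features
        self.features = features

    def value(self, state, action):
        return sum(w * f for w, f in zip(self.w, self.features(state, action)))

    def update(self, state, action, target, alpha=0.1):
        error = target - self.value(state, action)
        phi = self.features(state, action)
        self.w = [w + alpha * error * f for w, f in zip(self.w, phi)]


if __name__ == "__main__":
    tab = TabularQ()
    tab.update(state=(0, 0), action="right", target=1.0)
    print(tab.value((0, 0), "right"), tab.value((0, 1), "right"))       # no generalization

    lin = LinearQ(2, lambda s, a: [s[0] + s[1], 1.0 if a == "right" else 0.0])
    lin.update(state=(0, 1), action="right", target=1.0)
    print(round(lin.value((0, 2), "right"), 3))                          # generalizes to nearby states
```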

  11. On the integration of reinforcement learning and approximate reasoning for control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.

  12. Reinforcement learning of motor skills with policy gradients.

    PubMed

    Peters, Jan; Schaal, Stefan

    2008-05-01

    Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm. PMID:18482830
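
    As a much simpler relative of the policy-gradient methods discussed here (vanilla episodic REINFORCE rather than the Episodic Natural Actor-Critic), the following sketch adapts the mean of a one-parameter Gaussian policy over a single continuous action; the task, the baseline choice, and all parameter values are illustrative assumptions.

```python
import random

def reinforce_gaussian(n_iterations=200, batch=20, alpha=0.05, sigma=0.5, target=2.0):
    """Vanilla episodic policy gradient (REINFORCE) for a one-parameter Gaussian
    policy over a single continuous action; the return is the negative squared
    distance to an unknown target value. Purely illustrative assumptions."""
    mu = 0.0                                    # policy parameter (mean action)
    for _ in range(n_iterations):
        samples, baseline = [], 0.0
        for _ in range(batch):
            a = random.gauss(mu, sigma)         # sample an action from the policy
            r = -(a - target) ** 2              # episodic return
            samples.append((a, r))
            baseline += r / batch               # simple average-return baseline
        grad = 0.0
        for a, r in samples:
            # d/dmu log N(a; mu, sigma^2) = (a - mu) / sigma^2
            grad += ((a - mu) / sigma ** 2) * (r - baseline) / batch
        mu += alpha * grad                      # gradient ascent on expected return
    return mu

if __name__ == "__main__":
    print("learned mean action:", round(reinforce_gaussian(), 2))  # should approach 2.0
```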

  13. Reinforcement Learning in a Nonstationary Environment: The El Farol Problem

    NASA Technical Reports Server (NTRS)

    Bell, Ann Maria

    1999-01-01

    This paper examines the performance of simple learning rules in a complex adaptive system based on a coordination problem modeled on the El Farol problem. The key features of the El Farol problem are that it typically involves a medium number of agents and that agents' pay-off functions have a discontinuous response to increased congestion. First we consider a single adaptive agent facing a stationary environment. We demonstrate that the simple learning rules proposed by Roth and Erev can be extremely sensitive to small changes in the initial conditions and that events early in a simulation can affect the performance of the rule over a relatively long time horizon. In contrast, a reinforcement learning rule based on standard practice in the computer science literature converges rapidly and robustly. The situation is reversed when multiple adaptive agents interact: the RE algorithms often converge rapidly to a stable average aggregate attendance despite the slow and erratic behavior of individual learners, while the CS-based learners frequently over-attend in the early and intermediate terms. The symmetric mixed-strategy equilibrium is unstable: all three learning rules ultimately tend towards pure strategies or stabilize in the medium term at non-equilibrium probabilities of attendance. The brittleness of the algorithms in different contexts emphasizes the importance of thorough and thoughtful examination of simulation-based results.
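
    For reference, a basic form of the Roth-Erev reinforcement rule mentioned above can be sketched as follows; the initial propensities, forgetting rate, and payoff structure are illustrative assumptions rather than the paper's calibration.

```python
import random

def roth_erev_choice_probabilities(propensities):
    """Choice probabilities proportional to the current propensities."""
    total = sum(propensities)
    return [p / total for p in propensities]

def roth_erev_update(propensities, chosen, payoff, forgetting=0.0):
    """One step of a basic Roth-Erev rule: the chosen action's propensity is
    incremented by the realized payoff, and all propensities may decay slightly
    (forgetting). Payoffs are assumed non-negative so probabilities stay valid."""
    propensities = [(1.0 - forgetting) * p for p in propensities]
    propensities[chosen] += payoff
    return propensities

if __name__ == "__main__":
    # Two actions: attend the bar (0) or stay home (1), with made-up payoffs
    props = [1.0, 1.0]
    for _ in range(100):
        probs = roth_erev_choice_probabilities(props)
        a = 0 if random.random() < probs[0] else 1
        payoff = random.uniform(0.0, 1.0) if a == 0 else 0.5   # hypothetical payoffs
        props = roth_erev_update(props, a, payoff, forgetting=0.05)
    print([round(p, 2) for p in roth_erev_choice_probabilities(props)])
```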

  14. A Brain-like Learning System with Supervised, Unsupervised and Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Sasakawa, Takafumi; Hu, Jinglu; Hirasawa, Kotaro

    Our brain has three different learning paradigms: supervised, unsupervised and reinforcement learning. It is suggested that these learning paradigms relate closely to the cerebellum, cerebral cortex and basal ganglia in the brain, respectively. Inspired by this knowledge of the brain, we present a brain-like learning system with these three different learning algorithms. The proposed system consists of three parts: the supervised learning (SL) part, the unsupervised learning (UL) part and the reinforcement learning (RL) part. The SL part, corresponding to the cerebellum, learns an input-output mapping by supervised learning. The UL part, corresponding to the cerebral cortex, is a competitive learning network, and divides an input space into subspaces by unsupervised learning. The RL part, corresponding to the basal ganglia, optimizes the model performance by reinforcement learning. Numerical simulations show that the proposed brain-like learning system optimizes its performance automatically and has superior performance to an ordinary neural network.

  15. The role of GABAB receptors in human reinforcement learning.

    PubMed

    Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

    2014-10-01

    Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and furthermore, test this effect in patients with addiction spectrum disorder. PMID:25194227

  16. Grounding the Meanings in Sensorimotor Behavior using Reinforcement Learning.

    PubMed

    Farkaš, Igor; Malík, Tomáš; Rebrová, Kristína

    2012-01-01

    The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behavior. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch, and push) oriented toward objects in robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalizes well in case of novel action-target combinations with randomized initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution. PMID:22393319

  17. Reinforcement Learning and Dopamine in Schizophrenia: Dimensions of Symptoms or Specific Features of a Disease Group?

    PubMed Central

    Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian

    2013-01-01

    Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603

  18. Reinforcement Learning in Large Scale Systems Using State Generalization and Multi-Agent Techniques

    NASA Astrophysics Data System (ADS)

    Kimura, Hajime; Aoki, Kei; Kobayashi, Shigenobu

    This paper introduces several problems that arise in applying reinforcement learning to industrial applications, and shows some techniques to overcome them. Reinforcement learning is known as on-line learning of an input-output mapping through a process of trial-and-error interactions with an uncertain environment; however, trial and error can cause fatal damage in real applications. We introduce a planning method based on reinforcement learning in a simulator. It can be seen as a stochastic approximation of dynamic programming in Markov decision processes. But in large problems, simple grid-tiling to quantize the state space for tabular Q-learning is still infeasible. We introduce a generalization technique to approximate value functions in continuous state spaces, and a multiagent architecture to solve large-scale problems. The efficiency of these techniques is shown through experiments on a sewage water-flow control system.

  19. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control.

    PubMed

    Shih, Peter; Kaul, Brian C; Jagannathan, Sarangapani; Drallmeier, James A

    2009-10-01

    A novel reinforcement-learning-based output adaptive neural network (NN) controller, which is also referred to as the adaptive-critic NN controller, is developed to deliver the desired tracking performance for a class of nonlinear discrete-time systems expressed in nonstrict feedback form in the presence of bounded and unknown disturbances. The adaptive-critic NN controller consists of an observer, a critic, and two action NNs. The observer estimates the states and output, and the two action NNs provide virtual and actual control inputs to the nonlinear discrete-time system. The critic approximates a certain strategic utility function, and the action NNs minimize the strategic utility function and control inputs. All NN weights adapt online toward minimization of a performance index, utilizing the gradient-descent-based rule, in contrast with iteration-based adaptive-critic schemes. Lyapunov functions are used to show the stability of the closed-loop tracking error, weights, and observer estimates. Separation and certainty equivalence principles, persistency of excitation condition, and linearity in the unknown parameter assumption are not needed. Experimental results on a spark ignition (SI) engine operating lean at an equivalence ratio of 0.75 show a significant (25%) reduction in cyclic dispersion in heat release with control, while the average fuel input changes by less than 1% compared with the uncontrolled case. Consequently, oxides of nitrogen (NO(x)) drop by 30%, and unburned hydrocarbons drop by 16% with control. Overall, NO(x)'s are reduced by over 80% compared with stoichiometric levels. PMID:19336317

  20. Microstimulation of the human substantia nigra alters reinforcement learning.

    PubMed

    Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

    2014-05-14

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. PMID:24828643

  1. Role of Dopamine D2 Receptors in Human Reinforcement Learning

    PubMed Central

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-01-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613

  2. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    ERIC Educational Resources Information Center

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  3. Democratic reinforcement: learning via self-organization

    SciTech Connect

    Stassinopoulos, D.; Bak, P.

    1995-12-31

    The problem of learning in the absence of external intelligence is discussed in the context of a simple model. The model consists of a set of randomly connected, or layered, integrate-and-fire neurons. Inputs to and outputs from the environment are connected randomly to subsets of neurons. The connections between firing neurons are strengthened or weakened according to whether the action is successful or not. The model departs from the traditional gradient-descent-based approaches to learning by operating at a highly susceptible "critical" state, with low activity and sparse connections between firing neurons. Quantitative studies on the performance of our model in a simple association task show that by tuning our system close to this critical state we can obtain dramatic gains in performance.

  4. Reinforcement-learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation.

    PubMed

    Shih, Peter; Kaul, Brian C; Jagannathan, S; Drallmeier, James A

    2008-08-01

    A novel reinforcement-learning-based dual-control methodology adaptive neural network (NN) controller is developed to deliver a desired tracking performance for a class of complex feedback nonlinear discrete-time systems, which consists of a second-order nonlinear discrete-time system in nonstrict feedback form and an affine nonlinear discrete-time system, in the presence of bounded and unknown disturbances. For example, the exhaust gas recirculation (EGR) operation of a spark ignition (SI) engine is modeled by using such a complex nonlinear discrete-time system. A dual-controller approach is undertaken where primary adaptive critic NN controller is designed for the nonstrict feedback nonlinear discrete-time system whereas the secondary one for the affine nonlinear discrete-time system but the controllers together offer the desired performance. The primary adaptive critic NN controller includes an NN observer for estimating the states and output, an NN critic, and two action NNs for generating virtual control and actual control inputs for the nonstrict feedback nonlinear discrete-time system, whereas an additional critic NN and an action NN are included for the affine nonlinear discrete-time system by assuming the state availability. All NN weights adapt online towards minimization of a certain performance index, utilizing gradient-descent-based rule. Using Lyapunov theory, the uniformly ultimate boundedness (UUB) of the closed-loop tracking error, weight estimates, and observer estimates are shown. The adaptive critic NN controller performance is evaluated on an SI engine operating with high EGR levels where the controller objective is to reduce cyclic dispersion in heat release while minimizing fuel intake. Simulation and experimental results indicate that engine out emissions drop significantly at 20% EGR due to reduction in dispersion in heat release thus verifying the dual-control approach. PMID:18701368

  5. Human-level control through deep reinforcement learning

    NASA Astrophysics Data System (ADS)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  6. Human-level control through deep reinforcement learning.

    PubMed

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks
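
    The two records above describe the same deep Q-network agent. The skeleton below illustrates only its two stabilizing ingredients, experience replay and a periodically synchronized target network, with a simple linear Q-function standing in for the deep convolutional network; the environment, features, class name, and hyperparameters are placeholders, not the published setup.

```python
import random
from collections import deque

class TinyDQN:
    """Illustrative skeleton of deep Q-learning's replay buffer and target
    network, with a linear Q-function in place of a deep network. All names
    and hyperparameters are assumptions for illustration."""

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.99,
                 buffer_size=10000, batch_size=32, sync_every=100):
        self.w = [[0.0] * n_features for _ in range(n_actions)]   # online network
        self.w_target = [row[:] for row in self.w]                # target network
        self.replay = deque(maxlen=buffer_size)                   # experience replay buffer
        self.alpha, self.gamma = alpha, gamma
        self.batch_size, self.sync_every = batch_size, sync_every
        self.steps = 0

    def q(self, weights, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in weights]

    def act(self, state, epsilon=0.1):
        if random.random() < epsilon:
            return random.randrange(len(self.w))
        qs = self.q(self.w, state)
        return qs.index(max(qs))

    def store(self, state, action, reward, next_state, done):
        self.replay.append((state, action, reward, next_state, done))

    def train_step(self):
        if len(self.replay) < self.batch_size:
            return
        for state, action, reward, next_state, done in random.sample(list(self.replay), self.batch_size):
            target = reward
            if not done:                       # bootstrap from the *target* network
                target += self.gamma * max(self.q(self.w_target, next_state))
            error = target - self.q(self.w, state)[action]
            self.w[action] = [wi + self.alpha * error * si
                              for wi, si in zip(self.w[action], state)]
        self.steps += 1
        if self.steps % self.sync_every == 0:  # periodically copy online -> target
            self.w_target = [row[:] for row in self.w]

if __name__ == "__main__":
    agent = TinyDQN(n_features=2, n_actions=2)
    for _ in range(2000):                      # hypothetical one-step environment
        state = [1.0, random.random()]
        action = agent.act(state)
        reward = 1.0 if action == 0 else 0.0   # action 0 is always best here
        agent.store(state, action, reward, state, True)
        agent.train_step()
    print("Q-values for a sample state:", [round(v, 2) for v in agent.q(agent.w, [1.0, 0.5])])
```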

  7. Novel reinforcement learning approach for difficult control problems

    NASA Astrophysics Data System (ADS)

    Becus, Georges A.; Thompson, Edward A.

    1997-09-01

    We review work conducted over the past several years aimed at developing reinforcement learning architectures for solving difficult control problems, based on and inspired by associative control process (ACP) networks. We briefly review ACP networks able to reproduce many classical instrumental conditioning test results observed in animal research and to engage in real-time, closed-loop, goal-seeking interactions with their environment. Chronologically, our contributions include the ideally interfaced ACP network, which is endowed with hierarchical, attention, and failure recognition interface mechanisms that greatly enhanced the capabilities of the original ACP network. When solving the cart-pole problem, it achieves 100 percent reliability and a reduction in training time similar to that of Baird and Klopf's modified ACP network, and additionally an order of magnitude reduction in the number of failures experienced for successful training. Next we introduced the command and control center/internal drive (Cid) architecture for artificial neural learning systems. It consists of a hierarchy of command and control centers governing motor selection networks. Internal drives, similar to hunger, thirst, or reproduction in biological systems, are formed within the controller to facilitate learning. The efficiency, reliability, and adjustability of this architecture were demonstrated on the benchmark cart-pole control problem. A comparison with other artificial learning systems indicates that it learns over 100 times faster than Barto et al.'s adaptive search element/adaptive critic element, experiencing more than an order of magnitude fewer failures while remaining capable of being fine-tuned by the user, on-line, for improved performance without additional training. Finally we present work in progress on a 'peaks and valleys' scheme which moves away from the one-dimensional learning mechanism currently found in Cid and shows promise in solving even more difficult learning control problems.

  8. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    PubMed

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features rather different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in the ventral striatum and ventromedial pFC, structures associated with reinforcement learning, as well as regions associated with updating social impressions, including the TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing. PMID:27167401
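
    The trial-by-trial learning signals referred to above can be illustrated with a simple delta-rule update; the sketch below assumes an invented learning rate, invented cue reliabilities, and a single scalar belief per target, so it is a toy stand-in rather than either of the models fitted in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHA = 0.2                                   # learning rate (assumed)
# Perceiver's belief that a given target's visual cue predicts that target's emotion.
belief = {"visual_target": 0.5, "verbal_target": 0.5, "random_target": 0.5}
true_reliability = {"visual_target": 0.9, "verbal_target": 0.1, "random_target": 0.5}

for trial in range(300):
    target = str(rng.choice(list(belief)))
    # Feedback: 1 if trusting the visual cue would have given the correct guess, else 0.
    outcome = float(rng.random() < true_reliability[target])
    prediction_error = outcome - belief[target]      # the trial-by-trial learning signal
    belief[target] += ALPHA * prediction_error

print({k: round(v, 2) for k, v in belief.items()})    # beliefs approach the reliabilities
```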

  9. Reinforcement learning for resource allocation in LEO satellite networks.

    PubMed

    Usaha, Wipawee; Barria, Javier A

    2007-06-01

    In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes computationally prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms improve performance in terms of storage requirements, computational complexity, and computation time, as well as in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements. PMID:17550108

  10. Decentralized reinforcement-learning control and emergence of motion patterns

    NASA Astrophysics Data System (ADS)

    Svinin, Mikhail; Yamada, Kazuyaki; Okhura, Kazuhiro; Ueda, Kanji

    1998-10-01

    In this paper we propose a system for studying the emergence of motion patterns in autonomous mobile robotic systems. The system implements instance-based reinforcement learning control. Three spaces are of importance in the formulation of the control scheme: the work space, the sensor space, and the action space. An important feature of our system is that all these spaces are assumed to be continuous. The core part of the system is a classifier system. Based on the sensory state space analysis, the control is decentralized and is specified at the lowest level of the control system. However, the local controllers are implicitly connected through the perceived environment information. Therefore, they constitute a dynamic environment with respect to each other. The proposed control scheme is tested under simulation for a mobile robot in a navigation task. It is shown that some patterns of global behavior, such as collision avoidance, wall-following, and light-seeking, can emerge from the local controllers.

  11. Improved Adaptive-Reinforcement Learning Control for morphing unmanned air vehicles.

    PubMed

    Valasek, John; Doebbler, James; Tandale, Monish D; Meade, Andrew J

    2008-08-01

    This paper presents an improved Adaptive-Reinforcement Learning Control methodology for the problem of unmanned air vehicle morphing control. The reinforcement learning morphing control function that learns the optimal shape change policy is integrated with an adaptive dynamic inversion control trajectory tracking function. An episodic unsupervised learning simulation using the Q-learning method is developed to replace an earlier and less accurate Actor-Critic algorithm. Sequential Function Approximation, a Galerkin-based scattered data approximation scheme, replaces a K-Nearest Neighbors (KNN) method and is used to generalize the learning from previously experienced quantized states and actions to the continuous state-action space, all of which may not have been experienced before. The improved method showed smaller errors and improved learning of the optimal shape compared to the KNN. PMID:18632393

  12. Learning arm's posture control using reinforcement learning and feedback-error-learning.

    PubMed

    Kambara, H; Kim, J; Sato, M; Koike, Y

    2004-01-01

    In this paper, we propose a learning model using the Actor-Critic method and the feedback-error-learning scheme. The Actor-Critic method, which is one of the major frameworks in reinforcement learning, has attracted attention as a computational learning model of the basal ganglia. Meanwhile, feedback-error-learning is a learning architecture proposed as a computationally coherent model of cerebellar motor learning. The purpose of this learning architecture is to acquire a feed-forward controller by using a feedback controller's output as an error signal. In past research, a predetermined constant-gain feedback controller was used for feedback-error-learning. We use the Actor-Critic method to obtain a feedback controller for feedback-error-learning. By applying the proposed learning model to an arm's posture control, we show that high-performance feedback and feed-forward controllers can be acquired using only a scalar reward value. PMID:17271719

  13. The Computational Development of Reinforcement Learning during Adolescence.

    PubMed

    Palminteri, Stefano; Kilford, Emma J; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-06-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574

  14. The Computational Development of Reinforcement Learning during Adolescence

    PubMed Central

    Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574

  15. Punishment insensitivity and impaired reinforcement learning in preschoolers

    PubMed Central

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2013-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. Methods: 157 preschoolers (mean age 4.7 ±0.8 years) participated in a substudy that was embedded within a larger project. Children completed the “Stars-in-Jars” task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Results: Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials (“passive avoidance”). Conclusions: Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. PMID:24033313

  16. Stochastic optimization of multireservoir systems via reinforcement learning

    NASA Astrophysics Data System (ADS)

    Lee, Jin-Hee; Labadie, John W.

    2007-11-01

    Although several variants of stochastic dynamic programming have been applied to optimal operation of multireservoir systems, they have been plagued by a high-dimensional state space and the inability to accurately incorporate the stochastic environment as characterized by temporally and spatially correlated hydrologic inflows. Reinforcement learning has emerged as an effective approach to solving sequential decision problems by combining concepts from artificial intelligence, cognitive science, and operations research. A reinforcement learning system has a mathematical foundation similar to dynamic programming and Markov decision processes, with the goal of maximizing the long-term reward or returns as conditioned on the state of the system environment and the immediate reward obtained from operational decisions. Reinforcement learning can include Monte Carlo simulation where transition probabilities and rewards are not explicitly known a priori. The Q-Learning method in reinforcement learning is demonstrated on the two-reservoir Geum River system, South Korea, and is shown to outperform implicit stochastic dynamic programming and sampling stochastic dynamic programming methods.

  17. Learning the specific quality of taste reinforcement in larval Drosophila

    PubMed Central

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-01

    The only property of reinforcement that insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533

  18. Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms

    PubMed Central

    Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.

    2015-01-01

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331

  19. Human Operant Learning under Concurrent Reinforcement of Response Variability

    ERIC Educational Resources Information Center

    Maes, J. H. R.; van der Goot, M.

    2006-01-01

    This study asked whether the concurrent reinforcement of behavioral variability facilitates learning to emit a difficult target response. Sixty students repeatedly pressed sequences of keys, with an originally infrequently occurring target sequence consistently being followed by positive feedback. Three conditions differed in the feedback given to…

  20. Kinesthetic Reinforcement-Is It a Boon to Learning?

    ERIC Educational Resources Information Center

    Bohrer, Roxilu K.

    1970-01-01

    Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…

  1. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    PubMed

    Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. In contrast to previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity for cooperation is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations. PMID:27438888
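
    A minimal sketch of aspiration-based reinforcement in a repeated two-player prisoner's dilemma appears below; the payoff matrix, aspiration level, and Bush-Mosteller-style propensity update are illustrative assumptions rather than the exact model analysed in the record above.

```python
import numpy as np

rng = np.random.default_rng(2)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ASPIRATION, LEARN_RATE = 2.0, 0.1

class AspirationLearner:
    def __init__(self):
        self.p_cooperate = 0.5              # unconditional propensity to cooperate

    def act(self):
        return "C" if rng.random() < self.p_cooperate else "D"

    def update(self, action, payoff):
        satisfied = payoff > ASPIRATION
        # Reinforce the chosen action if the outcome was satisfactory, anti-reinforce otherwise.
        target = 1.0 if (action == "C") == satisfied else 0.0
        self.p_cooperate += LEARN_RATE * (target - self.p_cooperate)

a, b = AspirationLearner(), AspirationLearner()
for _ in range(1000):
    act_a, act_b = a.act(), b.act()
    a.update(act_a, PAYOFF[(act_a, act_b)])
    b.update(act_b, PAYOFF[(act_b, act_a)])

print(round(a.p_cooperate, 2), round(b.p_cooperate, 2))   # cooperation propensities after play
```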

  2. Reinforcement Learning in Young Adults with Developmental Language Impairment

    ERIC Educational Resources Information Center

    Lee, Joanna C.; Tomblin, J. Bruce

    2012-01-01

    The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…

  3. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin

    PubMed Central

    Ezaki, Takahiro; Horita, Yutaka; Masuda, Naoki

    2016-01-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner’s dilemma and public goods games, and well-mixed groups and networks. In contrast to previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity for cooperation is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations. PMID:27438888

  4. Reinforcement learning techniques for controlling resources in power networks

    NASA Astrophysics Data System (ADS)

    Kowli, Anupama Sunil

    As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.

  5. A reinforcement learning model of joy, distress, hope and fear

    NASA Astrophysics Data System (ADS)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that the learned state utility models fear (negative) and hope (positive), based on the fact that both signals are about the anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework, coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
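
    The proposed mapping can be sketched with a tabular value learner in which the learned state value stands in for hope or fear and the temporal-difference error stands in for joy or distress; the corridor task and parameters below are invented for illustration.

```python
import numpy as np

GAMMA, ALPHA = 0.9, 0.1
values = np.zeros(6)                # state values for a six-state corridor; goal at state 5

for episode in range(200):
    s = 0
    while s < 5:
        s_next = s + 1
        reward = 1.0 if s_next == 5 else 0.0
        td_error = reward + (0.0 if s_next == 5 else GAMMA * values[s_next]) - values[s]
        hope_or_fear = values[s]    # anticipated gain (hope) or loss (fear) at this state
        joy_or_distress = td_error  # outcome better (joy) or worse (distress) than expected
        values[s] += ALPHA * td_error
        s = s_next

print(np.round(values, 2))          # learned anticipation rises toward the goal state
```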

  6. Investigation of a Reinforcement-Based Toilet Training Procedure for Children with Autism.

    ERIC Educational Resources Information Center

    Cicero, Frank R.; Pfadt, Al

    2002-01-01

    This study evaluated the effectiveness of a reinforcement-based toilet training intervention with three children with autism. Procedures included positive reinforcement, graduated guidance, scheduled practice trials, and forward prompting. All three children reduced urination accidents to zero and learned to request bathroom use spontaneously…

  7. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    PubMed

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups showed signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer. PMID:26294269

  8. Working memory contributions to reinforcement learning impairments in schizophrenia.

    PubMed

    Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J

    2014-10-01

    Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101

  9. Off-policy reinforcement learning for H∞ control design.

    PubMed

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system. PMID:25532162

  10. Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats.

    PubMed

    Lloyd, Kevin; Becker, Nadine; Jones, Matthew W; Bogacz, Rafal

    2012-01-01

    Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning (RL) to solve the problem of when to update these observations. Investigation of how gating models relate to brain and behavior remains, however, at an early stage. The current study sought to explore the ability of simple RL gating models to replicate rule learning behavior in rats. Rats were trained in a maze-based spatial learning task that required animals to make trial-by-trial choices contingent upon their previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with the rats'. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal. We also found that both gating models learned multiple strategies in solving the initial task, a property which highlights the multi-agent nature of such models and which is of importance in considering the neural basis of individual differences in behavior. PMID:23115551
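
    The on-policy SARSA update that the second gating algorithm builds on can be shown on a toy one-cue working-memory task, where the first 'action' is whether to gate the cue into memory and the second is the probe response; the task and parameters are invented and this is not the authors' full gating model.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
ALPHA, GAMMA, EPS = 0.1, 1.0, 0.1
Q = defaultdict(float)                         # Q[(state, action)]

def eps_greedy(state, actions):
    if rng.random() < EPS:
        return actions[int(rng.integers(len(actions)))]
    return max(actions, key=lambda a: Q[(state, a)])

for episode in range(5000):
    cue = int(rng.integers(2))
    s0 = ("cue", cue)
    gate = eps_greedy(s0, [0, 1])              # 1 = store the cue in working memory
    memory = cue if gate == 1 else None
    s1 = ("probe", memory)
    answer = eps_greedy(s1, [0, 1])            # respond to the probe from memory
    reward = 1.0 if answer == cue else 0.0
    # On-policy SARSA backups in the order the transitions occur.
    Q[(s0, gate)] += ALPHA * (GAMMA * Q[(s1, answer)] - Q[(s0, gate)])
    Q[(s1, answer)] += ALPHA * (reward - Q[(s1, answer)])

print(round(Q[(("cue", 0), 1)], 2), round(Q[(("cue", 0), 0)], 2))  # gating beats not gating
```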

  11. Reinforcement learning agents providing advice in complex video games

    NASA Astrophysics Data System (ADS)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
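
    One simple budgeted-advice heuristic in the spirit of this framework, in which a teacher spends its limited advice on states where its own action values differ most, can be sketched as follows; the bandit-style task, importance threshold, and budget are invented and do not correspond to the specific algorithms evaluated in the article.

```python
import numpy as np

rng = np.random.default_rng(5)
N_STATES, N_ACTIONS = 5, 3
BUDGET, THRESHOLD, ALPHA, EPS = 40, 0.3, 0.2, 0.1

teacher_Q = rng.random((N_STATES, N_ACTIONS))      # the teacher's (assumed accurate) values
student_Q = np.zeros((N_STATES, N_ACTIONS))
advice_left = BUDGET

for step in range(3000):
    s = int(rng.integers(N_STATES))
    q = teacher_Q[s]
    important = np.max(q) - np.partition(q, -2)[-2] > THRESHOLD   # gap to second-best action
    if advice_left > 0 and important:
        a = int(np.argmax(q))                      # spend one unit of advice
        advice_left -= 1
    elif rng.random() < EPS:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(np.argmax(student_Q[s]))
    reward = teacher_Q[s, a] + 0.1 * rng.standard_normal()
    student_Q[s, a] += ALPHA * (reward - student_Q[s, a])

print(np.argmax(student_Q, axis=1), np.argmax(teacher_Q, axis=1))  # learned vs. ideal actions
```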

  12. Pleasurable music affects reinforcement learning according to the listener.

    PubMed

    Gold, Benjamin P; Frank, Michael J; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  13. Pleasurable music affects reinforcement learning according to the listener

    PubMed Central

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  14. Emotional Multiagent Reinforcement Learning in Spatial Social Dilemmas.

    PubMed

    Yu, Chao; Zhang, Minjie; Ren, Fenghui; Tan, Guozhen

    2015-12-01

    Social dilemmas have attracted extensive interest in the research of multiagent systems in order to study the emergence of cooperative behaviors among selfish agents. Understanding how agents can achieve cooperation in social dilemmas through learning from local experience is a critical problem that has motivated researchers for decades. This paper investigates the possibility of exploiting emotions in agent learning in order to facilitate the emergence of cooperation in social dilemmas. In particular, the spatial version of social dilemmas is considered to study the impact of local interactions on the emergence of cooperation in the whole system. A double-layered emotional multiagent reinforcement learning framework is proposed to endow agents with internal cognitive and emotional capabilities that can drive these agents to learn cooperative behaviors. Experimental results reveal that various network topologies and agent heterogeneities have significant impacts on agent learning behaviors in the proposed framework, and under certain circumstances, high levels of cooperation can be achieved among the agents. PMID:25769173

  15. Robot cognitive control with a neurophysiologically inspired reinforcement learning model.

    PubMed

    Khamassi, Mehdi; Lallée, Stéphane; Enel, Pierre; Procyk, Emmanuel; Dominey, Peter F

    2011-01-01

    A major challenge in modern robotics is to liberate robots from controlled industrial settings, and allow them to interact with humans and changing environments in the real-world. The current research attempts to determine if a neurophysiologically motivated model of cortical function in the primate can help to address this challenge. Primates are endowed with cognitive systems that allow them to maximize the feedback from their environment by learning the values of actions in diverse situations and by adjusting their behavioral parameters (i.e., cognitive control) to accommodate unexpected events. In such contexts uncertainty can arise from at least two distinct sources - expected uncertainty resulting from noise during sensory-motor interaction in a known context, and unexpected uncertainty resulting from the changing probabilistic structure of the environment. However, it is not clear how neurophysiological mechanisms of reinforcement learning and cognitive control integrate in the brain to produce efficient behavior. Based on primate neuroanatomy and neurophysiology, we propose a novel computational model for the interaction between lateral prefrontal and anterior cingulate cortex reconciling previous models dedicated to these two functions. We deployed the model in two robots and demonstrate that, based on adaptive regulation of a meta-parameter β that controls the exploration rate, the model can robustly deal with the two kinds of uncertainties in the real-world. In addition the model could reproduce monkey behavioral performance and neurophysiological data in two problem-solving tasks. A last experiment extends this to human-robot interaction with the iCub humanoid, and novel sources of uncertainty corresponding to "cheating" by the human. The combined results provide concrete evidence for the ability of neurophysiologically inspired cognitive systems to control advanced robots in the real-world. PMID:21808619

  16. Robot Cognitive Control with a Neurophysiologically Inspired Reinforcement Learning Model

    PubMed Central

    Khamassi, Mehdi; Lallée, Stéphane; Enel, Pierre; Procyk, Emmanuel; Dominey, Peter F.

    2011-01-01

    A major challenge in modern robotics is to liberate robots from controlled industrial settings, and allow them to interact with humans and changing environments in the real-world. The current research attempts to determine if a neurophysiologically motivated model of cortical function in the primate can help to address this challenge. Primates are endowed with cognitive systems that allow them to maximize the feedback from their environment by learning the values of actions in diverse situations and by adjusting their behavioral parameters (i.e., cognitive control) to accommodate unexpected events. In such contexts uncertainty can arise from at least two distinct sources – expected uncertainty resulting from noise during sensory-motor interaction in a known context, and unexpected uncertainty resulting from the changing probabilistic structure of the environment. However, it is not clear how neurophysiological mechanisms of reinforcement learning and cognitive control integrate in the brain to produce efficient behavior. Based on primate neuroanatomy and neurophysiology, we propose a novel computational model for the interaction between lateral prefrontal and anterior cingulate cortex reconciling previous models dedicated to these two functions. We deployed the model in two robots and demonstrate that, based on adaptive regulation of a meta-parameter β that controls the exploration rate, the model can robustly deal with the two kinds of uncertainties in the real-world. In addition the model could reproduce monkey behavioral performance and neurophysiological data in two problem-solving tasks. A last experiment extends this to human–robot interaction with the iCub humanoid, and novel sources of uncertainty corresponding to “cheating” by the human. The combined results provide concrete evidence for the ability of neurophysiologically inspired cognitive systems to control advanced robots in the real-world. PMID:21808619

  17. Time-Extended Policies in Multi-Agent Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Agogino, Adrian K.

    2004-01-01

    Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of having rewards aligned with the global reward leading to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team games) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.

  18. Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method.

    PubMed

    Senda, Kei; Hattori, Suguru; Hishinuma, Toru; Kohda, Takehisa

    2014-12-01

    Typical methods for solving reinforcement learning problems iterate two steps, policy evaluation and policy improvement. This paper proposes algorithms for the policy evaluation step that improve learning efficiency. The proposed algorithms are based on the Krylov Subspace Method (KSM), which is a nonstationary iterative method. The algorithms based on KSM are tens to hundreds of times more efficient than existing algorithms based on stationary iterative methods. Algorithms based on KSM are far more efficient than has generally been expected. This paper clarifies what makes algorithms based on KSM more efficient, with numerical examples and theoretical discussion. PMID:24733037
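
    Policy evaluation for a fixed policy amounts to solving the linear system (I - γP)v = r, so the contrast the paper draws can be illustrated by comparing a stationary fixed-point sweep with a Krylov subspace solver (GMRES from SciPy) on a small random Markov chain; the chain, discount factor, and tolerances below are invented for illustration.

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(6)
n, gamma = 50, 0.95
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # random stochastic matrix
r = rng.random(n)                                           # one-step rewards under the policy

# Stationary fixed-point sweeps: v <- r + gamma * P v.
v = np.zeros(n)
for sweeps in range(1, 10001):
    v_new = r + gamma * P @ v
    if np.max(np.abs(v_new - v)) < 1e-8:
        v = v_new
        break
    v = v_new

# Krylov subspace method: GMRES applied to (I - gamma * P) v = r (default tolerances).
v_krylov, info = gmres(np.eye(n) - gamma * P, r)

print(sweeps, info, float(np.max(np.abs(v - v_krylov))))
```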

  19. Reinforcement learning output feedback NN control using deterministic learning technique.

    PubMed

    Xu, Bin; Yang, Chenguang; Shi, Zhongke

    2014-03-01

    In this brief, a novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems. The controller design is based on the transformed predictor form, and the actor-critic NN control architecture includes two NNs: the critic NN is used to approximate the strategic utility function, and the action NN is employed to minimize both the strategic utility function and the tracking error. A deterministic learning technique has been employed to guarantee that the partial persistent excitation condition of internal states is satisfied during tracking control to a periodic reference orbit. The uniformly ultimate boundedness of closed-loop signals is shown via Lyapunov stability analysis. Simulation results are presented to demonstrate the effectiveness of the proposed control. PMID:24807456

  20. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    PubMed

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. PMID:22275543
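
    A minimal stateless actor-critic sketch in this spirit, with a Gaussian-policy actor and a scalar critic updated from a scalar feedback signal, is shown below; the one-dimensional target, the stand-in "user feedback", and the learning rates are illustrative assumptions rather than the published method.

```python
import numpy as np

rng = np.random.default_rng(7)
TARGET = 0.6                    # stand-in for the user's desired actuation level (unknown)
mean, sigma = 0.0, 0.2          # Gaussian policy parameters (actor)
baseline = 0.0                  # running estimate of feedback (critic)
ALPHA_CRITIC, ALPHA_ACTOR = 0.05, 0.3

for trial in range(2000):
    action = mean + sigma * rng.standard_normal()   # exploratory control output
    feedback = -abs(action - TARGET)                # scalar "user" feedback
    td_error = feedback - baseline                  # better or worse than expected
    baseline += ALPHA_CRITIC * td_error
    # Policy-gradient-style actor update: shift the mean toward better-than-expected actions.
    mean += ALPHA_ACTOR * td_error * (action - mean)

print(round(mean, 2))           # the policy mean settles near the target level
```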

  1. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning

    NASA Astrophysics Data System (ADS)

    Mahmoudi, Babak; Pohlmeyer, Eric A.; Prins, Noeline W.; Geng, Shijia; Sanchez, Justin C.

    2013-12-01

    Objective. Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Approach. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. Main results. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. Significance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
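
    The flavour of Hebbian learning gated by a binary evaluative signal can be sketched with a linear readout from a synthetic "neural state" vector: weights for the taken action are strengthened after outcomes flagged as desirable and weakened otherwise. The data, update rule, and dimensions below are illustrative assumptions, not the controller described in the record.

```python
import numpy as np

rng = np.random.default_rng(8)
N_FEATURES, N_ACTIONS, LEARN_RATE = 8, 2, 0.05
W = np.zeros((N_ACTIONS, N_FEATURES))                    # readout from neural state to action
intent_W = rng.standard_normal((N_ACTIONS, N_FEATURES))  # hidden mapping defining the desired action

correct = 0
for trial in range(5000):
    x = rng.standard_normal(N_FEATURES)                  # simulated neural state vector
    action = int(np.argmax(W @ x + 0.1 * rng.standard_normal(N_ACTIONS)))
    desired = int(np.argmax(intent_W @ x))               # stand-in for the user's intent
    feedback = 1.0 if action == desired else -1.0        # binary evaluative signal
    # Hebbian update gated by the evaluation: strengthen the taken state-action
    # association after desirable outcomes, weaken it after undesirable ones.
    W[action] += LEARN_RATE * feedback * x
    correct += int(action == desired)

print(round(correct / 5000, 2))                          # running accuracy over all trials
```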

  2. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    PubMed

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295

  3. Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning

    PubMed Central

    Taylor, Jordan A.; Ivry, Richard B.

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295

  4. A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI

    NASA Astrophysics Data System (ADS)

    Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

    Recently, brain-computer interfaces (BCIs), which provide a direct pathway between the human brain and an external device such as a computer or a robot, have attracted a great deal of attention. Because a BCI can control machines such as robots from brain activity alone, without using voluntary muscles, it may become a useful communication tool for handicapped persons, for instance, amyotrophic lateral sclerosis patients. However, in order to realize a BCI system that can perform precise tasks in various environments, it is necessary to design control rules that adapt to dynamic environments. Reinforcement learning is one approach to the design of such control rules. If this reinforcement learning can be driven by brain activity, it would lead to a BCI with general versatility. In this research, we focused on the P300 component of the event-related potential as an alternative signal for the reward in reinforcement learning. We discriminated between success and failure trials from the single-trial P300 of the EEG by using a proposed discrimination algorithm based on a support vector machine. The possibility of reinforcement learning was examined from the viewpoint of the number of correctly discriminated trials. The results indicated that learning would be possible for most subjects.
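
    Single-trial discrimination with a support vector machine, of the general kind mentioned above, can be sketched with scikit-learn on synthetic features standing in for P300 epochs; the feature construction, kernel, and cross-validation setup are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(11)
n_trials, n_features = 200, 20

# Synthetic stand-in for single-trial EEG features: "success" trials carry a small
# P300-like offset on a few features, "failure" trials do not.
labels = rng.integers(2, size=n_trials)                  # 1 = success trial, 0 = failure trial
X = rng.standard_normal((n_trials, n_features))
X[labels == 1, :5] += 0.8

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, labels, cv=5)
print(round(float(scores.mean()), 2))                    # cross-validated discrimination accuracy
```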

  5. Distributed Reinforcement Learning Approach for Vehicular Ad Hoc Networks

    NASA Astrophysics Data System (ADS)

    Wu, Celimuge; Kumekawa, Kazuya; Kato, Toshihiko

    In Vehicular Ad hoc Networks (VANETs), general-purpose ad hoc routing protocols such as AODV cannot work efficiently due to the frequent changes in network topology caused by vehicle movement. This paper proposes a VANET routing protocol, QLAODV (Q-Learning AODV), which suits unicast applications in high-mobility scenarios. QLAODV is a distributed reinforcement learning routing protocol, which uses a Q-Learning algorithm to infer network state information and uses unicast control packets to check path availability in real time, so that Q-Learning can work efficiently in a highly dynamic network environment. QLAODV benefits from its dynamic route change mechanism, which makes it capable of reacting quickly to network topology changes. We present an analysis of the performance of QLAODV by simulation using different mobility models. The simulation results show that QLAODV can efficiently handle unicast applications in VANETs.

  6. Utilising reinforcement learning to develop strategies for driving auditory neural implants

    NASA Astrophysics Data System (ADS)

    Lee, Geoffrey W.; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G.

    2016-08-01

    Objective. In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed-loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. Approach. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Main results. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model’s function. We show the ability to effectively learn stimulation patterns which mimic the cochlea’s ability to convert acoustic frequencies to neural activity. The time taken to learn effective replication using neural stimulation is less than 20 min under continuous testing. Significance. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient’s current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
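
    An epsilon-greedy n-armed bandit of the general kind referred to above can be sketched as follows, with each arm standing for a candidate stimulation pattern and the reward standing for how closely the evoked response matches the acoustic target; the arm qualities, noise level, and epsilon are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
N_ARMS, EPS = 10, 0.1
true_similarity = rng.random(N_ARMS)     # hidden quality of each candidate stimulation pattern
estimates = np.zeros(N_ARMS)
counts = np.zeros(N_ARMS)

for trial in range(2000):
    if rng.random() < EPS:
        arm = int(rng.integers(N_ARMS))  # explore a random stimulation pattern
    else:
        arm = int(np.argmax(estimates))  # exploit the best pattern found so far
    # Noisy reward: how closely the evoked neural response matched the acoustic target.
    reward = true_similarity[arm] + 0.05 * rng.standard_normal()
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental sample mean

print(int(np.argmax(estimates)), int(np.argmax(true_similarity)))  # chosen vs. best pattern
```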

  7. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions

    PubMed Central

    Fee, Michale S.

    2012-01-01

    In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501

  8. Credit assignment in movement-dependent reinforcement learning.

    PubMed

    McDougle, Samuel D; Boggess, Matthew J; Crossley, Matthew J; Parvin, Darius; Ivry, Richard B; Taylor, Jordan A

    2016-06-14

    When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem. PMID:27247404

  9. Dynamics of learning in coupled oscillators tutored with delayed reinforcements

    NASA Astrophysics Data System (ADS)

    Trevisan, M. A.; Bouzat, S.; Samengo, I.; Mindlin, G. B.

    2005-07-01

    In this work we analyze the solutions of a simple system of coupled phase oscillators in which the connectivity is learned dynamically. The model is inspired by the process of learning of birdsongs by oscine birds. An oscillator acts as the generator of a basic rhythm and drives slave oscillators which are responsible for different motor actions. The driving signal arrives at each driven oscillator through two different pathways. One of them is a direct pathway. The other one is a reinforcement pathway, through which the signal arrives delayed. The coupling coefficients between the driving oscillator and the slave ones evolve in time following a Hebbian-like rule. We discuss the conditions under which a driven oscillator is capable of learning to lock to the driver. The resulting phase difference and connectivity are a function of the delay of the reinforcement. Around some specific delays, the system is capable of generating dramatic changes in the phase difference between the driver and the driven systems. We discuss the dynamical mechanism responsible for this effect and possible applications of this learning scheme.

  10. The emergence of saliency and novelty responses from Reinforcement Learning principles.

    PubMed

    Laurent, Patryk A

    2008-12-01

    Recent attempts to map reward-based learning models, like Reinforcement Learning [Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An introduction. Cambridge, MA: MIT Press], to the brain are based on the observation that phasic increases and decreases in the spiking of dopamine-releasing neurons signal differences between predicted and received reward [Gillies, A., & Arbuthnott, G. (2000). Computational models of the basal ganglia. Movement Disorders, 15(5), 762-770; Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80(1), 1-27]. However, this reward-prediction error is only one of several signals communicated by that phasic activity; another involves an increase in dopaminergic spiking, reflecting the appearance of salient but unpredicted non-reward stimuli [Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4-6), 495-506; Horvitz, J. C. (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience, 96(4), 651-656; Redgrave, P., & Gurney, K. (2006). The short-latency dopamine signal: A role in discovering novel actions? Nature Reviews Neuroscience, 7(12), 967-975], especially when an organism subsequently orients towards the stimulus [Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80(1), 1-27]. To explain these findings, Kakade and Dayan [Kakade, S., & Dayan, P. (2002). Dopamine: Generalization and bonuses. Neural Networks, 15(4-6), 549-559.] and others have posited that novel, unexpected stimuli are intrinsically rewarding. The simulation reported in this article demonstrates that this assumption is not necessary because the effect it is intended to capture emerges from the reward-prediction learning mechanisms of Reinforcement Learning. Thus, Reinforcement Learning principles can be used to understand not just reward-related activity of the dopaminergic neurons of the basal ganglia, but also some

  11. Decision theory, reinforcement learning, and the brain.

    PubMed

    Dayan, Peter; Daw, Nathaniel D

    2008-12-01

    Decision making is a core competence for animals and humans acting and surviving in environments they only partially comprehend, gaining rewards and punishments for their troubles. Decision-theoretic concepts permeate experiments and computational models in ethology, psychology, and neuroscience. Here, we review a well-known, coherent Bayesian approach to decision making, showing how it unifies issues in Markovian decision problems, signal detection psychophysics, sequential sampling, and optimal exploration and discuss paradigmatic psychological and neural examples of each problem. We discuss computational issues concerning what subjects know about their task and how ambitious they are in seeking optimal solutions; we address algorithmic topics concerning model-based and model-free methods for making choices; and we highlight key aspects of the neural implementation of decision making. PMID:19033240

  12. Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

    ERIC Educational Resources Information Center

    Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

    2007-01-01

    Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…

  13. Distributed reinforcement learning for adaptive and robust network intrusion response

    NASA Astrophysics Data System (ADS)

    Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel

    2015-07-01

    Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
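    The "difference rewards" shaping mentioned above is a standard construction in multiagent reinforcement learning; as a rough sketch (not taken verbatim from this paper), each throttling agent i is rewarded with its marginal contribution to the team objective:

```latex
D_i(z) \;=\; G(z) \;-\; G(z_{-i})
```

    where G is the global (team) reward and z_{-i} denotes the joint state-action with agent i's contribution removed or replaced by a default. This isolates each router's own effect on the victim's load and reduces the credit-assignment noise introduced by the other learning agents.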

  14. Statistical mechanics approach to a reinforcement learning model with memory

    NASA Astrophysics Data System (ADS)

    Lipowski, Adam; Gontarek, Krzysztof; Ausloos, Marcel

    2009-05-01

    We introduce a two-player model of reinforcement learning with memory. Past actions of an iterated game are stored in a memory and used to determine the player's next action. To examine the behaviour of the model, some approximate methods are used and compared against numerical simulations and an exact master equation. When the length of the players' memory increases to infinity, the model undergoes an absorbing-state phase transition. The performance of the examined strategies is checked in the prisoner's dilemma game. It turns out that it is advantageous to have a large memory in symmetric games, but it is better to have a short memory in asymmetric ones.

  15. Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning.

    PubMed

    Çilden, Erkin; Polat, Faruk

    2015-08-01

    Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during the RL process is difficult and poses many challenges, such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In this paper, we adapt an existing automatic abstraction method, namely the extended sequence tree, originally designed for fully observable problems. The modified method covers a certain family of model-based partially observable RL settings. We also introduce belief state discretization methods that can be used with this new abstraction mechanism. The effectiveness of the proposed abstraction method is shown empirically by experimenting on well-known benchmark problems. PMID:25216494

  16. Reinforcement Learning in Distributed Domains: Beyond Team Games

    NASA Technical Reports Server (NTRS)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  17. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive.

    PubMed

    Collins, Anne G E; Frank, Michael J

    2014-07-01

    The striatal dopaminergic system has been implicated in reinforcement learning (RL), motor performance, and incentive motivation. Various computational models have been proposed to account for each of these effects individually, but a formal analysis of their interactions is lacking. Here we present a novel algorithmic model expanding the classical actor-critic architecture to include fundamental interactive properties of neural circuit models, incorporating both incentive and learning effects into a single theoretical framework. The standard actor is replaced by a dual opponent actor system representing distinct striatal populations, which come to differentially specialize in discriminating positive and negative action values. Dopamine modulates the degree to which each actor component contributes to both learning and choice discriminations. In contrast to standard frameworks, this model simultaneously captures documented effects of dopamine on both learning and choice incentive, and their interactions, across a variety of studies, including probabilistic RL, effort-based choice, and motor skill learning. PMID:25090423

  18. Reinforcement learning for congestion-avoidance in packet flow

    NASA Astrophysics Data System (ADS)

    Horiguchi, Tsuyoshi; Hayashi, Keisuke; Tretiakov, Alexei

    2005-04-01

    Congestion of packet flow in computer networks is one of the persistent problems in packet communication, and hence its avoidance should be investigated. We use a neural network model for packet routing control in a computer network proposed in a previous paper by Horiguchi and Ishioka (Physica A 297 (2001) 521). If we assume that packets are not sent to nodes whose buffers are already full of packets, then we find that traffic congestion occurs when the number of packets in the computer network is larger than some critical value. In order to avoid the congestion, we introduce reinforcement learning for a control parameter in the neural network model. We find that congestion is avoided by the reinforcement learning and, at the same time, we obtain good throughput performance. We investigate the packet flow on computer networks of various types of topology, such as a regular network, a network with fractal structure, a small-world network, a scale-free network and so on.

  19. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    PubMed Central

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  20. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    PubMed

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted. PMID:24240065

  1. Reinforcement learning for routing in cognitive radio ad hoc networks.

    PubMed

    Al-Rawi, Hasan A A; Yau, Kok-Lim Alvin; Mohamad, Hafizal; Ramli, Nordin; Hashim, Wahidah

    2014-01-01

    Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL to routing and investigates the effects of various features of RL (i.e., reward function, exploitation and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, as well as the learning rate, must be well regulated, and the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs. PMID:25140350

  2. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice.

    PubMed

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-12-01

    Both in humans and in animals, different individuals may learn the same task with strikingly different speeds; however, the sources of this variability remain elusive. In standard learning models, interindividual variability is often explained by variations of the learning rate, a parameter indicating how much synapses are updated on each learning event. Here, we theoretically show that the initial connectivity between the neurons involved in learning a task is also a strong determinant of how quickly the task is learned, provided that connections are updated in a multiplicative manner. To experimentally test this idea, we trained mice to perform an auditory Go/NoGo discrimination task followed by a reversal to compare learning speed when starting from naive or already trained synaptic connections. All mice learned the initial task, but often displayed sigmoid-like learning curves, with a variable delay period followed by a steep increase in performance, as often observed in operant conditioning. For all mice, learning was much faster in the subsequent reversal training. An accurate fit of all learning curves could be obtained with a reinforcement learning model endowed with a multiplicative learning rule, but not with an additive rule. Surprisingly, the multiplicative model could explain a large fraction of the interindividual variability by variations in the initial synaptic weights. Altogether, these results demonstrate the power of multiplicative learning rules to account for the full dynamics of biological learning and suggest an important role of initial wiring in the brain for predispositions to different tasks. PMID:24255115
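    As a worked illustration of the contrast drawn above (my formulation, not the authors' exact model), an additive rule updates a synaptic weight w by a fixed multiple of the reward prediction error, whereas a multiplicative rule scales the update by the current weight, so initially strong connections learn faster and weak ones produce the delay-then-steep sigmoid learning curves described in the abstract:

```latex
\text{additive:}\quad w \leftarrow w + \alpha\,\delta
\qquad\qquad
\text{multiplicative:}\quad w \leftarrow w + \alpha\, w\,\delta
```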

  3. Adventitious Reinforcement of Maladaptive Stimulus Control Interferes with Learning.

    PubMed

    Saunders, Kathryn J; Hine, Kathleen; Hayashi, Yusuke; Williams, Dean C

    2016-09-01

    Persistent error patterns sometimes develop when teaching new discriminations. These patterns can be adventitiously reinforced, especially during long periods of chance-level responding (including baseline). Such behaviors can interfere with learning a new discrimination. They can also disrupt already learned discriminations, if they re-emerge during teaching procedures that generate errors. We present an example of this process. Our goal was to teach a boy with intellectual disabilities to touch one of two shapes on a computer screen (in technical terms, a simple simultaneous discrimination). We used a size-fading procedure. The correct stimulus was at full size, and the incorrect-stimulus size increased in increments of 10 %. Performance was nearly error free up to and including 60 % of full size. In a probe session with the incorrect stimulus at full size, however, accuracy plummeted. Also, a pattern of switching between choices, which apparently had been established in classroom instruction, re-emerged. The switching pattern interfered with already-learned discriminations. Despite having previously mastered a fading step with the incorrect stimulus up to 60 %, we were unable to maintain consistently high accuracy beyond 20 % of full size. We refined the teaching program such that fading was done in smaller steps (5 %), and decisions to "step back" to a smaller incorrect stimulus were made after every 5-instead of 20-trials. Errors were rare, switching behavior stopped, and he mastered the discrimination. This is a practical example of the importance of designing instruction that prevents adventitious reinforcement of maladaptive discriminated response patterns by reducing errors during acquisition. PMID:27622128

  4. The Effect of Tutoring on Children's Learning Under Two Conditions of Reinforcement.

    ERIC Educational Resources Information Center

    Zach, Lillian; And Others

    Studied were some problems of learning motivation and extrinsic reinforcement in a group of disadvantaged youngsters. Also tested was the hypothesis that learning would be facilitated for those children who received regular individual tutoring in addition to classroom instruction, regardless of conditions of reinforcement. Subjects were 60 Negro…

  5. Don't Think, Just Feel the Music: Individuals with Strong Pavlovian-to-Instrumental Transfer Effects Rely Less on Model-based Reinforcement Learning.

    PubMed

    Sebold, Miriam; Schad, Daniel J; Nebe, Stephan; Garbusow, Maria; Jünger, Elisabeth; Kroemer, Nils B; Kathmann, Norbert; Zimmermann, Ulrich S; Smolka, Michael N; Rapp, Michael A; Heinz, Andreas; Huys, Quentin J M

    2016-07-01

    Behavioral choice can be characterized along two axes. One axis distinguishes reflexive, model-free systems that slowly accumulate values through experience and a model-based system that uses knowledge to reason prospectively. The second axis distinguishes Pavlovian valuation of stimuli from instrumental valuation of actions or stimulus-action pairs. This results in four values and many possible interactions between them, with important consequences for accounts of individual variation. We here explored whether individual variation along one axis was related to individual variation along the other. Specifically, we asked whether individuals' balance between model-based and model-free learning was related to their tendency to show Pavlovian interferences with instrumental decisions. In two independent samples with a total of 243 participants, Pavlovian-instrumental transfer effects were negatively correlated with the strength of model-based reasoning in a two-step task. This suggests a potential common underlying substrate predisposing individuals to both have strong Pavlovian interference and be less model-based and provides a framework within which to interpret the observation of both effects in addiction. PMID:26942321

  6. Curiosity driven reinforcement learning for motion planning on humanoids

    PubMed Central

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  8. Time representation in reinforcement learning models of the basal ganglia

    PubMed Central

    Gershman, Samuel J.; Moustafa, Ahmed A.; Ludvig, Elliot A.

    2014-01-01

    Reinforcement learning (RL) models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between RL models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both RL and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired. PMID:24409138

  9. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    PubMed

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors. PMID:19013054

  10. Reinforcement Learning of Targeted Movement in a Spiking Neuronal Model of Motor Cortex

    PubMed Central

    Chadderdon, George L.; Neymotin, Samuel A.; Kerr, Cliff C.; Lytton, William W.

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (−1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior. PMID:23094042
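    The three-valued reward scheme and spike-timing-dependent eligibility traces described above can be illustrated with a minimal sketch (a simplified analogue, not the authors' simulation code; array shapes and constants are assumptions):

```python
import numpy as np

def reward_modulated_update(w, pre_spikes, post_spikes, reward_signal, trace,
                            tau=50.0, dt=1.0, lr=0.01):
    """One step of a reward-modulated plasticity rule with eligibility traces.

    w             : (n_post, n_pre) feedforward weight matrix
    pre_spikes    : (n_pre,)  binary vector of presynaptic spikes in this step
    post_spikes   : (n_post,) binary vector of postsynaptic spikes in this step
    reward_signal : global scalar in {+1, 0, -1} (reward / no learning / punishment)
    trace         : (n_post, n_pre) eligibility trace carried between steps
    """
    trace *= np.exp(-dt / tau)                  # eligibility decays over time
    trace += np.outer(post_spikes, pre_spikes)  # tag recently co-active synapses
    w += lr * reward_signal * trace             # global signal gates the weight change
    return w, trace
```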

  11. A Robust Reinforcement Learning Control Design Method for Nonlinear System with Partially Unknown Structure

    NASA Astrophysics Data System (ADS)

    Nakano, Kazuhiro; Obayashi, Masanao; Kuremoto, Takashi; Kobayashi, Kunikazu

    We propose a robust control system which is robust to disturbances and can deal with a nonlinear system with a partially unknown structure by fusing reinforcement learning and robust control theory. First, we solve an optimal control problem without using the unknown part of the system's functions, using a neural network and the iterative learning of a reinforcement learning algorithm. Second, we build a robust reinforcement learning control system which permits uncertainty and is robust to disturbances by fusing the idea of H-infinity control theory with the above system.

  12. The left hemisphere learns what is right: Hemispatial reward learning depends on reinforcement learning processes in the contralateral hemisphere.

    PubMed

    Aberg, Kristoffer Carl; Doell, Kimberly Crystal; Schwartz, Sophie

    2016-08-01

    Orienting biases refer to consistent, trait-like direction of attention or locomotion toward one side of space. Recent studies suggest that such hemispatial biases may determine how well people memorize information presented in the left or right hemifield. Moreover, lesion studies indicate that learning rewarded stimuli in one hemispace depends on the integrity of the contralateral striatum. However, the exact neural and computational mechanisms underlying the influence of individual orienting biases on reward learning remain unclear. Because reward-based behavioural adaptation depends on the dopaminergic system and prediction error (PE) encoding in the ventral striatum, we hypothesized that hemispheric asymmetries in dopamine (DA) function may determine individual spatial biases in reward learning. To test this prediction, we acquired fMRI in 33 healthy human participants while they performed a lateralized reward task. Learning differences between hemispaces were assessed by presenting stimuli, assigned to different reward probabilities, to the left or right of central fixation, i.e. presented in the left or right visual hemifield. Hemispheric differences in DA function were estimated through differential fMRI responses to positive vs. negative feedback in the left vs. right ventral striatum, and a computational approach was used to identify the neural correlates of PEs. Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and relatively better neural encoding of PEs for stimuli presented in the right (vs. left) hemifield. These findings demonstrate that trait-like spatial biases implicate hemisphere-specific learning mechanisms, with individual differences between hemispheres contributing to reinforcing spatial biases. PMID:27221149

  13. The Drive-Reinforcement Neuronal Model: A Real-Time Learning Mechanism For Unsupervised Learning

    NASA Astrophysics Data System (ADS)

    Klopf, A. H.

    1988-05-01

    The drive-reinforcement neuronal model is described as an example of a newly discovered class of real-time learning mechanisms that correlate earlier derivatives of inputs with later derivatives of outputs. The drive-reinforcement neuronal model has been demonstrated to predict a wide range of classical conditioning phenomena in animal learning. A variety of classes of connectionist and neural network models have been investigated in recent years (Hinton and Anderson, 1981; Levine, 1983; Barto, 1985; Feldman, 1985; Rumelhart and McClelland, 1986). After a brief review of these models, discussion will focus on the class of real-time models because they appear to be making the strongest contact with the experimental evidence of animal learning. Theoretical models in physics have inspired Boltzmann machines (Ackley, Hinton, and Sejnowski, 1985) and what are sometimes called Hopfield networks (Hopfield, 1982; Hopfield and Tank, 1986). These connectionist models utilize symmetric connections and adaptive equilibrium processes during which the networks settle into minimal energy states. Networks utilizing error-correction learning mechanisms go back to Rosenblatt's (1962) perceptron and Widrow's (1962) adaline and currently take the form of back propagation networks (Parker, 1985; Rumelhart, Hinton, and Williams, 1985, 1986). These networks require a "teacher" or "trainer" to provide error signals indicating the difference between desired and actual responses. Networks employing real-time learning mechanisms, in which the temporal association of signals is of fundamental importance, go back to Hebb (1949). Real-time learning mechanisms may require no teacher or trainer and thus may lend themselves to unsupervised learning. Such models have been extended by Klopf (1972, 1982), who introduced the notions of synaptic eligibility and generalized reinforcement. Sutton and Barto (1981) advanced this class of models by proposing that a derivative of the theoretical neuron's out

  14. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    PubMed

    Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

    2013-04-01

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. PMID:23592970
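    For reference, in the continuous-time TD formulation of Doya (2000) that this model extends, the TD error takes roughly the form

```latex
\delta(t) \;=\; r(t) \;-\; \frac{1}{\tau}\,V(t) \;+\; \dot{V}(t)
```

    where V(t) is the critic's value estimate and tau is the reward discount time constant; in the model above, delta(t) plays the role of the neuromodulatory TD signal broadcast to both the critic and the actor.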

  16. Beamforming and Power Control in Sensor Arrays Using Reinforcement Learning

    PubMed Central

    Almeida, Náthalee C.; Fernandes, Marcelo A.C.; Neto, Adrião D.D.

    2015-01-01

    The use of beamforming and power control, combined or separately, has advantages and disadvantages, depending on the application. The combined use of beamforming and power control has been shown to be highly effective in applications involving the suppression of interference signals from different sources. However, it is necessary to identify efficient methodologies for the combined operation of these two techniques. The most appropriate technique may be obtained by means of the implementation of an intelligent agent capable of making the best selection between beamforming and power control. The present paper proposes an algorithm using reinforcement learning (RL) to determine the optimal combination of beamforming and power control in sensor arrays. The RL algorithm used was Q-learning, employing an ε-greedy policy, and training was performed using the offline method. The simulations showed that RL was effective for implementation of a switching policy involving the different techniques, taking advantage of the positive characteristics of each technique in terms of signal reception. PMID:25808769
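    The record names the algorithm (tabular Q-learning with an epsilon-greedy policy, trained offline) but not its details; the following minimal sketch illustrates the general form of such a switching policy. The state encoding, the step simulator, and all constants are assumptions for illustration, not the paper's setup.

```python
import numpy as np

ACTIONS = ("beamforming", "power_control")

def train_switching_policy(step, n_states, n_episodes=500, horizon=100,
                           alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Offline tabular Q-learning over a discretized channel/interference state.

    step(state, action_index) must return (next_state, reward) from a simulator.
    The greedy action per state of the returned table is the switching policy.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, len(ACTIONS)))
    for _ in range(n_episodes):
        s = int(rng.integers(n_states))
        for _ in range(horizon):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r = step(s, a)
            # standard Q-learning backup
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```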

  17. Electrophysiological correlates of reinforcement learning in young people with Tourette syndrome with and without co-occurring ADHD symptoms.

    PubMed

    Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J

    2016-06-01

    Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (N=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. PMID:27103231

  18. Enhanced Student Learning with Problem Based Learning

    ERIC Educational Resources Information Center

    Hollenbeck, James

    2008-01-01

    Science educators define a learning environment in which the problem drives the learning as problem based learning (PBL). Problem based learning can be a learning methodology/process or a curriculum based on its application by the teacher. This paper discusses the basic premise of problem based learning and successful applications of such learning.…

  19. Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

    PubMed

    Chen, Chong

    2015-06-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818

  20. Identifying Cognitive Remediation Change Through Computational Modelling—Effects on Reinforcement Learning in Schizophrenia

    PubMed Central

    Cella, Matteo; Bishara, Anthony J.; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til

    2014-01-01

    Objective: Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention—cognitive remediation. Method: Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). Results: In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Conclusion: Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. PMID:24214932
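    The three parameters reported above map naturally onto an attention-based sequential learning model of the WCST. The sketch below is illustrative only (written in the spirit of such models, not necessarily the exact model fitted in the study): R and P act as learning rates for positive and negative feedback, and D controls how deterministically attention weights translate into choices.

```python
import numpy as np

def wcst_trial(attention, chosen_dim, feedback_positive, R, P, D):
    """Update attention weights over the three sorting dimensions after one WCST trial.

    attention : nonnegative weights over (colour, form, number), summing to 1
    R, P      : reward / punishment sensitivity (feedback-specific learning rates)
    D         : choice consistency; larger D makes choices more deterministic
    """
    signal = np.zeros_like(attention)
    if feedback_positive:
        signal[chosen_dim] = 1.0                  # credit the matched dimension
        attention = (1 - R) * attention + R * signal
    else:
        signal[:] = 1.0
        signal[chosen_dim] = 0.0                  # shift credit to the other dimensions
        signal /= signal.sum()
        attention = (1 - P) * attention + P * signal
    powered = attention ** D                      # consistency-weighted choice rule
    choice_probs = powered / powered.sum()
    return attention, choice_probs
```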

  1. Automatic Skill Acquisition in Reinforcement Learning Agents Using Connection Bridge Centrality

    NASA Astrophysics Data System (ADS)

    Moradi, Parham; Shiri, Mohammad Ebrahim; Entezari, Negin

    Incorporating skills in reinforcement learning methods accelerates agents' learning performance. The key problem of automatic skill discovery is to find subgoal states and create skills to reach them. Among the proposed algorithms, those based on graph centrality measures have achieved precise results. In this paper we propose a new graph centrality measure for identifying subgoal states that is crucial for developing useful skills. The main advantage of the proposed centrality measure is that it considers both local and global information about the agent's states to score them, resulting in the identification of real subgoal states. We show through simulations on three benchmark tasks, namely the "four-room grid world", "taxi driver grid world" and "soccer simulation grid world", that a procedure based on the proposed centrality measure performs better than procedures based on other centrality measures.

  2. Learning in neural networks by reinforcement of irregular spiking

    NASA Astrophysics Data System (ADS)

    Xie, Xiaohui; Seung, H. Sebastian

    2004-04-01

    Artificial neural networks are often trained by using the back propagation algorithm to compute the gradient of an objective function with respect to the synaptic strengths. For a biological neural network, such a gradient computation would be difficult to implement, because of the complex dynamics of intrinsic and synaptic conductances in neurons. Here we show that irregular spiking similar to that observed in biological neurons could be used as the basis for a learning rule that calculates a stochastic approximation to the gradient. The learning rule is derived based on a special class of model networks in which neurons fire spike trains with Poisson statistics. The learning is compatible with forms of synaptic dynamics such as short-term facilitation and depression. By correlating the fluctuations in irregular spiking with a reward signal, the learning rule performs stochastic gradient ascent on the expected reward. It is applied to two examples, learning the XOR computation and learning direction selectivity using depressing synapses. We also show in simulation that the learning rule is applicable to a network of noisy integrate-and-fire neurons.
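    A minimal sketch of the idea (a single stochastic neuron with Bernoulli spiking per time bin rather than the paper's Poisson spike trains; all names are placeholders) shows how correlating spike fluctuations with reward yields a stochastic estimate of the reward gradient:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def spike_reinforce_update(w, x, reward, lr=0.05, rng=None):
    """One trial of a reward-correlated spiking rule (likelihood-ratio form).

    The neuron spikes with probability p = sigmoid(w . x) in this time bin.
    The fluctuation (s - p), multiplied by the scalar reward and the input x,
    is a stochastic estimate of the gradient of expected reward w.r.t. w.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = sigmoid(w @ x)                  # instantaneous firing probability
    s = float(rng.random() < p)         # stochastic (irregular) spike
    w = w + lr * reward * (s - p) * x   # reward-modulated weight update
    return w, s
```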

  3. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    ERIC Educational Resources Information Center

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  4. Effect of Reinforcement on Modality of Stimulus Control in Learning Disabled Students.

    ERIC Educational Resources Information Center

    Koorland, Mark A.; Wolking, William D.

    1982-01-01

    The effects of reinforcement contingencies on task performance of bisensory missing words were studied with two students (about nine years old): one learning disabled (LD) male with an auditory preference and one LD female with a visual preference. Reinforcement contingencies were found to control both students' performances. (Author/SEW)

  5. Cortical mechanisms for reinforcement learning in competitive games.

    PubMed

    Seo, Hyojung; Lee, Daeyeol

    2008-12-12

    Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy. PMID:18829430

  6. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    PubMed

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. PMID:26356598

  7. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    NASA Astrophysics Data System (ADS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-06-01

    One of the prototypical mechanisms for understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model (a reinforcement learning model with a dynamic aspiration level) is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, the most noteworthy finding is that moderate greediness (i.e. a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
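    A minimal version of the aspiration-based update described above might look as follows (a Bush-Mosteller-style sketch with illustrative constants, not the authors' exact regulation function):

```python
import numpy as np

def update_cooperation_prob(p_coop, cooperated, payoff, aspiration,
                            beta=0.2, habituation=0.1):
    """Aspiration-based reinforcement update for one player in a spatial game.

    If the received payoff exceeds the current aspiration level, the probability of
    repeating the action just taken is reinforced; otherwise it is weakened. The
    aspiration level itself drifts toward recent payoffs (the dynamic-aspiration idea).
    """
    stimulus = float(np.tanh(beta * (payoff - aspiration)))  # satisfaction in (-1, 1)
    p_action = p_coop if cooperated else 1.0 - p_coop        # prob. of the taken action
    if stimulus >= 0:
        p_action += (1.0 - p_action) * stimulus              # reinforce the action
    else:
        p_action += p_action * stimulus                      # inhibit the action
    p_coop = p_action if cooperated else 1.0 - p_action
    aspiration += habituation * (payoff - aspiration)        # adapt the aspiration level
    return float(np.clip(p_coop, 0.0, 1.0)), aspiration
```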

  8. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    NASA Technical Reports Server (NTRS)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data-driven approach to autonomous obstacle avoidance is explored for simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.

  9. Cooperation and Coordination Between Fuzzy Reinforcement Learning Agents in Continuous State Partially Observable Markov Decision Processes

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Vengerov, David

    1999-01-01

    Successful operations of future multi-agent intelligent systems require efficient cooperation schemes between agents sharing learning experiences. We consider a pseudo-realistic world in which one or more opportunities appear and disappear in random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worthy of pursuing based on their promised rewards, expected lifetimes, path lengths and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We consider a cooperation mechanism in which agents share experience by using and updating one joint behavior policy. We also implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results demonstrate that K cooperative agents each learning in a separate world over N time steps outperform K independent agents each learning in a separate world over K*N time steps, with this result becoming more pronounced as the degree of partial observability in the environment increases. We also show that cooperation between agents learning in the same world decreases performance with respect to independent agents. Since cooperation reduces diversity between agents, we conclude that diversity is a key parameter in the trade-off between maximizing utility from cooperation when diversity is low and maximizing utility from competitive coordination when diversity is high.

  10. Evolution of cooperation facilitated by reinforcement learning with adaptive aspiration levels.

    PubMed

    Tanabe, Shoma; Masuda, Naoki

    2012-01-21

    Repeated interaction between individuals is the main mechanism for maintaining cooperation in social dilemma situations. Variants of tit-for-tat (repeating the previous action of the opponent) and the win-stay lose-shift strategy are known as strong competitors in iterated social dilemma games. On the other hand, real repeated interaction generally allows plasticity (i.e., learning) of individuals based on the experience of the past. Although plasticity is relevant to various biological phenomena, its role in repeated social dilemma games is relatively unexplored. In particular, if experience-based learning plays a key role in promotion and maintenance of cooperation, learners should evolve in the contest with nonlearners under selection pressure. By modeling players using a simple reinforcement learning model, we numerically show that learning enables the evolution of cooperation. We also show that numerically estimated adaptive dynamics appositely predict the outcome of evolutionary simulations. The analysis of the adaptive dynamics enables us to capture the obtained results as an affirmative example of the Baldwin effect, where learning accelerates the evolution to optimality. PMID:22037063

  11. Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques

    NASA Astrophysics Data System (ADS)

    Gosavi, Abhijit

    2014-08-01

    In control systems theory, the Markov decision process (MDP) is a widely used optimization model involving selection of the optimal action in each state visited by a discrete-event system driven by Markov chains. The classical MDP model is suitable for an agent/decision-maker interested in maximizing expected revenues, but does not account for minimizing variability in the revenues. An MDP model in which the agent can maximize the revenues while simultaneously controlling the variance in the revenues is proposed. This work is rooted in machine learning/neural network concepts, where updating is based on system feedback and step sizes. First, a Bellman equation for the problem is proposed. Thereafter, convergent dynamic programming and reinforcement learning techniques for solving the MDP are provided along with encouraging numerical results on a small MDP and a preventive maintenance problem.
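
    In generic terms, the kind of objective described here trades expected return against its variability; the weighting factor λ below is a modeling choice introduced for illustration, not a value from the paper:

        \max_{\pi} \; \mathbb{E}_{\pi}[R] \;-\; \lambda \, \mathrm{Var}_{\pi}[R], \qquad \lambda > 0,

    where R denotes the (long-run or discounted) reward accumulated under policy π.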

  12. The Cerebellum: A Neural System for the Study of Reinforcement Learning

    PubMed Central

    Swain, Rodney A.; Kerr, Abigail L.; Thompson, Richard F.

    2011-01-01

    In its strictest application, the term “reinforcement learning” refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed), it follows that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning. PMID:21427778

  13. A Judgement-Based Model of Workplace Learning

    ERIC Educational Resources Information Center

    Athanasou, James A.

    2004-01-01

    The purpose of this paper is to outline a judgement-based model of adult learning. This approach is set out as a Perceptual-Judgemental-Reinforcement approach to social learning under conditions of complexity and where there is no single, clearly identified correct response. The model builds upon the Hager-Halliday thesis of workplace learning and…

  14. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    NASA Astrophysics Data System (ADS)

    Houli, Duan; Zhiheng, Li; Yi, Zhang

    2010-12-01

    We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include the vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we incorporate priority control for buses and emergency vehicles into our model. The simulation results indicated that our algorithm could perform more efficiently than traditional traffic light control methods.
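
    To make the adaptive-objective idea concrete, the sketch below shows a tabular Q-learning signal agent whose reward is switched according to the observed congestion level; the thresholds, state encoding and objective measures are illustrative assumptions, not the settings of multi-RL:

        import random
        from collections import defaultdict

        class AdaptiveObjectiveSignalAgent:
            """Q-learning traffic-signal agent that switches its optimization objective
            (stops, waiting time, or downstream queue length) with the congestion level.
            Illustrative sketch only."""

            def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
                self.q = defaultdict(float)
                self.actions = actions
                self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

            def reward(self, stops, wait_time, next_queue, congestion):
                # Objective selection: light traffic -> minimize stops,
                # moderate -> minimize waiting time, heavy -> avoid downstream spillback.
                if congestion < 0.3:
                    return -stops
                elif congestion < 0.7:
                    return -wait_time
                return -next_queue

            def act(self, state):
                if random.random() < self.epsilon:
                    return random.choice(self.actions)
                return max(self.actions, key=lambda a: self.q[(state, a)])

            def update(self, state, action, r, next_state):
                best_next = max(self.q[(next_state, a)] for a in self.actions)
                td_error = r + self.gamma * best_next - self.q[(state, action)]
                self.q[(state, action)] += self.alpha * td_error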

  15. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063

  16. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations.

    PubMed

    Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho

    2015-05-01

    This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) method. Based on these, three online I-RL algorithms named explorized I-PI and integral Q-learning I, II are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and, related to these, we show the design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented in this paper. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods. PMID:25163070
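
    For orientation, the integral temporal difference that underlies I-RL methods of this kind replaces explicit knowledge of the drift dynamics with an integral of the running cost over a reinforcement interval T; in its standard form (without the exploration term treated in the paper) it reads:

        V\big(x(t)\big) = \int_{t}^{t+T} r\big(x(\tau), u(\tau)\big)\, d\tau + V\big(x(t+T)\big).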

  17. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

    ERIC Educational Resources Information Center

    Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

    2011-01-01

    Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…

  18. On the Evolutionary Bases of Consumer Reinforcement

    ERIC Educational Resources Information Center

    Nicholson, Michael; Xiao, Sarah Hong

    2010-01-01

    This article locates consumer behavior analysis within the modern neo-Darwinian synthesis, seeking to establish an interface between the ultimate-level theorizing of human evolutionary psychology and the proximate level of inquiry typically favored by operant learning theorists. Following an initial overview of the central tenets of neo-Darwinism,…

  19. A nanostructured carbon-reinforced polyisobutylene-based thermoplastic elastomer.

    PubMed

    Puskas, Judit E; Foreman-Orlowski, Elizabeth A; Lim, Goy Teck; Porosky, Sara E; Evancho-Chapman, Michelle M; Schmidt, Steven P; El Fray, Mirosława; Piatek, Marta; Prowans, Piotr; Lovejoy, Krystal

    2010-03-01

    This paper presents the synthesis and characterization of a polyisobutylene (PIB)-based nanostructured carbon-reinforced thermoplastic elastomer. This thermoplastic elastomer is based on a self-assembling block copolymer having a branched PIB core carrying -OH functional groups at each branch point, flanked by blocks of poly(isobutylene-co-para-methylstyrene). The block copolymer has thermolabile physical crosslinks and can be processed as a plastic, yet retains its rubbery properties at room temperature. The carbon-reinforced thermoplastic elastomer had more than twice the tensile strength of the neat polymer, exceeding the strength of medical grade silicone rubber, while remaining significantly softer. The carbon-reinforced thermoplastic elastomer displayed a high T(g) of 126 degrees C, rendering the material steam-sterilizable. The carbon also acted as a free radical trap, increasing the onset temperature of thermal decomposition in the neat polymer from 256.6 degrees C to 327.7 degrees C. The carbon-reinforced thermoplastic elastomer had the lowest water contact angle at 82 degrees and surface nano-topography. After 180 days of implantation into rabbit soft tissues, the carbon-reinforced thermoplastic elastomer had the thinnest tissue capsule around the microdumbbell specimens, with no eosinophiles present. The material also showed excellent integration into bones. PMID:20034664

  20. Can Traditions Emerge from the Interaction of Stimulus Enhancement and Reinforcement Learning? An Experimental Model

    PubMed Central

    MATTHEWS, LUKE J; PAUKNER, ANNIKA; SUOMI, STEPHEN J

    2010-01-01

    The study of social learning in captivity and behavioral traditions in the wild are two burgeoning areas of research, but few empirical studies have tested how learning mechanisms produce emergent patterns of tradition. Studies have examined how social learning mechanisms that are cognitively complex and possessed by few species, such as imitation, result in traditional patterns, yet traditional patterns are also exhibited by species that may not possess such mechanisms. We propose an explicit model of how stimulus enhancement and reinforcement learning could interact to produce traditions. We tested the model experimentally with tufted capuchin monkeys (Cebus apella), which exhibit traditions in the wild but have rarely demonstrated imitative abilities in captive experiments. Monkeys showed both stimulus enhancement learning and a habitual bias to perform whichever behavior first obtained them a reward. These results support our model that simple social learning mechanisms combined with reinforcement can result in traditional patterns of behavior. PMID:21135912

  1. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning.

    PubMed

    Frank, Michael J; Moustafa, Ahmed A; Haughey, Heather M; Curran, Tim; Hutchison, Kent E

    2007-10-01

    What are the genetic and neural components that support adaptive learning from positive and negative outcomes? Here, we show with genetic analyses that three independent dopaminergic mechanisms contribute to reward and avoidance learning in humans. A polymorphism in the DARPP-32 gene, associated with striatal dopamine function, predicted relatively better probabilistic reward learning. Conversely, the C957T polymorphism of the DRD2 gene, associated with striatal D2 receptor function, predicted the degree to which participants learned to avoid choices that had been probabilistically associated with negative outcomes. The Val/Met polymorphism of the COMT gene, associated with prefrontal cortical dopamine function, predicted participants' ability to rapidly adapt behavior on a trial-to-trial basis. These findings support a neurocomputational dissociation between striatal and prefrontal dopaminergic mechanisms in reinforcement learning. Computational maximum likelihood analyses reveal independent gene effects on three reinforcement learning parameters that can explain the observed dissociations. PMID:17913879

  2. Computational models of reinforcement learning: the role of dopamine as a reward signal

    PubMed Central

    Samson, R. D.; Frank, M. J.

    2010-01-01

    Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback lead to such an efficient learning paradigm? Through a review of existing neuro-computational models of reinforcement learning, we suggest that the efficiency of this type of learning resides in the dynamic and synergistic cooperation of brain systems that use different levels of computations. The implementation of reward signals at the synaptic, cellular, network and system levels gives the organism the necessary robustness, adaptability and processing speed required for evolutionary and behavioral success. PMID:21629583

  3. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints

    NASA Astrophysics Data System (ADS)

    Yang, Xiong; Liu, Derong; Wang, Ding

    2014-03-01

    In this paper, an adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem of constrained-input continuous-time nonlinear systems in the presence of nonlinearities with unknown structures. Two different types of neural networks (NNs) are employed to approximate the Hamilton-Jacobi-Bellman equation. That is, a recurrent NN is constructed to identify the unknown dynamical system, and two feedforward NNs are used as the actor and the critic to approximate the optimal control and the optimal cost, respectively. Based on this framework, the action NN and the critic NN are tuned simultaneously, without the requirement for the knowledge of system drift dynamics. Moreover, by using Lyapunov's direct method, the weights of the action NN and the critic NN are guaranteed to be uniformly ultimately bounded, while keeping the closed-loop system stable. To demonstrate the effectiveness of the present approach, simulation results are presented.
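
    A common device in this constrained-input optimal control literature is to encode an input bound |u| ≤ λ through a nonquadratic cost integrand, so that the resulting optimal control is automatically saturated. Whether this exact functional is the one used in the paper is not verified here, so it is shown only as the standard form:

        W(u) = 2 \int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)\, R\, dv,
        \qquad
        u^{*}(x) = -\lambda \tanh\!\Big( \tfrac{1}{2\lambda} R^{-1} g(x)^{\top} \nabla V^{*}(x) \Big).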

  4. Personalized tuning of a reinforcement learning control algorithm for glucose regulation.

    PubMed

    Daskalaki, Elena; Diem, Peter; Mougiakakou, Stavroula G

    2013-01-01

    Artificial pancreas is in the forefront of research towards the automatic insulin infusion for patients with type 1 diabetes. Due to the high inter- and intra-variability of the diabetic population, the need for personalized approaches has been raised. This study presents an adaptive, patient-specific control strategy for glucose regulation based on reinforcement learning and more specifically on the Actor-Critic (AC) learning approach. The control algorithm provides daily updates of the basal rate and insulin-to-carbohydrate (IC) ratio in order to optimize glucose regulation. A method for the automatic and personalized initialization of the control algorithm is designed based on the estimation of the transfer entropy (TE) between insulin and glucose signals. The algorithm has been evaluated in silico in adults, adolescents and children for 10 days. Three scenarios of initialization to i) zero values, ii) random values and iii) TE-based values have been comparatively assessed. The results have shown that when the TE-based initialization is used, the algorithm achieves faster learning with 98%, 90% and 73% in the A+B zones of the Control Variability Grid Analysis for adults, adolescents and children respectively after five days compared to 95%, 78%, 41% for random initialization and 93%, 88%, 41% for zero initial values. Furthermore, in the case of children, the daily Low Blood Glucose Index reduces much faster when the TE-based tuning is applied. The results imply that automatic and personalized tuning based on TE reduces the learning period and improves the overall performance of the AC algorithm. PMID:24110480
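
    Transfer entropy, the quantity used here to initialize the controller, is a standard information-theoretic measure of directed influence from one signal to another; in its usual discrete-time form (history length one, for brevity) it is defined as:

        TE_{X \to Y} = \sum_{y_{t+1},\, y_{t},\, x_{t}} p(y_{t+1}, y_{t}, x_{t})\,
        \log \frac{p(y_{t+1} \mid y_{t}, x_{t})}{p(y_{t+1} \mid y_{t})}.

    A larger TE from the insulin signal to the glucose signal indicates a stronger directed coupling, which the study exploits for patient-specific initialization.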

  5. A Real-time Reinforcement Learning Control System with H∞ Tracking Performance Compensator

    NASA Astrophysics Data System (ADS)

    Uchiyama, Shogo; Obayashi, Masanao; Kuremoto, Takashi; Kobayashi, Kunikazu

    Robust control theory generally guarantees robustness and stability of the closed-loop system. However, it requires a mathematical model of the system to design the control system, so it often cannot deal with nonlinear systems owing to the difficulty of modeling them. On the other hand, reinforcement learning methods can deal with nonlinear systems without any mathematical model, but they usually do not guarantee the stability of the controlled system. In this paper, we propose a “Real-time Reinforcement Learning Control System (RRLCS)” that combines reinforcement learning, to treat unknown nonlinear systems, with robust control theory, to guarantee the robustness and stability of the system. Moreover, we analyze the stability of the proposed system using the H∞ tracking performance and a Lyapunov function. Finally, through computer simulation of controlling an inverted pendulum system, we show the effectiveness of the proposed method.

  6. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    NASA Astrophysics Data System (ADS)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method that combines reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, a robust control system guarantees stability and robustness but requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method (the actor-critic method alone) through computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  7. Team-Based Learning

    ERIC Educational Resources Information Center

    Michaelsen, Larry K.; Sweet, Michael

    2011-01-01

    Team-based learning (TBL), when properly implemented, includes many, if not all, of the common elements of evidence-based best practices. To explain this, a brief overview of TBL is presented. The authors examine the relationship between the best practices of evidence-based teaching and the principles that constitute team-based learning. (Contains…

  8. Integration of reinforcement learning and optimal decision-making theories of the basal ganglia.

    PubMed

    Bogacz, Rafal; Larsen, Tobias

    2011-04-01

    This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories. PMID:21222528

  9. Hydrogel-based reinforcement of 3D bioprinted constructs.

    PubMed

    Melchels, Ferry P W; Blokzijl, Maarten M; Levato, Riccardo; Peiffer, Quentin C; Ruijter, Mylène de; Hennink, Wim E; Vermonden, Tina; Malda, Jos

    2016-01-01

    Progress within the field of biofabrication is hindered by a lack of suitable hydrogel formulations. Here, we present a novel approach based on a hybrid printing technique to create cellularized 3D printed constructs. The hybrid bioprinting strategy combines a reinforcing gel for mechanical support with a bioink to provide a cytocompatible environment. In comparison with thermoplastics such as ε-polycaprolactone, the hydrogel-based reinforcing gel platform enables printing at cell-friendly temperatures, targets the bioprinting of softer tissues and allows for improved control over degradation kinetics. We prepared amphiphilic macromonomers based on poloxamer that form hydrolysable, covalently cross-linked polymer networks. Dissolved at a concentration of 28.6% w/w in water, it functions as the reinforcing gel, while a 5% w/w gelatin-methacryloyl based gel is utilized as the bioink. This strategy allows for the creation of complex structures, where the bioink provides a cytocompatible environment for encapsulated cells. Cell viability of equine chondrocytes encapsulated within printed constructs remained largely unaffected by the printing process. The versatility of the system is further demonstrated by the ability to tune the stiffness of printed constructs between 138 and 263 kPa, as well as to tailor the degradation kinetics of the reinforcing gel from several weeks up to more than a year. PMID:27431861

  10. Acquisition of Flexible Image Recognition by Coupling of Reinforcement Learning and a Neural Network

    NASA Astrophysics Data System (ADS)

    Shibata, Katsunari; Kawano, Tomohiko

    The authors have proposed a very simple autonomous learning system consisting of one neural network (NN), whose inputs are raw sensor signals and whose outputs are directly passed to actuators as control signals, and which is trained by using reinforcement learning (RL). However, the prevailing opinion seems to be that such simple learning systems do not actually work on complicated tasks in the real world. In this paper, with a view to developing higher functions in robots, the authors bring up the necessity to introduce autonomous learning of a massively parallel and cohesively flexible system with massive inputs, based on considerations of the brain architecture and the sequential property of our consciousness. The authors also bring up the necessity to place more importance on “optimization” of the total system under a uniform criterion than on “understandability” for humans. Thus, the authors attempt to stress the importance of their proposed system when considering the future research on robot intelligence. The experimental result in a real-world-like environment shows that image recognition from as many as 6240 visual signals can be acquired through RL under various backgrounds and light conditions without providing any knowledge about image processing or the target object. It works even for camera image inputs that were not experienced in learning. In the hidden layer, template-like representation, division of roles between hidden neurons, and representation to detect the target uninfluenced by light condition or background were observed after learning. The autonomous acquisition of such useful representations or functions suggests the potential for avoiding the frame problem and developing higher functions.

  11. Reinforcement learning control with approximation of time-dependent agent dynamics

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, Kenton Conrad

    Reinforcement Learning has received a lot of attention over the years for systems ranging from static game playing to dynamic system control. Using Reinforcement Learning for control of dynamical systems provides the benefit of learning a control policy without needing a model of the dynamics. This opens the possibility of controlling systems for which the dynamics are unknown, but Reinforcement Learning methods like Q-learning do not explicitly account for time. In dynamical systems, time-dependent characteristics can have a significant effect on the control of the system, so it is necessary to account for system time dynamics while not having to rely on a predetermined model for the system. In this dissertation, algorithms are investigated for expanding the Q-learning algorithm to account for the learning of sampling rates and dynamics approximations. For determining a proper sampling rate, it is desired to find the largest sample time that still allows the learning agent to control the system to goal achievement. An algorithm called Sampled-Data Q-learning is introduced for determining both this sample time and the control policy associated with that sampling rate. Results show that the algorithm is capable of achieving a desired sampling rate that allows for system control while not sampling "as fast as possible". Determining an approximation of an agent's dynamics can be beneficial for the control of hierarchical multiagent systems by allowing a high-level supervisor to use the dynamics approximations for task allocation decisions. To this end, algorithms are investigated for learning first- and second-order dynamics approximations. These algorithms are respectively called First-Order Dynamics Learning and Second-Order Dynamics Learning. The dynamics learning algorithms are evaluated on several examples that show their capability to learn accurate approximations of state dynamics. All of these algorithms are then evaluated on hierarchical multiagent systems
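
    The specific Sampled-Data Q-learning algorithm is developed in the dissertation itself; as a hedged, conceptual sketch only, the search for the largest workable control interval might be organized as below, where train_q_policy and reaches_goal are hypothetical stand-ins for a full Q-learning run and its closed-loop evaluation:

        def largest_viable_sample_time(candidate_dts, train_q_policy, reaches_goal):
            """Conceptual sketch: try control intervals from largest to smallest and keep
            the first one whose learned Q-policy still drives the system to the goal."""
            for dt in sorted(candidate_dts, reverse=True):
                policy = train_q_policy(dt)     # run Q-learning with actions held for dt seconds
                if reaches_goal(policy, dt):    # evaluate the resulting closed-loop behavior
                    return dt, policy
            raise ValueError("no candidate sample time achieved the goal")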

  12. Automatic tuning of the reinforcement function

    SciTech Connect

    Touzet, C.; Santos, J.M.

    1997-12-31

    The aim of this work is to present a method that helps tune the reinforcement function parameters in a reinforcement learning approach. Since the proposal of neural-based implementations for the reinforcement learning paradigm (which reduced learning time and memory requirements to realistic values), reinforcement functions have become the critical components. Using a general definition for reinforcement functions, the authors solve, in a particular case, the so-called exploration versus exploitation dilemma through the careful computation of the RF parameter values. They propose an algorithm to compute, during the exploration part of the learning phase, an estimate for the parameter values. Experiments with the mobile robot Nomad 200 validate their proposals.

  13. Resource-Based Learning.

    ERIC Educational Resources Information Center

    Thomas, Margie Klink

    1999-01-01

    Provides an annotated bibliography of publications related to resource-based learning, which is defined as a student-centered learning environment grounded in learning theory in which the teacher and the library-media specialist collaborate to help students with information needs, information retrieval, analyzing and synthesizing the information,…

  14. Nucleus accumbens core lesions retard instrumental learning and performance with delayed reinforcement in the rat

    PubMed Central

    Cardinal, Rudolf N; Cheung, Timothy HC

    2005-01-01

    Background Delays between actions and their outcomes severely hinder reinforcement learning systems, but little is known of the neural mechanism by which animals overcome this problem and bridge such delays. The nucleus accumbens core (AcbC), part of the ventral striatum, is required for normal preference for a large, delayed reward over a small, immediate reward (self-controlled choice) in rats, but the reason for this is unclear. We investigated the role of the AcbC in learning a free-operant instrumental response using delayed reinforcement, performance of a previously-learned response for delayed reinforcement, and assessment of the relative magnitudes of two different rewards. Results Groups of rats with excitotoxic or sham lesions of the AcbC acquired an instrumental response with different delays (0, 10, or 20 s) between the lever-press response and reinforcer delivery. A second (inactive) lever was also present, but responding on it was never reinforced. As expected, the delays retarded learning in normal rats. AcbC lesions did not hinder learning in the absence of delays, but AcbC-lesioned rats were impaired in learning when there was a delay, relative to sham-operated controls. All groups eventually acquired the response and discriminated the active lever from the inactive lever to some degree. Rats were subsequently trained to discriminate reinforcers of different magnitudes. AcbC-lesioned rats were more sensitive to differences in reinforcer magnitude than sham-operated controls, suggesting that the deficit in self-controlled choice previously observed in such rats was a consequence of reduced preference for delayed rewards relative to immediate rewards, not of reduced preference for large rewards relative to small rewards. AcbC lesions also impaired the performance of a previously-learned instrumental response in a delay-dependent fashion. Conclusions These results demonstrate that the AcbC contributes to instrumental learning and performance by

  15. Creating a Reinforcement Learning Controller for Functional Electrical Stimulation of a Human Arm.

    PubMed

    Thomas, Philip S; Branicky, Michael; van den Bogert, Antonie; Jagodnik, Kathleen

    2008-01-01

    Clinical tests have shown that the dynamics of a human arm, controlled using Functional Electrical Stimulation (FES), can vary significantly between and during trials. In this paper, we study the application of Reinforcement Learning to create a controller that can adapt to these changing dynamics of a human arm. Development and tests were done in simulation using a two-dimensional arm model and Hill-based muscle dynamics. An actor-critic architecture is used with artificial neural networks for both the actor and the critic. We begin by training it using a Proportional Derivative (PD) controller as a supervisor. We then make clinically relevant changes to the dynamics of the arm and test the actor-critic's ability to adapt without supervision in a reasonable number of episodes. PMID:22081795
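
    The actor-critic scheme described here follows the usual pattern in which a single temporal-difference error trains both networks; in generic form (the paper's exact network equations are not reproduced), with critic parameters w, actor parameters θ and learning rates α_c, α_a:

        \delta_t = r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t), \qquad
        w \leftarrow w + \alpha_c\, \delta_t\, \nabla_w V_w(s_t), \qquad
        \theta \leftarrow \theta + \alpha_a\, \delta_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t).

    Supervised pre-training from the PD controller, as described above, amounts to fitting the actor to the PD controller's outputs before switching to this TD-driven update.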

  16. Multi Objective Dynamic Job Shop Scheduling using Composite Dispatching Rule and Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Chen, Xili; Hao, Xinchang; Lin, Hao Wen; Murata, Tomohiro

    The applications of composite dispatching rules for multi-objective dynamic scheduling have been widely studied in the literature. In general, a composite dispatching rule is a combination of several elementary dispatching rules, which is designed to optimize multiple objectives of interest under a certain scheduling environment. The relative importance of the elementary dispatching rules is modeled by weight factors. A critical issue for the implementation of a composite dispatching rule is that inappropriate weight values may result in poor performance. This paper presents an offline scheduling knowledge acquisition method based on reinforcement learning using simulation techniques. The scheduling knowledge is applied to adjust the weight values of the elementary dispatching rules in a composite manner with respect to the work-in-process fluctuation of machines during online scheduling. Implementation of the proposed method in a two-objective dynamic job shop scheduling problem is demonstrated and the results are satisfactory.
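
    To make the weighting idea concrete, a composite priority can be computed as a weighted sum of normalized elementary rule scores; the sketch below uses SPT (shortest processing time) and EDD (earliest due date) purely as illustrative elementary rules, with the weight vector assumed to come from the learned scheduling knowledge:

        def composite_priority(job, weights):
            """Weighted combination of elementary dispatching rules (illustrative only).

            job     : dict with 'processing_time' and 'due_date' entries (both positive)
            weights : dict of weight factors, e.g. {'SPT': 0.7, 'EDD': 0.3}, assumed to be
                      supplied by the learned (offline) scheduling knowledge
            """
            spt_score = 1.0 / job['processing_time']   # shorter jobs get higher priority
            edd_score = 1.0 / job['due_date']          # earlier due dates get higher priority
            return weights['SPT'] * spt_score + weights['EDD'] * edd_score

        def select_next_job(queue, weights):
            # Dispatch the job with the highest composite priority.
            return max(queue, key=lambda job: composite_priority(job, weights))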

  17. Human dorsal striatal activity during choice discriminates reinforcement learning behavior from the gambler's fallacy.

    PubMed

    Jessup, Ryan K; O'Doherty, John P

    2011-04-27

    Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine and reward prediction errors. Actor-critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum--as predicted by an actor-critic instantiation--is used on a trial-to-trial basis at the time of choice to choose in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging scanning protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choosing consistent with reinforcement learning compared with the competing strategy. Moreover, an overlapping area of dorsal striatum along with the ventral striatum was found to be correlated with reward prediction errors at the time of outcome, as predicted by the actor-critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus-response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards. PMID:21525269

  18. A Two-Stage Relational Reinforcement Learning with Continuous Actions for Real Service Robots

    NASA Astrophysics Data System (ADS)

    Zaragoza, Julio H.; Morales, Eduardo F.

    Reinforcement Learning is a commonly used technique in robotics; however, traditional algorithms are unable to handle large amounts of data coming from the robot’s sensors, require long training times, are unable to re-use learned policies on similar domains, and use discrete actions. This work introduces TS-RRLCA, a two-stage method to tackle these problems. In the first stage, low-level data coming from the robot’s sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors and obstacles, significantly reducing the state space. We also use Behavioural Cloning, i.e., traces provided by the user, to learn, in a few iterations, a relational policy that can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial policy into a continuous actions policy. We tested our approach with a real service robot on different environments for different navigation and following tasks. Results show that the policies can be used in different domains and produce smoother, faster and shorter paths than the original policies.

  19. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    NASA Astrophysics Data System (ADS)

    Wei, Qing-Lai; Song, Rui-Zhuo; Sun, Qiu-Ye; Xiao, Wen-Dong

    2015-09-01

    This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the HJB equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. For solving the Hamilton-Jacobi-Bellman (HJB) equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable, and the iterative performance index function is convergent. Simulation study demonstrates the effectiveness of the developed tracking control method. Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), the Beijing Natural Science Foundation, China (Grant Nos. 4132078 and 4143065), the China Postdoctoral Science Foundation (Grant No. 2013M530527), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-14-119A2), and the Open Research Project from State Key Laboratory of Management and Control for Complex Systems, China (Grant No. 20150104).

  20. A reinforcement learning approach in rotated image recognition and its convergence analysis

    NASA Astrophysics Data System (ADS)

    Iftekharuddin, Khan M.; Li, Yaqin

    2006-08-01

    One of the major problems in automatic target recognition (ATR) involves the recognition of images in different orientations. In a classical training-testing setup for an ATR for rotated targets using neural networks, the recognition is inherently static, and the performance is largely dependent on the range of orientation angles in the training process. To alleviate this problem, we propose a reinforcement learning (RL) approach for the ATR of rotated images. The RL is implemented in an adaptive critic design (ACD) framework wherein the ACD is mainly composed of the neuro-dynamic programming of an action network and a critic network. The proposed RL provides an adaptive learning and object recognition ability without a priori training. Numerical simulations demonstrate that the proposed ACD-based ATR system can effectively recognize rotated targets over the whole range of 180° rotation. Analytic characterization of the learning algorithm provides a sufficient condition for its asymptotic convergence. Finally, we obtain an upper bound for the estimation error of the cost-to-go function under the asymptotic convergence condition.

  1. Aggression as Positive Reinforcement in Mice under Various Ratio- and Time-Based Reinforcement Schedules

    ERIC Educational Resources Information Center

    May, Michael E.; Kennedy, Craig H.

    2009-01-01

    There is evidence suggesting aggression may be a positive reinforcer in many species. However, only a few studies have examined the characteristics of aggression as a positive reinforcer in mice. Four types of reinforcement schedules were examined in the current experiment using male Swiss CFW albino mice in a resident-intruder model of aggression…

  2. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    PubMed

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes, when using active-learning strategies, such as games. PMID:22910398

  3. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke.

    PubMed

    Naros, Georgios; Gharabaghi, Alireza

    2015-01-01

    Neurofeedback training of Motor imagery (MI)-related brain-states with brain-computer/brain-machine interfaces (BCI/BMI) is currently being explored as an experimental intervention prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. The use of BCI/BMI technology increases the adherence to MI training more efficiently than interventions with sham or no feedback. Moreover, pilot studies suggest that such a priming intervention before physiotherapy might-like some brain stimulation techniques-increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the general clinical outcome. However, there is little evidence up to now that these BCI/BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BCI/BMI technology provides a valuable neurofeedback tool for rehabilitation but needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) Defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication. This targeted brain state could well be different from the brain state optimal for the neurofeedback task, e.g., α-band oscillations for differentiating MI from rest; (2) Selecting a BCI/BMI classification and thresholding approach on the basis of learning principles, i.e., balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the device; and (3) Adjusting the difficulty level in the course of the training period to account for the cognitive load and the learning experience of

  4. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    PubMed Central

    Naros, Georgios; Gharabaghi, Alireza

    2015-01-01

    Neurofeedback training of Motor imagery (MI)-related brain-states with brain-computer/brain-machine interfaces (BCI/BMI) is currently being explored as an experimental intervention prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. The use of BCI/BMI technology increases the adherence to MI training more efficiently than interventions with sham or no feedback. Moreover, pilot studies suggest that such a priming intervention before physiotherapy might—like some brain stimulation techniques—increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the general clinical outcome. However, there is little evidence up to now that these BCI/BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BCI/BMI technology provides a valuable neurofeedback tool for rehabilitation but needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) Defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication. This targeted brain state could well be different from the brain state optimal for the neurofeedback task, e.g., α-band oscillations for differentiating MI from rest; (2) Selecting a BCI/BMI classification and thresholding approach on the basis of learning principles, i.e., balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the device; and (3) Adjusting the difficulty level in the course of the training period to account for the cognitive load and the learning experience

  5. Expectancies in decision making, reinforcement learning, and ventral striatum.

    PubMed

    van der Meer, Matthijs A A; Redish, A David

    2010-01-01

    Decisions can arise in different ways, such as from a gut feeling, doing what worked last time, or planful deliberation. Different decision-making systems are dissociable behaviorally, map onto distinct brain systems, and have different computational demands. For instance, "model-free" decision strategies use prediction errors to estimate scalar action values from previous experience, while "model-based" strategies leverage internal forward models to generate and evaluate potentially rich outcome expectancies. Animal learning studies indicate that expectancies may arise from different sources, including not only forward models but also Pavlovian associations, and the flexibility with which such representations impact behavior may depend on how they are generated. In the light of these considerations, we review the results of van der Meer and Redish (2009a), who found that ventral striatal neurons that respond to reward delivery can also be activated at other points, notably at a decision point where hippocampal forward representations were also observed. These data suggest the possibility that ventral striatal reward representations contribute to model-based expectancies used in deliberative decision making. PMID:21221409

  6. Reinforcement function design and bias for efficient learning in mobile robots

    SciTech Connect

    Touzet, C.; Santos, J.M.

    1998-06-01

    The main paradigm in the sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize works related to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs.

  7. Multiagent reinforcement learning: spiking and nonspiking agents in the iterated Prisoner's Dilemma.

    PubMed

    Vassiliades, Vassilis; Cleanthous, Aristodemos; Christodoulou, Chris

    2011-04-01

    This paper investigates multiagent reinforcement learning (MARL) in a general-sum game where the payoffs' structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with Q- and SARSA learning algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule in the case of non-spiking agents, and 2) having longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents have similar behavior and therefore they can equally well be used in a multiagent interaction setting. For training the spiking agents in the case where more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to the two learning algorithms utilized, in order to avoid a possible synaptic saturation. This is done by administering to the networks additional global reinforcement signals for every spike of the output neurons that were not "responsible" for the preceding decision. PMID:21421435
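
    For the nonspiking, tabular agents described here, the setting can be reproduced in outline with two independent Q-learners whose state is the previous joint action; the payoff matrix and parameters below are generic Prisoner's Dilemma choices for illustration, not the values or the reward-transformation process used in the paper:

        import random
        from collections import defaultdict

        PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
                  ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}  # generic PD payoffs

        class QAgent:
            def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
                self.q = defaultdict(float)
                self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

            def act(self, state):
                if random.random() < self.epsilon:
                    return random.choice(['C', 'D'])
                return max(['C', 'D'], key=lambda a: self.q[(state, a)])

            def learn(self, state, action, reward, next_state):
                best_next = max(self.q[(next_state, a)] for a in ['C', 'D'])
                self.q[(state, action)] += self.alpha * (
                    reward + self.gamma * best_next - self.q[(state, action)])

        agents = [QAgent(), QAgent()]
        state = ('C', 'C')                      # previous joint action serves as the state
        for _ in range(10000):
            a0, a1 = agents[0].act(state), agents[1].act(state)
            r0, r1 = PAYOFF[(a0, a1)]
            next_state = (a0, a1)
            agents[0].learn(state, a0, r0, next_state)
            agents[1].learn(state, a1, r1, next_state)
            state = next_state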

  8. Selective learning impairment of delayed reinforcement autoshaped behavior caused by low doses of trimethyltin.

    PubMed

    Cohen, C A; Messing, R B; Sparber, S B

    1987-01-01

    The organometal neurotoxin trimethyltin (TMT) induces impaired learning and memory for various tasks. However, administration is also associated with other "non-specific" behavioral changes which may be responsible for effects on conditioned behaviors. To determine if TMT treatment causes a specific learning impairment, three experiments were done using variations of a delay of reinforcement autoshaping task in which rats learn to associate the presentation and retraction of a lever with the delivery of a food pellet reinforcer. No significant effects of TMT treatment were found with a short (4 s) delay of reinforcement, indicating that rats were motivated and had the sensorimotor capacity for learning. When the delay was increased to 6 s, 3.0 or 6.0 mg TMT/kg produced dose-related reductions in behaviors directed towards the lever. Performance of a group given 7.5 mg TMT/kg, while still impaired relative to controls, appeared to be better than the performance of groups given lower doses. This paradoxical effect was investigated with a latent inhibition paradigm, in which rats were pre-exposed to the Skinner boxes for several sessions without delivery of food reinforcement. Control rats showed retardation of autoshaping when food reinforcement was subsequently introduced. Rats given 7.5 mg TMT/kg exhibited elevated levels of lever responding during pre-exposure and autoshaping sessions. The results indicate that 7.5 mg TMT/kg produces learning impairments which are confounded by hyperreactivity to the environment and an inability to suppress behavior toward irrelevant stimuli. In contrast, low doses of TMT cause learning impairments which are not confounded by hyperreactivity, and may prove to be useful models for studying specific associational dysfunctions. PMID:3124161

  9. Biopolymer based nanocomposites reinforced with graphene nanoplatelets

    NASA Astrophysics Data System (ADS)

    Botta, L.; Scaffaro, R.; Mistretta, M. C.; La Mantia, F. P.

    2016-05-01

    In this work, biopolymer based nanocomposites filled with graphene nanoplatelets (GnP) were prepared by melt compounding in a batch mixer. The polymer used as matrix was a commercial biodegradable polymer-blend of PLA and a copolyester (BioFlex®). The prepared materials were characterized by scanning electron microscopy (SEM), rheological and mechanical measurements. Moreover, the effect of the GnP amount on the investigated properties was evaluated. The results indicated that the incorporation of GnP increased the stiffness of the biopolymeric matrix.

  10. Language Learning of Children as a Function of Sensory Mode of Presentation and Reinforcement Procedure.

    ERIC Educational Resources Information Center

    Oyer, Herbert J.; Frankmann, Judith P.

    Programed training filmstrips from Project LIFE (Language Instruction to Facilitate Education) were used with 114 hearing impaired children and 15 normal hearing language impaired children (4- to 13-years old) to assess the effects of auditory supplementation and a token reinforcement program on language learning and to investigate retention and…

  11. Fabrication of tungsten wire reinforced nickel-base alloy composites

    NASA Technical Reports Server (NTRS)

    Brentnall, W. D.; Toth, I. J.

    1974-01-01

    Fabrication methods for tungsten fiber reinforced nickel-base superalloy composites were investigated. Three matrix alloys in pre-alloyed powder or rolled sheet form were evaluated in terms of fabricability into composite monotape and multi-ply forms. The utility of monotapes for fabricating more complex shapes was demonstrated. Preliminary 1093 °C (2000 °F) stress rupture tests indicated that efficient utilization of fiber strength was achieved in composites fabricated by diffusion bonding processes. The fabrication of thermal fatigue specimens is also described.

  12. Rheology of Carbon Fibre Reinforced Cement-Based Mortar

    SciTech Connect

    Banfill, Phillip F. G.; Starrs, Gerry; McCarter, W. John

    2008-07-07

    Carbon fibre reinforced cement based materials (CFRCs) offer the possibility of fabricating 'smart' electrically conductive materials. Rheology of the fresh mix is crucial to satisfactory moulding and fresh CFRC conforms to the Bingham model with slight structural breakdown. Both yield stress and plastic viscosity increase with increasing fibre length and volume concentration. Using a modified Viskomat NT, the concentration dependence of CFRC rheology up to 1.5% fibre volume is reported.
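
    For reference, the Bingham model referred to in the abstract relates shear stress to shear rate through a yield stress and a plastic viscosity; the symbols below follow common rheology notation rather than the paper's own.

```latex
% Bingham model: no flow below the yield stress \tau_0; above it, stress grows
% linearly with shear rate \dot{\gamma} through the plastic viscosity \mu_p.
\tau = \tau_0 + \mu_p \, \dot{\gamma} \quad (\tau > \tau_0), \qquad \dot{\gamma} = 0 \quad (\tau \le \tau_0)
```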

  13. Rheology of Carbon Fibre Reinforced Cement-Based Mortar

    NASA Astrophysics Data System (ADS)

    Banfill, Phillip F. G.; Starrs, Gerry; McCarter, W. John

    2008-07-01

    Carbon fibre reinforced cement based materials (CFRCs) offer the possibility of fabricating "smart" electrically conductive materials. Rheology of the fresh mix is crucial to satisfactory moulding and fresh CFRC conforms to the Bingham model with slight structural breakdown. Both yield stress and plastic viscosity increase with increasing fibre length and volume concentration. Using a modified Viskomat NT, the concentration dependence of CFRC rheology up to 1.5% fibre volume is reported.

  14. Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control.

    PubMed

    Uragami, Daisuke; Takahashi, Tatsuji; Matsuo, Yoshiki

    2014-02-01

    Many algorithms and methods in artificial intelligence or machine learning were inspired by human cognition. As a mechanism to handle the exploration-exploitation dilemma in reinforcement learning, the loosely symmetric (LS) value function that models causal intuition of humans was proposed (Shinohara et al., 2007). While LS shows the highest correlation with causal induction by humans, it has been reported that it effectively works in multi-armed bandit problems that form the simplest class of tasks representing the dilemma. However, the scope of application of LS was limited to reinforcement learning problems that have K actions with only one state (K-armed bandit problems). This study proposes an LS-Q learning architecture that can deal with general reinforcement learning tasks with multiple states and delayed reward. We tested the learning performance of the new architecture in giant-swing robot motion learning, where the uncertainty and unknownness of the environment are substantial. In the test, no ready-made internal models or function approximation of the state space were provided. The simulations showed that while the ordinary Q-learning agent does not reach giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires the giant-swing motion. It is confirmed that the smaller the number of states, in other words, the more coarse-grained the division of states and the more incomplete the state observation, the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method inspired by human cognition works adaptively in real environments. PMID:24296286

  15. Neurocomputational mechanisms of reinforcement-guided learning in humans: a review.

    PubMed

    Cohen, Michael X

    2008-06-01

    Adapting decision making according to dynamic and probabilistic changes in action-reward contingencies is critical for survival in a competitive and resource-limited world. Much research has focused on elucidating the neural systems and computations that underlie how the brain identifies whether the consequences of actions are relatively good or bad. In contrast, less empirical research has focused on the mechanisms by which reinforcements might be used to guide decision making. Here, I review recent studies in which an attempt to bridge this gap has been made by characterizing how humans use reward information to guide and optimize decision making. Regions that have been implicated in reinforcement processing, including the striatum, orbitofrontal cortex, and anterior cingulate, also seem to mediate how reinforcements are used to adjust subsequent decision making. This research provides insights into why the brain devotes resources to evaluating reinforcements and suggests a direction for future research, from studying the mechanisms of reinforcement processing to studying the mechanisms of reinforcement learning. PMID:18589502

  16. Predictive and reinforcement learning for magneto-hydrodynamic control of hypersonic flows

    NASA Astrophysics Data System (ADS)

    Kulkarni, Nilesh Vijay

    Increasing needs for autonomy in future aerospace systems and immense progress in computing technology have motivated the development of on-line adaptive control techniques to account for modeling errors, changes in system dynamics, and faults occurring during system operation. After extensive treatment of the inner-loop adaptive control dealing mainly with stable adaptation towards desired transient behavior, adaptive optimal control has started receiving attention in literature. Motivated by the problem of optimal control of the magneto-hydrodynamic (MHD) generator at the inlet of the scramjet engine of a hypersonic flight vehicle, this thesis treats the general problem of efficiently combining off-line and on-line optimal control methods. The predictive control approach is chosen as the off-line method for designing optimal controllers using all the existing system knowledge. This controller is then adapted on-line using policy-iteration-based Q-learning, which is a stable model-free reinforcement learning approach. The combined approach is first illustrated in the optimal control of linear systems, which helps in the analysis as well as the validation of the method. A novel neural-networks-based parametric predictive control approach is then designed for the off-line optimal control of non-linear systems. The off-line approach is illustrated by applications to aircraft and spacecraft systems. This is followed by an extensive treatment of the off-line optimal control of the MHD generator using this neuro-control approach. On-line adaptation of the controller is implemented using several novel schemes derived from the policy-iteration-based Q-learning. The implementation results demonstrate the success of these on-line algorithms for adapting towards modeling errors in the off-line design.

  17. A neural-network reinforcement-learning model of domestic chicks that learn to localize the centre of closed arenas.

    PubMed

    Mannella, Francesco; Baldassarre, Gianluca

    2007-03-29

    Previous experiments have shown that when domestic chicks (Gallus gallus) are first trained to locate food elements hidden at the centre of a closed square arena and then are tested in a square arena of double the size, they search for food both at its centre and at a distance from walls similar to the distance of the centre from the walls experienced during training. This paper presents a computational model that successfully reproduces these behaviours. The model is based on a neural-network implementation of the reinforcement-learning actor-critic architecture (in this architecture the 'critic' learns to evaluate perceived states in terms of predicted future rewards, while the 'actor' learns to increase the probability of selecting the actions that lead to higher evaluations). The analysis of the model suggests which type of information and cognitive mechanisms might underlie chicks' behaviours: (i) the tendency to explore the area at a specific distance from walls might be based on the processing of the height of walls' horizontal edges, (ii) the capacity to generalize the search at the centre of square arenas independently of their size might be based on the processing of the relative position of walls' vertical edges on the horizontal plane (equalization of walls' width), and (iii) the whole behaviour exhibited in the large square arena can be reproduced by assuming the existence of an attention process that, at each time, focuses chicks' internal processing on either one of the two previously discussed information sources. The model also produces testable predictions regarding the generalization capabilities that real chicks should exhibit if trained in circular arenas of varying size. The paper also highlights the potentialities of the model to address other experiments on animals' navigation and analyses its strengths and weaknesses in comparison to other models. PMID:17255019
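
    As a schematic illustration of the actor-critic division of labour described above (a generic tabular sketch, not the paper's neural-network implementation; state/action counts and learning rates are assumptions):

```python
import numpy as np

# Generic one-step actor-critic sketch. The paper's model is a neural network
# driven by visual features (wall edges); here states and actions are just indices.
N_STATES, N_ACTIONS = 25, 4
V = np.zeros(N_STATES)                    # critic: predicted future reward per state
prefs = np.zeros((N_STATES, N_ACTIONS))   # actor: action preferences per state
alpha_v, alpha_p, gamma = 0.1, 0.05, 0.95

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def choose_action(state, rng=np.random.default_rng()):
    """Actor: sample an action with probability increasing in its preference."""
    return int(rng.choice(N_ACTIONS, p=softmax(prefs[state])))

def actor_critic_update(s, a, r, s_next, done):
    """Critic evaluates the transition; the actor is reinforced by the TD error."""
    target = r if done else r + gamma * V[s_next]
    td_error = target - V[s]
    V[s] += alpha_v * td_error        # critic moves toward the new evaluation
    prefs[s, a] += alpha_p * td_error  # actor raises the probability of better-than-expected actions
    return td_error
```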

  18. An average-reward reinforcement learning algorithm for computing bias-optimal policies

    SciTech Connect

    Mahadevan, S.

    1996-12-31

    Average-reward reinforcement learning (ARL) is an undiscounted optimality framework that is generally applicable to a broad range of control tasks. ARL computes gain-optimal control policies that maximize the expected payoff per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish between different policies that all reach an absorbing goal state, but incur varying costs. A more selective criterion is bias optimality, which can filter gain-optimal policies to select those that reach absorbing goals with the minimum cost. While several ARL algorithms for computing gain-optimal policies have been proposed, none of these algorithms can guarantee bias optimality, since this requires solving at least two nested optimality equations. In this paper, we describe a novel model-based ARL algorithm for computing bias-optimal policies. We test the proposed algorithm using an admission control queuing system, and show that it is able to utilize the queue much more efficiently than a gain-optimal method by learning bias-optimal policies.
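
    The algorithm in the paper is model-based and selects bias-optimal policies; as background for the gain-optimality criterion it builds on, a minimal model-free average-reward update in the style of R-learning might look like the sketch below (names and step sizes are illustrative, and bias optimality is not captured here).

```python
N_ACTIONS = 2  # assumed, e.g. admit / reject in an admission-control queue

def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One model-free average-reward (gain-optimal) update in the style of R-learning.

    Q maps (state, action) to relative value; rho estimates the average reward per step.
    """
    best_next = max(Q.get((s_next, b), 0.0) for b in range(N_ACTIONS))
    best_here = max(Q.get((s, b), 0.0) for b in range(N_ACTIONS))
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r - rho + best_next - old)
    # Update the average-reward estimate only when the greedy action was taken.
    if old == best_here:
        rho += beta * (r - rho + best_next - best_here)
    return Q, rho
```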

  19. Reinforcement learning versus model predictive control: a comparison on a power system problem.

    PubMed

    Ernst, Damien; Glavic, Mevludin; Capitanescu, Florin; Wehenkel, Louis

    2009-04-01

    This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem. Both families of methods are based on the formulation of the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers in a model-free way closed-loop policies from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL may certainly be competitive with MPC even in contexts where a good deterministic system model is available. PMID:19095542
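
    The batch-mode RL approach described above is in the spirit of fitted Q iteration; the sketch below shows that general pattern with an off-the-shelf tree-based regressor. The transition format, regressor choice, and parameters are assumptions made for illustration, not the paper's exact setup.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, n_iterations=50, gamma=0.98):
    """Batch-mode RL sketch: repeatedly regress a Q-function onto Bellman targets.

    transitions: list of (state_vector, action_index, cost, next_state_vector)
    tuples collected from system trajectories. Costs are minimised, so the
    learned Q approximates cost-to-go.
    """
    states = np.array([t[0] for t in transitions], dtype=float)
    actions = np.array([[t[1]] for t in transitions], dtype=float)
    costs = np.array([t[2] for t in transitions], dtype=float)
    next_states = np.array([t[3] for t in transitions], dtype=float)

    X = np.hstack([states, actions])
    model = None
    for _ in range(n_iterations):
        if model is None:
            y = costs  # first iteration: one-step cost only
        else:
            # Bellman target: immediate cost plus discounted best (minimum) next-step Q.
            q_next = np.column_stack([
                model.predict(np.hstack([next_states,
                                         np.full((len(next_states), 1), a)]))
                for a in range(n_actions)])
            y = costs + gamma * q_next.min(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)
    return model  # greedy policy: argmin over actions of the predicted cost-to-go
```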

  20. Robot-assisted motor training: assistance decreases exploration during reinforcement learning.

    PubMed

    Sans-Muntadas, Albert; Duarte, Jaime E; Reinkensmeyer, David J

    2014-01-01

    Reinforcement learning (RL) is a form of motor learning that robotic therapy devices could potentially manipulate to promote neurorehabilitation. We developed a system that requires trainees to use RL to learn a predefined target movement. The system provides higher rewards for movements that are more similar to the target movement. We also developed a novel algorithm that rewards trainees of different abilities with comparable reward sizes. This algorithm measures a trainee's performance relative to their best performance, rather than relative to an absolute target performance, to determine reward. We hypothesized this algorithm would permit subjects who cannot normally achieve high reward levels to do so while still learning. In an experiment with 21 unimpaired human subjects, we found that all subjects quickly learned to make a first target movement with and without the reward equalization. However, artificially increasing reward decreased the subjects' tendency to engage in exploration and therefore slowed learning, particularly when we changed the target movement. An anti-slacking watchdog algorithm further slowed learning. These results suggest that robotic algorithms that assist trainees in achieving rewards or in preventing slacking might, over time, discourage the exploration needed for reinforcement learning. PMID:25570749
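
    The reward-equalization idea can be sketched roughly as follows; the function name, threshold behaviour, and scaling rule are hypothetical stand-ins, since the abstract does not specify the actual algorithm.

```python
def equalized_reward(error, best_error_so_far, max_reward=1.0):
    """Hypothetical sketch: reward performance relative to the trainee's own best.

    error: distance between the produced movement and the target movement.
    best_error_so_far: the trainee's smallest error observed so far (> 0).
    """
    if error <= best_error_so_far:
        return max_reward                              # a new personal best earns full reward
    return max_reward * best_error_so_far / error      # otherwise, scale reward down smoothly

# Example: with a personal best error of 2.0, an error of 1.8 earns 1.0 and an
# error of 4.0 earns 0.5, regardless of the trainee's absolute ability.
```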

  1. A reinforcement learning mechanism responsible for the valuation of free choice

    PubMed Central

    Cockburn, Jeffrey; Collins, Anne G.E.; Frank, Michael J.

    2014-01-01

    Humans exhibit a preference for options they have freely chosen over equally valued options they have not; however, the neural mechanism that drives this bias and its functional significance have yet to be identified. Here, we propose a model in which choice biases arise due to amplified positive reward prediction errors associated with free choice. Using a novel variant of a probabilistic learning task, we show that choice biases are selective to options that are predominantly associated with positive outcomes. A polymorphism in DARPP-32, a gene linked to dopaminergic striatal plasticity and individual differences in reinforcement learning, was found to predict the effect of choice as a function of value. We propose that these choice biases are the behavioral byproduct of a credit assignment mechanism responsible for ensuring the effective delivery of dopaminergic reinforcement learning signals broadcast to the striatum. PMID:25066083

  2. Basalt fiber reinforced porous aggregates-geopolymer based cellular material

    NASA Astrophysics Data System (ADS)

    Luo, Xin; Xu, Jin-Yu; Li, Weimin

    2015-09-01

    Basalt fiber reinforced porous aggregates-geopolymer based cellular material (BFRPGCM) was prepared. The stress-strain curve has been worked out. The ideal energy-absorbing efficiency has been analyzed and the application prospect has been explored. The results show the following: fiber reinforced cellular material has successively sized pore structures; the stress-strain curve has two stages: elastic stage and yielding plateau stage; the greatest value of the ideal energy-absorbing efficiency of BFRPGCM is 89.11%, which suggests BFRPGCM has excellent energy-absorbing property. Thus, it can be seen that BFRPGCM is easy and simple to make, has high plasticity, low density and excellent energy-absorbing features. So, BFRPGCM is a promising energy-absorbing material used especially in civil defense engineering.

  3. CONCRETE POURS HAVE PRODUCED A REINFORCED SUPPORT BASE FOR MTR ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    CONCRETE POURS HAVE PRODUCED A REINFORCED SUPPORT BASE FOR MTR REACTOR. PIPE TUNNEL IS UNDER CONSTRUCTION AT CENTER OF VIEW. PIPES WILL CARRY RADIOACTIVE WATER FROM REACTOR TO WATER PROCESS BUILDING. CAMERA LOOKS SOUTH INTO TUNNEL ALONG WEST SIDE OF REACTOR BASE. TWO CAISSONS ARE AT LEFT SIDE OF VIEW. NOTE "WINDOW" IN SOUTH FACE OF REACTOR BASE AND ALSO GROUP OF PENETRATIONS TO ITS LEFT. INL NEGATIVE NO. 733. Unknown Photographer, 10/6/1950 - Idaho National Engineering Laboratory, Test Reactor Area, Materials & Engineering Test Reactors, Scoville, Butte County, ID

  4. Resource-Based Learning.

    ERIC Educational Resources Information Center

    Brown, Sally, Ed.; Smith, Brenda, Ed.

    The selections in this book encompass a broad spectrum of resource-based learning experiences, and are intended to help teachers and administrators gain a better understanding of the concepts and devise effective and efficient ways to use these materials. Titles include: "Introducing Resources for Learning" (Sally Brown and Brenda Smith);…

  5. Problem-Based Learning

    ERIC Educational Resources Information Center

    Allen, Deborah E.; Donham, Richard S.; Bernhardt, Stephen A.

    2011-01-01

    In problem-based learning (PBL), students working in collaborative groups learn by resolving complex, realistic problems under the guidance of faculty. There is some evidence of PBL effectiveness in medical school settings where it began, and there are numerous accounts of PBL implementation in various undergraduate contexts, replete with…

  6. Spiking neural networks with different reinforcement learning (RL) schemes in a multiagent setting.

    PubMed

    Christodoulou, Chris; Cleanthous, Aristodemos

    2010-12-31

    This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed where we implement two spiking neural networks as two "selfish" agents learning simultaneously but independently, competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in a behaviour of mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with eligibility trace and results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. It is noted that the cooperative outcome was attained after a relatively short learning period which enhanced the accumulation of reward by the system. As in our previous implementation, the successful application of the learning algorithm to the IPD becomes possible only after we extended it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' LIF neurons with a partial somatic reset mechanism. PMID:21793357

  7. Reinforcement-based decision making in corticostriatal circuits: mutual constraints by neurocomputational and diffusion models.

    PubMed

    Ratcliff, Roger; Frank, Michael J

    2012-05-01

    In this letter, we examine the computational mechanisms of reinforcement-based decision making. We bridge the gap across multiple levels of analysis, from neural models of corticostriatal circuits, namely the basal ganglia (BG) model (Frank, 2005, 2006), to simpler but mathematically tractable diffusion models of two-choice decision making. Specifically, we generated simulated data from the BG model and fit the diffusion model (Ratcliff, 1978) to it. The standard diffusion model fits underestimated response times under conditions of high response and reinforcement conflict. Follow-up fits showed good fits to the data by increasing nondecision time, by raising decision thresholds as a function of conflict, and by allowing this threshold to collapse with time. This profile captures the role and dynamics of the subthalamic nucleus in BG circuitry, and as such, parametric modulations of projection strengths from this nucleus were associated with parametric increases in decision boundary and its modulation by conflict. We then present data from a human reinforcement learning experiment involving decisions with low- and high-reinforcement conflict. Again, the standard model failed to fit the data, but we found that two variants similar to those that fit the BG model data fit the experimental data, thereby providing a convergence of theoretical accounts of complex interactive decision-making mechanisms consistent with available data. This work also demonstrates how to make modest modifications to diffusion models to summarize core computations of the BG model. The result is a better fit and understanding of reinforcement-based choice data than that which would have occurred with either model alone. PMID:22295983
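
    A bare-bones simulation of the kind of two-choice diffusion process discussed above, including the collapsing decision threshold that the follow-up fits required, could look like this (all parameter values and names are illustrative, not taken from either model):

```python
import numpy as np

def simulate_ddm_trial(drift, boundary, collapse_rate=0.0, non_decision=0.3,
                       noise=1.0, dt=0.001, max_t=5.0,
                       rng=np.random.default_rng()):
    """Simulate one two-choice diffusion-model trial.

    The decision variable starts at 0 and accumulates noisy evidence until it
    crosses +boundary or -boundary; collapse_rate > 0 makes the boundary shrink
    over time, the kind of modification used to capture high-conflict conditions.
    """
    x, t = 0.0, 0.0
    while t < max_t:
        b = max(boundary - collapse_rate * t, 0.01)  # collapsing threshold
        if x >= b:
            return 'upper', t + non_decision
        if x <= -b:
            return 'lower', t + non_decision
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return 'timeout', max_t + non_decision
```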

  8. Dopamine-dependent reinforcement of motor skill learning: evidence from Gilles de la Tourette syndrome.

    PubMed

    Palminteri, Stefano; Lebreton, Maël; Worbe, Yulia; Hartmann, Andreas; Lehéricy, Stéphane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-08-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only affect choices but also motor skills such as typing. Here, we employed a novel paradigm to demonstrate that monetary rewards can improve motor skill learning in humans. Indeed, healthy participants progressively got faster in executing sequences of key presses that were repeatedly rewarded with 10 euro compared with 1 cent. Control tests revealed that the effect of reinforcement on motor skill learning was independent of subjects being aware of sequence-reward associations. To account for this implicit effect, we developed an actor-critic model, in which reward prediction errors are used by the critic to update state values and by the actor to facilitate action execution. To assess the role of dopamine in such computations, we applied the same paradigm in patients with Gilles de la Tourette syndrome, who were either unmedicated or treated with neuroleptics. We also included patients with focal dystonia, as an example of hyperkinetic motor disorder unrelated to dopamine. Model fit showed the following dissociation: while motor skills were affected in all patient groups, reinforcement learning was selectively enhanced in unmedicated patients with Gilles de la Tourette syndrome and impaired by neuroleptics. These results support the hypothesis that overactive dopamine transmission leads to excessive reinforcement of motor sequences, which might explain the formation of tics in Gilles de la Tourette syndrome. PMID:21727098

  9. Error-related negativity predicts reinforcement learning and conflict biases.

    PubMed

    Frank, Michael J; Woroch, Brion S; Curran, Tim

    2005-08-18

    The error-related negativity (ERN) is an electrophysiological marker thought to reflect changes in dopamine when participants make errors in cognitive tasks. Our computational model further predicts that larger ERNs should be associated with better learning to avoid maladaptive responses. Here we show that participants who avoided negative events had larger ERNs than those who were biased to learn more from positive outcomes. We also tested for effects of response conflict on ERN magnitude. While there was no overall effect of conflict, positive learners had larger ERNs when having to choose among two good options (win/win decisions) compared with two bad options (lose/lose decisions), whereas negative learners exhibited the opposite pattern. These results demonstrate that the ERN predicts the degree to which participants are biased to learn more from their mistakes than their correct choices and clarify the extent to which it indexes decision conflict. PMID:16102533

  10. Reinforcement of cement-based matrices with graphite nanomaterials

    NASA Astrophysics Data System (ADS)

    Sadiq, Muhammad Maqbool

    Cement-based materials offer a desirable balance of compressive strength, moisture resistance, durability, economy and energy-efficiency; their tensile strength, fracture energy and durability in aggressive environments, however, could benefit from further improvements. An option for realizing some of these improvements involves introduction of discrete fibers into concrete. When compared with today's micro-scale (steel, polypropylene, glass, etc.) fibers, graphite nanomaterials (carbon nanotube, nanofiber and graphite nanoplatelet) offer superior geometric, mechanical and physical characteristics. Graphite nanomaterials would realize their reinforcement potential as far as they are thoroughly dispersed within cement-based matrices, and effectively bond to cement hydrates. The research reported herein developed non-covalent and covalent surface modification techniques to improve the dispersion and interfacial interactions of graphite nanomaterials in cement-based matrices with a dense and well graded micro-structure. The most successful approach involved polymer wrapping of nanomaterials for increasing the density of hydrophilic groups on the nanomaterial surface without causing any damage to the their structure. The nanomaterials were characterized using various spectrometry techniques, and SEM (Scanning Electron Microscopy). The graphite nanomaterials were dispersed via selected sonication procedures in the mixing water of the cement-based matrix; conventional mixing and sample preparation techniques were then employed to prepare the cement-based nanocomposite samples, which were subjected to steam curing. Comprehensive engineering and durability characteristics of cement-based nanocomposites were determined and their chemical composition, microstructure and failure mechanisms were also assessed through various spectrometry, thermogravimetry, electron microscopy and elemental analyses. Both functionalized and non-functionalized nanomaterials as well as different

  11. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird

    PubMed Central

    Fee, Michale S.; Goldberg, Jesse H.

    2011-01-01

    Most of our motor skills are not innately programmed, but are learned by a combination of motor exploration and performance evaluation, suggesting that they proceed through a reinforcement learning (RL) mechanism. Songbirds have emerged as a model system to study how a complex behavioral sequence can be learned through an RL-like strategy. Interestingly, like motor sequence learning in mammals, song learning in birds requires a basal ganglia (BG)-thalamocortical loop, suggesting common neural mechanisms. Here we outline a specific working hypothesis for how BG-forebrain circuits could utilize an internally computed reinforcement signal to direct song learning. Our model includes a number of general concepts borrowed from the mammalian BG literature, including a dopaminergic reward prediction error and dopamine mediated plasticity at corticostriatal synapses. We also invoke a number of conceptual advances arising from recent observations in the songbird. Specifically, there is evidence for a specialized cortical circuit that adds trial-to-trial variability to stereotyped cortical motor programs, and a role for the BG in ‘biasing’ this variability to improve behavioral performance. This BG-dependent ‘premotor bias’ may in turn guide plasticity in downstream cortical synapses to consolidate recently-learned song changes. Given the similarity between mammalian and songbird BG-thalamocortical circuits, our model for the role of the BG in this process may have broader relevance to mammalian BG function. PMID:22015923

  12. Predicting Pilot Behavior in Medium Scale Scenarios Using Game Theory and Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Yildiz, Yildiray; Agogino, Adrian; Brat, Guillaume

    2013-01-01

    Effective automation is critical in achieving the capacity and safety goals of the Next Generation Air Traffic System. Unfortunately, creating integration and validation tools for such automation is difficult as the interactions between automation and its human counterparts are complex and unpredictable. This validation becomes even more difficult as we integrate wide-reaching technologies that affect the behavior of different decision makers in the system such as pilots, controllers and airlines. While overt short-term behavior changes can be explicitly modeled with traditional agent modeling systems, subtle behavior changes caused by the integration of new technologies may snowball into larger problems and be very hard to detect. To overcome these obstacles, we show how integration of new technologies can be validated by learning behavior models based on goals. In this framework, human participants are not modeled explicitly. Instead, their goals are modeled and through reinforcement learning their actions are predicted. The main advantage to this approach is that modeling is done within the context of the entire system allowing for accurate modeling of all participants as they interact as a whole. In addition, such an approach allows for efficient trade studies and feasibility testing on a wide range of automation scenarios. The goal of this paper is to test whether such an approach is feasible. To do this, we implement this approach using a simple discrete-state learning system on a scenario where 50 aircraft need to self-navigate using Automatic Dependent Surveillance-Broadcast (ADS-B) information. In this scenario, we show how the approach can be used to predict the ability of pilots to adequately balance aircraft separation and fly efficient paths. We present results with several levels of complexity and airspace congestion.

  13. Homeostatic reinforcement learning for integrating reward collection and physiological stability

    PubMed Central

    Keramati, Mehdi; Gutkin, Boris

    2014-01-01

    Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system. DOI: http://dx.doi.org/10.7554/eLife.04811.001 PMID:25457346
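
    Schematically, the framework ties reward to drive reduction; in the rough notation below (chosen here for illustration, not quoted from the paper), the drive D is a distance between the internal state and its setpoint, and the reward of an outcome is the resulting reduction in drive:

```latex
% Drive as distance of the internal state h_t from the homeostatic setpoint h^*,
% and reward of an outcome with physiological impact k_t as the reduction in drive.
D(h_t) = \lVert h^{*} - h_t \rVert, \qquad r_t = D(h_t) - D(h_t + k_t)
```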

  14. Prediction error in reinforcement learning: a meta-analysis of neuroimaging studies.

    PubMed

    Garrison, Jane; Erdeniz, Burak; Done, John

    2013-08-01

    Activation likelihood estimation (ALE) meta-analyses were used to examine the neural correlates of prediction error in reinforcement learning. The findings are interpreted in the light of current computational models of learning and action selection. In this context, particular consideration is given to the comparison of activation patterns from studies using instrumental and Pavlovian conditioning, and where reinforcement involved rewarding or punishing feedback. The striatum was the key brain area encoding for prediction error, with activity encompassing dorsal and ventral regions for instrumental and Pavlovian reinforcement alike, a finding which challenges the functional separation of the striatum into a dorsal 'actor' and a ventral 'critic'. Prediction error activity was further observed in diverse areas of predominantly anterior cerebral cortex including medial prefrontal cortex and anterior cingulate cortex. Distinct patterns of prediction error activity were found for studies using rewarding and aversive reinforcers; reward prediction errors were observed primarily in the striatum while aversive prediction errors were found more widely including insula and habenula. PMID:23567522

  15. Factors Contributing to the Effectiveness of Social and Nonsocial Reinforcers in the Discrimination Learning of Children from Two Socioeconomic Groups

    ERIC Educational Resources Information Center

    Spence, Janet Taylor

    1973-01-01

    Middle- and lower-class children who had been treated by E in a warm or aloof manner were given a discrimination learning task under one of six conditions forming a 3 by 2 design: three reinforcement types (Verbal-intoned, Verbal-nonintoned, or Symbolic) and reinforcement for correct or incorrect responses. (Editor)

  16. A Randomized Trial of Employment-Based Reinforcement of Cocaine Abstinence in Injection Drug Users

    ERIC Educational Resources Information Center

    Silverman, Kenneth; Wong, Conrad J.; Needham, Mick; Diemer, Karly N.; Knealing, Todd; Crone-Todd, Darlene; Fingerhood, Michael; Nuzzo, Paul; Kolodner, Kenneth

    2007-01-01

    High-magnitude and long-duration abstinence reinforcement can promote drug abstinence but can be difficult to finance. Employment may be a vehicle for arranging high-magnitude and long-duration abstinence reinforcement. This study determined if employment-based abstinence reinforcement could increase cocaine abstinence in adults who inject drugs…

  17. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective.

    PubMed

    Story, Giles W; Vlaev, Ivo; Seymour, Ben; Darzi, Ara; Dolan, Raymond J

    2014-01-01

    The tendency to make unhealthy choices is hypothesized to be related to an individual's temporal discount rate, the theoretical rate at which they devalue delayed rewards. Furthermore, a particular form of temporal discounting, hyperbolic discounting, has been proposed to explain why unhealthy behavior can occur despite healthy intentions. We examine these two hypotheses in turn. We first systematically review studies which investigate whether discount rates can predict unhealthy behavior. These studies reveal that high discount rates for money (and in some instances food or drug rewards) are associated with several unhealthy behaviors and markers of health status, establishing discounting as a promising predictive measure. We secondly examine whether intention-incongruent unhealthy actions are consistent with hyperbolic discounting. We conclude that intention-incongruent actions are often triggered by environmental cues or changes in motivational state, whose effects are not parameterized by hyperbolic discounting. We propose a framework for understanding these state-based effects in terms of the interplay of two distinct reinforcement learning mechanisms: a "model-based" (or goal-directed) system and a "model-free" (or habitual) system. Under this framework, while discounting of delayed health may contribute to the initiation of unhealthy behavior, with repetition, many unhealthy behaviors become habitual; if health goals then change, habitual behavior can still arise in response to environmental cues. We propose that the burgeoning development of computational models of these processes will permit further identification of health decision-making phenotypes. PMID:24659960
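
    For reference, the hyperbolic discounting form discussed above values a reward of amount A delayed by D as follows, where k is the individual's discount rate (standard notation, not the review's own):

```latex
% Hyperbolic discounting: present value of an amount A received after delay D.
V = \frac{A}{1 + kD}
```

    Higher values of k correspond to steeper devaluation of delayed outcomes, such as future health benefits.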

  18. Informing sequential clinical decision-making through reinforcement learning: an empirical study

    PubMed Central

    Shortreed, Susan M.; Laber, Eric; Lizotte, Daniel J.; Stroup, T. Scott; Pineau, Joelle; Murphy, Susan A.

    2011-01-01

    This paper highlights the role that reinforcement learning can play in the optimization of treatment policies for chronic illnesses. Before applying any off-the-shelf reinforcement learning methods in this setting, we must first tackle a number of challenges. We outline some of these challenges and present methods for overcoming them. First, we describe a multiple imputation approach to overcome the problem of missing data. Second, we discuss the use of function approximation in the context of a highly variable observation set. Finally, we discuss approaches to summarizing the evidence in the data for recommending a particular action and quantifying the uncertainty around the Q-function of the recommended policy. We present the results of applying these methods to real clinical trial data of patients with schizophrenia. PMID:21799585

  19. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.

    PubMed

    Suri, R E; Schultz, W

    1999-01-01

    This study investigated how the simulated response of dopamine neurons to reward-related stimuli could be used as reinforcement signal for learning a spatial delayed response task. Spatial delayed response tasks assess the functions of frontal cortex and basal ganglia in short-term memory, movement preparation and expectation of environmental events. In these tasks, a stimulus appears for a short period at a particular location, and after a delay the subject moves to the location indicated. Dopamine neurons are activated by unpredicted rewards and reward-predicting stimuli, are not influenced by fully predicted rewards, and are depressed by omitted rewards. Thus, they appear to report an error in the prediction of reward, which is the crucial reinforcement term in formal learning theories. Theoretical studies on reinforcement learning have shown that signals similar to dopamine responses can be used as effective teaching signals for learning. A neural network model implementing the temporal difference algorithm was trained to perform a simulated spatial delayed response task. The reinforcement signal was modeled according to the basic characteristics of dopamine responses to novel stimuli, primary rewards and reward-predicting stimuli. A Critic component analogous to dopamine neurons computed a temporal error in the prediction of reinforcement and emitted this signal to an Actor component which mediated the behavioral output. The spatial delayed response task was learned via two subtasks introducing spatial choices and temporal delays, in the same manner as monkeys in the laboratory. In all three tasks, the reinforcement signal of the Critic developed in a similar manner to the responses of natural dopamine neurons in comparable learning situations, and the learning curves of the Actor replicated the progress of learning observed in the animals. Several manipulations demonstrated further the efficacy of the particular characteristics of the dopamine
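
    The temporal error in the prediction of reinforcement computed by the Critic is the standard temporal-difference error; in conventional notation (not necessarily the paper's symbols):

```latex
% Temporal-difference prediction error: reward plus discounted next-state value
% minus the current value prediction. Positive \delta_t signals "better than expected".
\delta_t = r_t + \gamma \, V(s_{t+1}) - V(s_t)
```

    Both the Critic's value estimates and the Actor's action preferences are adjusted in proportion to this error, mirroring the phasic dopamine responses described above.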

  20. Reinforcement Learning Models and Their Neural Correlates: An Activation Likelihood Estimation Meta-Analysis

    PubMed Central

    Kumar, Poornima; Eickhoff, Simon B.; Dombrovski, Alexandre Y.

    2015-01-01

    Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments – prediction error – is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies suggest that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that employed algorithmic reinforcement learning models, across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, while instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually-estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies. PMID:25665667

  1. Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis.

    PubMed

    Chase, Henry W; Kumar, Poornima; Eickhoff, Simon B; Dombrovski, Alexandre Y

    2015-06-01

    Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments-prediction error-is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies have suggested that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that had employed algorithmic reinforcement learning models across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, whereas instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies. PMID:25665667

  2. Does overall reinforcer rate affect discrimination of time-based contingencies?

    PubMed

    Cowie, Sarah; Davison, Michael; Blumhardt, Luca; Elliffe, Douglas

    2016-05-01

    Overall reinforcer rate appears to affect choice. The mechanism for such an effect is uncertain, but may relate to reinforcer rate changing the discrimination of the relation between stimuli and reinforcers. We assessed whether a quantitative model based on a stimulus-control approach could be used to account for the effects of overall reinforcer rate on choice under changing time-based contingencies. On a two-key concurrent schedule, the likely availability of a reinforcer reversed when a fixed time had elapsed since the last reinforcer, and the overall reinforcer rate was varied across conditions. Changes in the overall reinforcer rate produced a change in response bias, and some indication of a change in discrimination. These changes in bias and discrimination always occurred quickly, usually within the first session of a condition. The stimulus-control approach provided an excellent account of the data, suggesting that changes in overall reinforcer rate affect choice because they alter the frequency of reinforcers obtained at different times, or in different stimulus contexts, and thus change the discriminated relation between stimuli and reinforcers. These findings support the notion that temporal and spatial discriminations can be understood in terms of discrimination of reinforcers across time and space. PMID:27151836

  3. Neural Regions that Underlie Reinforcement Learning Also Engage in Social Expectancy Violations

    PubMed Central

    Harris, Lasana T.; Fiske, Susan T.

    2013-01-01

    Prediction error, the difference between an expected and actual outcome, serves as a learning signal that interacts with reward and punishment value to direct future behavior during reinforcement learning. We hypothesized that similar learning and valuation signals may underlie social expectancy violations. Here, we explore the neural correlates of social expectancy violation signals along the universal person-perception dimensions of trait warmth and competence. In this context, social learning may result from expectancy violations that occur when a target is inconsistent with an a priori schema. Expectancy violation may activate neural regions normally implicated in prediction error and valuation during appetitive and aversive conditioning. Using fMRI, we first gave perceivers warmth or competence behavioral information. Participants then saw pictures of people responsible for the behavior; they represented social groups either inconsistent (rated low on either warmth or competence) or consistent (rated high on either warmth or competence) with the behavior information. Warmth and competence expectancy violations activate striatal regions and frontal cortex respectively, areas that represent evaluative and prediction-error signals. These findings suggest that regions underlying reinforcement learning may be engaged in warmth and competence social expectancy violation, and illustrate the neural overlap between neuroeconomics and social neuroscience. PMID:20119878

  4. Concept learning without differential reinforcement in pigeons by means of contextual cueing.

    PubMed

    Couto, Kalliu C; Navarro, Victor M; Smith, Tatiana R; Wasserman, Edward A

    2016-04-01

    How supervision is arranged can affect the way that humans learn concepts. Yet very little is known about the role that supervision plays in nonhuman concept learning. Prior research in pigeon concept learning has commonly used differential response-reinforcer procedures (involving high-level supervision) to support reliable discrimination and generalization involving from 4 to 16 concurrently presented photographic categories. In the present project, we used contextual cueing, a nondifferential reinforcement procedure (involving low-level supervision), to investigate concept learning in pigeons. We found that pigeons were faster to peck a target stimulus when 8 members from each of 4 categories of black-and-white photographs (dogs, trees, shoes, and keys) correctly cued its location than when they did not. This faster detection of the target also generalized to 4 untrained members from each of the 4 photographic categories. Our results thus pass the prime behavioral tests of conceptualization and suggest that high-level supervision is unnecessary to support concept learning in pigeons. PMID:26914972

  5. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    PubMed

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input and multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities, an action network that is designed to produce an optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used both for the action and critic whereas any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus, the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system. PMID:21947529

  6. Trading Rules on Stock Markets Using Genetic Network Programming with Reinforcement Learning and Importance Index

    NASA Astrophysics Data System (ADS)

    Mabu, Shingo; Hirasawa, Kotaro; Furuzuki, Takayuki

    Genetic Network Programming (GNP) is an evolutionary computation method that represents its solutions using graph structures. Since GNP can create quite compact programs and has an implicit memory function, it has been shown to work well, especially in dynamic environments. In addition, a study on creating trading rules on stock markets using GNP with Importance Index (GNP-IMX) has been done. IMX is a new element which is a criterion for decision making. In this paper, we combine GNP-IMX with Actor-Critic (GNP-IMX&AC) to create trading rules on stock markets. Evolution-based methods can update their programs only after enough time has passed to calculate fitness values, whereas reinforcement learning can change programs during that period, so trading rules can be created more efficiently. In the simulation, the proposed method is trained using the stock prices of 10 brands in 2002 and 2003. Then the generalization ability is tested using the stock prices in 2004. The simulation results show that the proposed method can obtain larger profits than GNP-IMX without AC and Buy&Hold.

  7. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective

    PubMed Central

    Story, Giles W.; Vlaev, Ivo; Seymour, Ben; Darzi, Ara; Dolan, Raymond J.

    2014-01-01

    The tendency to make unhealthy choices is hypothesized to be related to an individual's temporal discount rate, the theoretical rate at which they devalue delayed rewards. Furthermore, a particular form of temporal discounting, hyperbolic discounting, has been proposed to explain why unhealthy behavior can occur despite healthy intentions. We examine these two hypotheses in turn. We first systematically review studies which investigate whether discount rates can predict unhealthy behavior. These studies reveal that high discount rates for money (and in some instances food or drug rewards) are associated with several unhealthy behaviors and markers of health status, establishing discounting as a promising predictive measure. We secondly examine whether intention-incongruent unhealthy actions are consistent with hyperbolic discounting. We conclude that intention-incongruent actions are often triggered by environmental cues or changes in motivational state, whose effects are not parameterized by hyperbolic discounting. We propose a framework for understanding these state-based effects in terms of the interplay of two distinct reinforcement learning mechanisms: a “model-based” (or goal-directed) system and a “model-free” (or habitual) system. Under this framework, while discounting of delayed health may contribute to the initiation of unhealthy behavior, with repetition, many unhealthy behaviors become habitual; if health goals then change, habitual behavior can still arise in response to environmental cues. We propose that the burgeoning development of computational models of these processes will permit further identification of health decision-making phenotypes. PMID:24659960

  8. Elevator Group Supervisory Control System Using Genetic Network Programming with Macro Nodes and Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Zhou, Jin; Yu, Lu; Mabu, Shingo; Hirasawa, Kotaro; Hu, Jinglu; Markon, Sandor

    Elevator Group Supervisory Control System (EGSCS) is a very large scale stochastic dynamic optimization problem. Due to its vast state space, significant uncertainty and numerous resource constraints such as finite car capacities and registered hall/car calls, it is hard to manage EGSCS using conventional control methods. Recently, many solutions for EGSCS using Artificial Intelligence (AI) technologies have been reported. Genetic Network Programming (GNP), which was proposed as a new evolutionary computation method several years ago, has also proved to be efficient when applied to the EGSCS problem. In this paper, we propose an extended algorithm for EGSCS by introducing Reinforcement Learning (RL) into the GNP framework, and an improvement in EGSCS performance is expected since the efficiency of GNP with RL has been demonstrated in other studies, such as the tile-world problem. Simulation tests using traffic flows in a typical office building have been made, and the results show an actual improvement in EGSCS performance compared to algorithms using the original GNP and conventional control methods. Furthermore, an importance weight optimization algorithm based on GNP with RL is employed, and its efficiency is also verified by the improved performance.

  9. Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control

    NASA Technical Reports Server (NTRS)

    Bernstein, Daniel S.; Zilberstein, Shlomo

    2003-01-01

    Weakly-coupled Markov decision processes can be decomposed into subprocesses that interact only through a small set of bottleneck states. We study a hierarchical reinforcement learning algorithm designed to take advantage of this particular type of decomposability. To test our algorithm, we use a decision-making problem faced by autonomous planetary rovers. In this problem, a Mars rover must decide which activities to perform and when to traverse between science sites in order to make the best use of its limited resources. In our experiments, the hierarchical algorithm performs better than Q-learning in the early stages of learning, but unlike Q-learning it converges to a suboptimal policy. This suggests that it may be advantageous to use the hierarchical algorithm when training time is limited.

  10. Simulating the Effect of Reinforcement Learning on Neuronal Synchrony and Periodicity in the Striatum.

    PubMed

    Hélie, Sébastien; Fleischer, Pierson J

    2016-01-01

    The study of rhythms and oscillations in the brain is gaining attention. While it is unclear exactly what the role of oscillation, synchrony, and rhythm is, it appears increasingly likely that synchrony is related to normal and abnormal brain states and possibly cognition. In this article, we explore the relationship between basal ganglia (BG) synchrony and reinforcement learning. We simulate a biologically-realistic model of the striatum initially proposed by Ponzi and Wickens (2010) and enhance the model by adding plastic cortico-BG synapses that can be modified using reinforcement learning. The effect of reinforcement learning on striatal rhythmic activity is then explored, and disrupted using simulated deep brain stimulation (DBS). The stimulator injects current in the brain structure to which it is attached, which affects neuronal synchrony. The results show that training the model without DBS yields a high accuracy in the learning task and reduces the number of active neurons in the striatum, along with an increased firing periodicity and a decreased firing synchrony between neurons in the same assembly. In addition, a spectral decomposition shows a stronger signal for correct trials than incorrect trials in high frequency bands. If the DBS is ON during the training phase, but not the test phase, the amount of learning in the model is reduced, along with firing periodicity. Similar to when the DBS is OFF, spectral decomposition shows a stronger signal for correct trials than for incorrect trials in high frequency domains, but this phenomenon happens in higher frequency bands than when the DBS is OFF. Synchrony between the neurons is not affected. Finally, the results show that turning the DBS ON at test increases both firing periodicity and striatal synchrony, and spectral decomposition of the signal shows that neural activity synchronizes with the DBS fundamental frequency (and its harmonics). Turning the DBS ON during the test phase results in chance

  11. Simulating the Effect of Reinforcement Learning on Neuronal Synchrony and Periodicity in the Striatum

    PubMed Central

    Hélie, Sébastien; Fleischer, Pierson J.

    2016-01-01

    The study of rhythms and oscillations in the brain is gaining attention. While it is unclear exactly what the role of oscillation, synchrony, and rhythm is, it appears increasingly likely that synchrony is related to normal and abnormal brain states and possibly cognition. In this article, we explore the relationship between basal ganglia (BG) synchrony and reinforcement learning. We simulate a biologically-realistic model of the striatum initially proposed by Ponzi and Wickens (2010) and enhance the model by adding plastic cortico-BG synapses that can be modified using reinforcement learning. The effect of reinforcement learning on striatal rhythmic activity is then explored, and disrupted using simulated deep brain stimulation (DBS). The stimulator injects current in the brain structure to which it is attached, which affects neuronal synchrony. The results show that training the model without DBS yields high accuracy in the learning task and reduces the number of active neurons in the striatum, along with an increased firing periodicity and a decreased firing synchrony between neurons in the same assembly. In addition, a spectral decomposition shows a stronger signal for correct trials than incorrect trials in high frequency bands. If the DBS is ON during the training phase, but not the test phase, the amount of learning in the model is reduced, along with firing periodicity. Similar to when the DBS is OFF, spectral decomposition shows a stronger signal for correct trials than for incorrect trials in high frequency domains, but this phenomenon happens in higher frequency bands than when the DBS is OFF. Synchrony between the neurons is not affected. Finally, the results show that turning the DBS ON at test increases both firing periodicity and striatal synchrony, and spectral decomposition of the signal shows that neural activity synchronizes with the DBS fundamental frequency (and its harmonics). Turning the DBS ON during the test phase results in chance

  12. Group-Based Learning.

    ERIC Educational Resources Information Center

    Garth, Russell Y.

    1999-01-01

    An author of the 14th issue of this journal, which was devoted to group-based pedagogical approaches to college instruction, traces the continuing development of collaborative or cooperative learning. Notes influence of the original volume on guidelines of the Fund for the Improvement of Postsecondary Education and the Collaboration in…

  13. Concept-Based Learning

    ERIC Educational Resources Information Center

    Schill, Bethany; Howell, Linda

    2011-01-01

    A major part of developing concept-based instruction is the use of an overarching idea to provide a conceptual lens through which students view the content of a particular subject. By using a conceptual lens to focus learning, students think at a much deeper level about the content and its facts (Erickson 2007). Therefore, the authors collaborated…

  14. Problem Based Learning in Science

    ERIC Educational Resources Information Center

    Pepper, Coral

    2009-01-01

    Problem based learning (PBL) is a recognised teaching and learning strategy used to engage students in deep rather than surface learning. It is also viewed as a successful strategy to align university courses with the real life professional work students are expected to undertake on graduation (Biggs, 2003). Problem based learning is practised…

  15. Numerical simulation of monitoring corrosion in reinforced concrete based on ultrasonic guided waves.

    PubMed

    Zheng, Zhupeng; Lei, Ying; Xue, Xin

    2014-01-01

    Numerical simulation based on finite element method is conducted to predict the location of pitting corrosion in reinforced concrete. Simulation results show that it is feasible to predict corrosion monitoring based on ultrasonic guided wave in reinforced concrete, and wavelet analysis can be used for the extremely weak signal of guided waves due to energy leaking into concrete. The characteristic of time-frequency localization of wavelet transform is adopted in the corrosion monitoring of reinforced concrete. Guided waves can be successfully used to identify corrosion defects in reinforced concrete with the analysis of suitable wavelet-based function and its scale. PMID:25013865

  16. Numerical Simulation of Monitoring Corrosion in Reinforced Concrete Based on Ultrasonic Guided Waves

    PubMed Central

    Zheng, Zhupeng; Lei, Ying; Xue, Xin

    2014-01-01

    Numerical simulation based on finite element method is conducted to predict the location of pitting corrosion in reinforced concrete. Simulation results show that it is feasible to predict corrosion monitoring based on ultrasonic guided wave in reinforced concrete, and wavelet analysis can be used for the extremely weak signal of guided waves due to energy leaking into concrete. The characteristic of time-frequency localization of wavelet transform is adopted in the corrosion monitoring of reinforced concrete. Guided waves can be successfully used to identify corrosion defects in reinforced concrete with the analysis of suitable wavelet-based function and its scale. PMID:25013865

  17. Stabilized fiber-reinforced pavement base course with recycled aggregate

    NASA Astrophysics Data System (ADS)

    Sobhan, Khaled

    This study evaluates the benefits to be gained by using a composite highway base course material consisting of recycled crushed concrete aggregate, Portland cement, fly ash, and a modest amount of reinforcing fibers. The primary objectives of this research were to (a) quantify the improvement that is obtained by adding fibers to a lean concrete composite (made from recycled aggregate and low quantities of Portland cement and/or fly ash), (b) evaluate the mechanical behavior of such a composite base course material under both static and repeated loads, and (c) utilize the laboratory-determined properties with a mechanistic design method to assess the potential advantages. The split tensile strength of a stabilized recycled aggregate base course material was found to be exponentially related to the compacted dry density of the mix. A lean mix containing 4% cement and 4% fly ash (by weight) develops sufficient unconfined compressive, split tensile, and flexural strengths to be used as a high quality stabilized base course. The addition of 4% (by weight) of hooked-end steel fibers significantly enhances the post-peak load-deformation response of the composite in both indirect tension and static flexure. The flexural fatigue behavior of the 4% cement-4% fly ash mix is comparable to all commonly used stabilized materials, including regular concrete; the inclusion of 4% hooked-end fibers to this mix significantly improves its resistance to fatigue failure. The resilient moduli of stabilized recycled aggregate in flexure are comparable to the values obtained for traditional soil-cement mixes. In general, the fibers are effective in retarding the rate of fatigue damage accumulation, which is quantified in terms of a damage index defined by an energy-based approach. The thickness design curves for a stabilized recycled aggregate base course, as developed by using an elastic layer approach, are shown to be in close agreement with a theoretical model (based on Westergaard

  18. Can Service Learning Reinforce Social and Cultural Bias? Exploring a Popular Model of Family Involvement for Early Childhood Teacher Candidates

    ERIC Educational Resources Information Center

    Dunn-Kenney, Maylan

    2010-01-01

    Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…

  19. Differential Modulation of Reinforcement Learning by D2 Dopamine and NMDA Glutamate Receptor Antagonism

    PubMed Central

    Klein, Tilmann A.; Ullsperger, Markus

    2014-01-01

    The firing pattern of midbrain dopamine (DA) neurons is well known to reflect reward prediction errors (PEs), the difference between obtained and expected rewards. The PE is thought to be a crucial signal for instrumental learning, and interference with DA transmission impairs learning. Phasic increases of DA neuron firing during positive PEs are driven by activation of NMDA receptors, whereas phasic suppression of firing during negative PEs is likely mediated by inputs from the lateral habenula. We aimed to determine the contribution of DA D2-class and NMDA receptors to appetitively and aversively motivated reinforcement learning. Healthy human volunteers were scanned with functional magnetic resonance imaging while they performed an instrumental learning task under the influence of either the DA D2 receptor antagonist amisulpride (400 mg), the NMDA receptor antagonist memantine (20 mg), or placebo. Participants quickly learned to select (“approach”) rewarding and to reject (“avoid”) punishing options. Amisulpride impaired both approach and avoidance learning, while memantine mildly attenuated approach learning but had no effect on avoidance learning. These behavioral effects of the antagonists were paralleled by their modulation of striatal PEs. Amisulpride reduced both appetitive and aversive PEs, while memantine diminished appetitive, but not aversive PEs. These data suggest that striatal D2-class receptors contribute to both approach and avoidance learning by detecting both the phasic DA increases and decreases during appetitive and aversive PEs. NMDA receptors on the contrary appear to be required only for approach learning because phasic DA increases during positive PEs are NMDA dependent, whereas phasic decreases during negative PEs are not. PMID:25253860

  20. Differential modulation of reinforcement learning by D2 dopamine and NMDA glutamate receptor antagonism.

    PubMed

    Jocham, Gerhard; Klein, Tilmann A; Ullsperger, Markus

    2014-09-24

    The firing pattern of midbrain dopamine (DA) neurons is well known to reflect reward prediction errors (PEs), the difference between obtained and expected rewards. The PE is thought to be a crucial signal for instrumental learning, and interference with DA transmission impairs learning. Phasic increases of DA neuron firing during positive PEs are driven by activation of NMDA receptors, whereas phasic suppression of firing during negative PEs is likely mediated by inputs from the lateral habenula. We aimed to determine the contribution of DA D2-class and NMDA receptors to appetitively and aversively motivated reinforcement learning. Healthy human volunteers were scanned with functional magnetic resonance imaging while they performed an instrumental learning task under the influence of either the DA D2 receptor antagonist amisulpride (400 mg), the NMDA receptor antagonist memantine (20 mg), or placebo. Participants quickly learned to select ("approach") rewarding and to reject ("avoid") punishing options. Amisulpride impaired both approach and avoidance learning, while memantine mildly attenuated approach learning but had no effect on avoidance learning. These behavioral effects of the antagonists were paralleled by their modulation of striatal PEs. Amisulpride reduced both appetitive and aversive PEs, while memantine diminished appetitive, but not aversive PEs. These data suggest that striatal D2-class receptors contribute to both approach and avoidance learning by detecting both the phasic DA increases and decreases during appetitive and aversive PEs. NMDA receptors on the contrary appear to be required only for approach learning because phasic DA increases during positive PEs are NMDA dependent, whereas phasic decreases during negative PEs are not. PMID:25253860
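
    The learning rule assumed in such instrumental learning tasks can be illustrated with a toy prediction-error update; the separate appetitive/aversive learning rates below are an illustrative assumption reflecting the hypothesized asymmetry, not the fitted model from the study.

```python
def update_value(value, outcome, alpha_pos=0.1, alpha_neg=0.1):
    """One trial of prediction-error learning for the chosen option.

    value   : current value estimate of the chosen option
    outcome : obtained feedback (+1 reward, -1 punishment, 0 neutral)
    Separate learning rates let appetitive and aversive prediction errors
    be weighted differently, as the pharmacological manipulations are
    hypothesized to do (illustrative parameters only).
    """
    pe = outcome - value                      # reward prediction error
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return value + alpha * pe, pe

# example: the option's value drifts toward its average payoff over trials
v = 0.0
for outcome in (1, 1, 0, 1, -1, 1):
    v, pe = update_value(v, outcome)
```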

  1. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning

    PubMed Central

    Franklin, Nicholas T; Frank, Michael J

    2015-01-01

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001 PMID:26705698
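
    The computational idea of uncertainty dynamically gating the learning rate can be sketched very roughly as follows; the mapping from the recent spread of prediction errors to the learning rate is a stand-in assumption, not the authors' TAN model.

```python
def uncertainty_gated_update(value, outcome, recent_pes, k=1.0):
    """Prediction-error update with a learning rate scaled by recent uncertainty.

    recent_pes : list of recent prediction errors; their root-mean-square is
                 used here as a crude stand-in for the population uncertainty
                 that the TAN pauses are proposed to signal.
    """
    pe = outcome - value
    if recent_pes:
        spread = (sum(e * e for e in recent_pes) / len(recent_pes)) ** 0.5
    else:
        spread = 1.0
    alpha = min(1.0, k * spread)              # more uncertainty -> faster learning
    recent_pes.append(pe)
    return value + alpha * pe
```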

  2. Sex-dependent effects on tasks assessing reinforcement learning and interference inhibition

    PubMed Central

    Evans, Kelly L.; Hampson, Elizabeth

    2015-01-01

    Increasing evidence suggests that the prefrontal cortex (PFC) is influenced by sex steroids and that some cognitive functions dependent on the PFC may be sexually differentiated in humans. Past work has identified a male advantage on certain complex reinforcement learning tasks, but it is unclear which latent task components are important to elicit the sex difference. The objective of the current study was to investigate whether there are sex differences on measures of response inhibition and valenced feedback processing, elements that are shared by previously studied reinforcement learning tasks. Healthy young adults (90 males, 86 females) matched in general intelligence completed the Probabilistic Selection Task (PST), a Simon task, and the Stop-Signal task. On the PST, females were more accurate than males in learning from positive (but not negative) feedback. On the Simon task, males were faster than females, especially in the face of incongruent stimuli. No sex difference was observed in Stop-Signal reaction time. The current findings provide preliminary support for a sex difference in the processing of valenced feedback and in interference inhibition. PMID:26257691

  3. Real-time reinforcement learning by sequential Actor-Critics and experience replay.

    PubMed

    Wawrzyński, Paweł

    2009-12-01

    Actor-Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. This paper shows how these algorithms can be augmented by the technique of experience replay without degrading their convergence properties, by appropriately estimating the policy change direction. This is achieved by truncated importance sampling applied to the recorded past experiences. It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience replay-augmented algorithm to preserve the convergence properties of the original algorithm. The technique of experience replay makes it possible to utilize the available computational power to reduce the required number of interactions with the environment considerably, which is essential for real-world applications. Experimental results are presented that demonstrate that the combination of experience replay and Actor-Critics yields extremely fast learning algorithms that achieve successful policies for non-trivial control tasks in considerably short time. Namely, the policies for the cart-pole swing-up [Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1), 219-245] are obtained after as little as 20 min of the cart-pole time and the policy for Half-Cheetah (a walking 6-degree-of-freedom robot) is obtained after four hours of Half-Cheetah time. PMID:19523786
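
    A compressed sketch of the central mechanism, replaying stored transitions with truncated importance weights so that off-policy samples can still drive an actor update, is given below; the policy parameterization, return estimates, and truncation level are assumptions for illustration only.

```python
def replay_actor_update(theta, replay_buffer, pi_prob, grad_log_pi, alpha=0.01, c=10.0):
    """One sweep of an actor update over replayed experience.

    replay_buffer : list of (state, action, return_estimate, behaviour_prob) tuples
                    recorded under earlier versions of the policy
    pi_prob       : probability pi_theta(action | state) under the *current* policy
    grad_log_pi   : gradient of log pi_theta(action | state) with respect to theta
    c             : truncation level bounding each importance weight
    """
    for s, a, ret, old_prob in replay_buffer:
        w = min(pi_prob(theta, s, a) / old_prob, c)    # truncated importance weight
        theta = theta + alpha * w * ret * grad_log_pi(theta, s, a)
    return theta
```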

  4. Dopamine and performance in a reinforcement learning task: evidence from Parkinson's disease.

    PubMed

    Shiner, Tamara; Seymour, Ben; Wunderlich, Klaus; Hill, Ciaran; Bhatia, Kailash P; Dayan, Peter; Dolan, Raymond J

    2012-06-01

    The role dopamine plays in decision-making has important theoretical, empirical and clinical implications. Here, we examined its precise contribution by exploiting the lesion deficit model afforded by Parkinson's disease. We studied patients in a two-stage reinforcement learning task, while they were ON and OFF dopamine replacement medication. Contrary to expectation, we found that dopaminergic drug state (ON or OFF) did not impact learning. Instead, the critical factor was drug state during the performance phase, with patients ON medication choosing correctly significantly more frequently than those OFF medication. This effect was independent of drug state during initial learning and appears to reflect a facilitation of generalization for learnt information. This inference is bolstered by our observation that neural activity in nucleus accumbens and ventromedial prefrontal cortex, measured during simultaneously acquired functional magnetic resonance imaging, represented learnt stimulus values during performance. This effect was expressed solely during the ON state with activity in these regions correlating with better performance. Our data indicate that dopamine modulation of nucleus accumbens and ventromedial prefrontal cortex exerts a specific effect on choice behaviour distinct from pure learning. The findings are in keeping with the substantial other evidence that certain aspects of learning are unaffected by dopamine lesions or depletion, and that dopamine plays a key role in performance that may be distinct from its role in learning. PMID:22508958

  5. Reinforcement Learning of Two-Joint Virtual Arm Reaching in a Computer Model of Sensorimotor Cortex

    PubMed Central

    Neymotin, Samuel A.; Chadderdon, George L.; Kerr, Cliff C.; Francis, Joseph T.; Lytton, William W.

    2014-01-01

    Neocortical mechanisms of learning sensorimotor control involve a complex series of interactions at multiple levels, from synaptic mechanisms to cellular dynamics to network connectomics. We developed a model of sensory and motor neocortex consisting of 704 spiking model neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a two-joint virtual arm to reach to a fixed target. For each of 125 trained networks, we used 200 training sessions, each involving 15 s reaches to the target from 16 starting positions. Learning altered network dynamics, with enhancements to neuronal synchrony and behaviorally relevant information flow between neurons. After learning, networks demonstrated retention of behaviorally relevant memories by using proprioceptive information to perform reach-to-target from multiple starting positions. Networks dynamically controlled which joint rotations to use to reach a target, depending on current arm position. Learning-dependent network reorganization was evident in both sensory and motor populations: learned synaptic weights showed target-specific patterning optimized for particular reach movements. Our model embodies an integrative hypothesis of sensorimotor cortical learning that could be used to interpret future electrophysiological data recorded in vivo from sensorimotor learning experiments. We used our model to make the following predictions: learning enhances synchrony in neuronal populations and behaviorally relevant information flow across neuronal populations, enhanced sensory processing aids task-relevant motor performance and the relative ease of a particular movement in vivo depends on the amount of sensory information required to complete the movement. PMID:24047323

  6. Understanding and Improving the Elastic Compressive Modulus of Fibre Reinforced Soy-Based Polyurethane Foams

    NASA Astrophysics Data System (ADS)

    Hussain, Sadakat

    Soy-based polyurethane foams (PUFs) were reinforced with fibres of different aspect ratios to improve the compressive modulus. Each of the three fibre types reinforced PUF differently. Shorter micro-crystalline cellulose fibres were found embedded inside the cell struts of PUF and reinforced them. The reinforcement was attributed to stress transfer from the matrix to the fibre by comparing the experimental results to those predicted by micro-mechanical models for short fibre reinforced composites. The reinforced cell struts increased the overall compressive modulus of the foam. Longer glass fibres (470 microns in length) provided the best reinforcement. These fibres were found to be larger than the cell diameters. The micro-mechanical models could not predict the reinforcement provided by the longer glass fibres. The models predicted negligible reinforcement because the very low modulus PUF should not transfer load to the higher modulus fibres. However, using a finite element model, it was determined that the fibres were providing reinforcement through direct fibre interaction with each other. Intermediate length glass fibres (260 microns in length) were found to poorly reinforce the PUF and should be avoided. These fibres were too short to interact with each other and were on average too large to embed and reinforce cell struts. In order to produce natural fibre reinforced PUFs in the future, a novel device was invented. The purpose of the device is to deliver natural fibres at a constant mass flow rate. The device was found to consistently meter individual loose natural fibre tufts at a mass flow rate of 2 grams per second. However, the device is not robust and requires further development to deliver a fine stream of natural fibre that can mix and interact with the curing polymeric components of PUF. A design plan was proposed to address the remaining issues with the device.

  7. Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.

    PubMed

    Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla

    2014-12-01

    This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new coordination models for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results obtained demonstrate that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies. PMID:25406641

  8. A Novel Clustering Method Curbing the Number of States in Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Kotani, Naoki; Nunobiki, Masayuki; Taniguchi, Kenji

    We propose an efficient state-space construction method for reinforcement learning. Our method controls the number of categories by improving the clustering method of Fuzzy ART, an autonomous state-space construction method. The proposed method represents each weight vector as the mean of its input vectors in order to curb the creation of new categories, and it eliminates categories whose state values are low to curb the total number of categories. As the state value is updated, the category size shrinks so that the policy can be learned more precisely. We verified the effectiveness of the proposed method with simulations of a reaching problem for a two-link robot arm. We confirmed that the number of categories was reduced and the agent achieved the complex task quickly.
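
    The flavor of such category-based state construction can be conveyed with a minimal nearest-prototype sketch; the distance threshold and running-mean update below are simplifications and do not reproduce the full Fuzzy ART variant described in the paper.

```python
import numpy as np

class PrototypeStates:
    """Map continuous observations onto a small set of discrete RL states."""

    def __init__(self, radius=0.5):
        self.radius = radius
        self.protos = []          # prototype vectors (category centres)
        self.counts = []          # how many inputs each prototype has absorbed

    def state_of(self, x):
        x = np.asarray(x, dtype=float)
        if self.protos:
            dists = [np.linalg.norm(x - p) for p in self.protos]
            i = int(np.argmin(dists))
            if dists[i] <= self.radius:
                # nudge the prototype toward the running mean of its inputs
                self.counts[i] += 1
                self.protos[i] += (x - self.protos[i]) / self.counts[i]
                return i
        # no sufficiently close category: open a new one
        self.protos.append(x.copy())
        self.counts.append(1)
        return len(self.protos) - 1
```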

  9. A fuzzy reinforcement learning approach to power control in wireless transmitters.

    PubMed

    Vengerov, David; Bambos, Nicholas; Berenji, Hamid R

    2005-08-01

    We address the issue of power-controlled shared channel access in wireless networks supporting packetized data traffic. We formulate this problem using the dynamic programming framework and present a new distributed fuzzy reinforcement learning algorithm (ACFRL-2) capable of adequately solving a class of problems to which the power control problem belongs. Our experimental results show that the algorithm converges almost deterministically to a neighborhood of optimal parameter values, as opposed to a very noisy stochastic convergence of earlier algorithms. The main tradeoff facing a transmitter is to balance its current power level with future backlog in the presence of stochastically changing interference. Simulation experiments demonstrate that the ACFRL-2 algorithm achieves significant performance gains over the standard power control approach used in CDMA2000. Such a large improvement is explained by the fact that ACFRL-2 allows transmitters to learn implicit coordination policies, which back off under stressful channel conditions as opposed to engaging in escalating "power wars." PMID:16128459

  10. An analysis of intergroup rivalry using Ising model and reinforcement learning

    NASA Astrophysics Data System (ADS)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Different from other simulation studies using the Ising model, the evolution rules of each individual in our model are not static but have the ability to learn from historical experience using a reinforcement learning technique, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.
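
    A rough sketch of coupling an Ising-style neighbourhood interaction with a learned flip rule is shown below; the ring topology, the reward definition, and the Q-learning update are illustrative assumptions rather than the paper's model.

```python
import random

def ising_rl_sweep(spins, Q, J=1.0, h=0.0, alpha=0.1, epsilon=0.1):
    """One sweep in which each agent re-chooses its state (+1/-1) by learned values.

    spins : list of +/-1 opinions on a ring (periodic neighbours)
    Q     : dict mapping (sign_of_local_field, action) -> learned value
    The reward is taken to be the alignment of the chosen state with the local
    field, i.e. the negative local Ising energy (an illustrative choice).
    """
    n = len(spins)
    for i in range(n):
        field = J * (spins[(i - 1) % n] + spins[(i + 1) % n]) + h
        key = (field > 0) - (field < 0)                 # -1, 0, or +1
        if random.random() < epsilon:
            a = random.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda s: Q.get((key, s), 0.0))
        reward = a * field
        Q[(key, a)] = Q.get((key, a), 0.0) + alpha * (reward - Q.get((key, a), 0.0))
        spins[i] = a
    return spins, Q
```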

  11. Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: a Bayesian simulation.

    PubMed

    Bauer, Robert; Gharabaghi, Alireza

    2015-01-01

    Restorative brain-computer interfaces (BCI) are increasingly used to provide feedback of neuronal states in a bid to normalize pathological brain activity and achieve behavioral gains. However, patients and healthy subjects alike often show a large variability, or even inability, of brain self-regulation for BCI control, known as BCI illiteracy. Although current co-adaptive algorithms are powerful for assistive BCIs, their inherent class switching clashes with the operant conditioning goal of restorative BCIs. Moreover, due to the treatment rationale, the classifier of restorative BCIs usually has a constrained feature space, thus limiting the possibility of classifier adaptation. In this context, we applied a Bayesian model of neurofeedback and reinforcement learning for different threshold selection strategies to study the impact of threshold adaptation of a linear classifier on optimizing restorative BCIs. For each feedback iteration, we first determined the thresholds that result in minimal action entropy and maximal instructional efficiency. We then used the resulting vector for the simulation of continuous threshold adaptation. We could thus show that threshold adaptation can improve reinforcement learning, particularly in cases of BCI illiteracy. Finally, on the basis of information theory, we provided an explanation for the achieved benefits of adaptive threshold setting. PMID:25729347

  12. Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: a Bayesian simulation

    PubMed Central

    Bauer, Robert; Gharabaghi, Alireza

    2015-01-01

    Restorative brain-computer interfaces (BCI) are increasingly used to provide feedback of neuronal states in a bid to normalize pathological brain activity and achieve behavioral gains. However, patients and healthy subjects alike often show a large variability, or even inability, of brain self-regulation for BCI control, known as BCI illiteracy. Although current co-adaptive algorithms are powerful for assistive BCIs, their inherent class switching clashes with the operant conditioning goal of restorative BCIs. Moreover, due to the treatment rationale, the classifier of restorative BCIs usually has a constrained feature space, thus limiting the possibility of classifier adaptation. In this context, we applied a Bayesian model of neurofeedback and reinforcement learning for different threshold selection strategies to study the impact of threshold adaptation of a linear classifier on optimizing restorative BCIs. For each feedback iteration, we first determined the thresholds that result in minimal action entropy and maximal instructional efficiency. We then used the resulting vector for the simulation of continuous threshold adaptation. We could thus show that threshold adaptation can improve reinforcement learning, particularly in cases of BCI illiteracy. Finally, on the basis of information theory, we provided an explanation for the achieved benefits of adaptive threshold setting. PMID:25729347

  13. Development of natural fiber reinforced polylactide-based biocomposites

    NASA Astrophysics Data System (ADS)

    Arias Herrera, Andrea Marcela

    Polylactide or PLA is a biodegradable polymer that can be produced from renewable resources. This aliphatic polyester exhibits good mechanical properties similar to those of polyethylene terephthalate (PET). Since 2003, bio-based high molecular weight PLA has been produced on an industrial scale and commercialized under amorphous and semicrystalline grades for various applications. Enhancement of PLA crystallization kinetics is crucial for the competitiveness of this biopolymer as a commodity material able to replace petroleum-based plastics. On the other hand, the combination of natural fibers with polymer matrices made from renewable resources, to produce fully biobased and biodegradable polymer composite materials, has been a strong trend in research activities during the last decade. Nevertheless, the differences related to the chemical structure, clearly observed in the marked hydrophilic/hydrophobic character of the fibers and the thermoplastic matrix, respectively, represent a major drawback for promoting strong fiber/matrix interactions. The aim of the present study was to investigate the intrinsic fiber/matrix interactions of PLA-based natural fiber composites prepared by melt-compounding. Short flax fibers presenting a nominal length of ~1 mm were selected as reinforcement and biocomposites containing low to moderate fiber loading were processed by melt-mixing. Fiber bundle breakage during processing led to important reductions in length and diameter. The mean aspect ratio was decreased by about 50%. Quiescent crystallization kinetics of PLA and biocomposite systems was examined under isothermal and non-isothermal conditions. The nucleating nature of the flax fibers was demonstrated and PLA crystallization was effectively accelerated as the natural reinforcement content increased. Such improvement was controlled by the temperature at which crystallization took place, the liquid-to-solid transition being thermodynamically promoted by the degree of supercooling

  14. A Comparison of Function-Based Differential Reinforcement Interventions for Children Engaging in Disruptive Classroom Behavior

    ERIC Educational Resources Information Center

    LeGray, Matthew W.; Dufrene, Brad A.; Sterling-Turner, Heather; Olmi, D. Joe; Bellone, Katherine

    2010-01-01

    This study provides a direct comparison of differential reinforcement of other behavior (DRO) and differential reinforcement of alternative behavior (DRA). Participants included three children in center-based classrooms referred for functional assessments due to disruptive classroom behavior. Functional assessments included interviews and brief…

  15. Randomized Trial of Prize-Based Reinforcement Density for Simultaneous Abstinence from Cocaine and Heroin

    ERIC Educational Resources Information Center

    Ghitza, Udi E.; Epstein, David H.; Schmittner, John; Vahabzadeh, Massoud; Lin, Jia-Ling; Preston, Kenzie L.

    2007-01-01

    To examine the effect of reinforcer density in prize-based abstinence reinforcement, heroin/cocaine users (N = 116) in methadone maintenance (100 mg/day) were randomly assigned to a noncontingent control group (NonC) or to 1 of 3 groups that earned prize draws for abstinence: manual drawing with standard prize density (MS) or computerized drawing…

  16. The role of multisensor data fusion in neuromuscular control of a sagittal arm with a pair of muscles using actor-critic reinforcement learning method.

    PubMed

    Golkhou, V; Parnianpour, M; Lucas, C

    2004-01-01

    In this study, we consider the role of multisensor data fusion in neuromuscular control using an actor-critic reinforcement learning method. The model we use is a single-link system actuated by a pair of muscles that are excited with alpha and gamma signals. Various physiological sensor information such as proprioception, spindle sensors, and Golgi tendon organs have been integrated to achieve an oscillatory movement with variable amplitude and frequency, while achieving a stable movement with minimum metabolic cost and coactivation. The system is highly nonlinear in all its physical and physiological attributes. Transmission delays are included in the afferent and efferent neural paths to account for a more accurate representation of the reflex loops. This paper proposes a reinforcement learning method with an Actor-Critic architecture in place of the middle and low levels of the central nervous system (CNS). The Actor in this structure is a two-layer feedforward neural network and the Critic is a model of the cerebellum. The Critic is trained by the State-Action-Reward-State-Action (SARSA) method. The Critic then trains the Actor by supervised learning based on previous experiences. The reinforcement signal in SARSA is evaluated based on available alternatives concerning the concept of multisensor data fusion. The effectiveness and the biological plausibility of the present model are demonstrated by several simulations. The system showed excellent tracking capability when we integrated the available sensor information. Addition of a penalty for activation of muscles resulted in much lower muscle coactivation while keeping the movement stable. PMID:15671597
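
    The SARSA rule named for the Critic has a compact tabular form, shown here as a generic sketch; the state and action encodings of the musculoskeletal model are not represented.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """State-Action-Reward-State-Action backup for a dictionary-backed critic."""
    td_error = r + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return td_error        # the error can double as a training signal for an actor
```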

  17. Development of natural fiber reinforced polylactide-based biocomposites

    NASA Astrophysics Data System (ADS)

    Arias Herrera, Andrea Marcela

    Polylactide or PLA is a biodegradable polymer that can be produced from renewable resources. This aliphatic polyester exhibits good mechanical properties similar to those of polyethylene terephthalate (PET). Since 2003, bio-based high molecular weight PLA has been produced on an industrial scale and commercialized under amorphous and semicrystalline grades for various applications. Enhancement of PLA crystallization kinetics is crucial for the competitiveness of this biopolymer as a commodity material able to replace petroleum-based plastics. On the other hand, the combination of natural fibers with polymer matrices made from renewable resources, to produce fully biobased and biodegradable polymer composite materials, has been a strong trend in research activities during the last decade. Nevertheless, the differences related to the chemical structure, clearly observed in the marked hydrophilic/hydrophobic character of the fibers and the thermoplastic matrix, respectively, represent a major drawback for promoting strong fiber/matrix interactions. The aim of the present study was to investigate the intrinsic fiber/matrix interactions of PLA-based natural fiber composites prepared by melt-compounding. Short flax fibers presenting a nominal length of ~1 mm were selected as reinforcement and biocomposites containing low to moderate fiber loading were processed by melt-mixing. Fiber bundle breakage during processing led to important reductions in length and diameter. The mean aspect ratio was decreased by about 50%. Quiescent crystallization kinetics of PLA and biocomposite systems was examined under isothermal and non-isothermal conditions. The nucleating nature of the flax fibers was demonstrated and PLA crystallization was effectively accelerated as the natural reinforcement content increased. Such improvement was controlled by the temperature at which crystallization took place, the liquid-to-solid transition being thermodynamically promoted by the degree of supercooling

  18. A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

    PubMed Central

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662

  19. Long Term Effects of Aversive Reinforcement on Colour Discrimination Learning in Free-Flying Bumblebees

    PubMed Central

    Rodríguez-Gironés, Miguel A.; Trillo, Alejandro; Corcobado, Guadalupe

    2013-01-01

    The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices) were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees which took more time to decide on which flower to alight were more likely to make correct choices than bees which made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities. PMID:23951186

  20. Using reinforcement learning to provide stable brain-machine interface control despite neural input reorganization.

    PubMed

    Pohlmeyer, Eric A; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline W; Sanchez, Justin C

    2014-01-01

    Brain-machine interface (BMI) systems give users direct neural control of robotic, communication, or functional electrical stimulation systems. As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with a minimal burden on the user, provide stable control for long periods of time, and can be responsive to fluctuations in the decoder's neural input space (e.g. neurons appearing or being lost amongst electrode recordings). These are significant challenges for static neural decoding algorithms that assume stationary input/output relationships. Here we use an actor-critic reinforcement learning architecture to provide an adaptive BMI controller that can successfully adapt to dramatic neural reorganizations, can maintain its performance over long time periods, and which does not require the user to produce specific kinetic or kinematic activities to calibrate the BMI. Two marmoset monkeys used the Reinforcement Learning BMI (RLBMI) to successfully control a robotic arm during a two-target reaching task. The RLBMI was initialized using random initial conditions, and it quickly learned to control the robot from brain states using only a binary evaluative feedback regarding whether previously chosen robot actions were good or bad. The RLBMI was able to maintain control over the system throughout sessions spanning multiple weeks. Furthermore, the RLBMI was able to quickly adapt and maintain control of the robot despite dramatic perturbations to the neural inputs, including a series of tests in which the neuron input space was deliberately halved or doubled. PMID:24498055

  1. Using Reinforcement Learning to Provide Stable Brain-Machine Interface Control Despite Neural Input Reorganization

    PubMed Central

    Pohlmeyer, Eric A.; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline W.; Sanchez, Justin C.

    2014-01-01

    Brain-machine interface (BMI) systems give users direct neural control of robotic, communication, or functional electrical stimulation systems. As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with a minimal burden on the user, provide stable control for long periods of time, and can be responsive to fluctuations in the decoder’s neural input space (e.g. neurons appearing or being lost amongst electrode recordings). These are significant challenges for static neural decoding algorithms that assume stationary input/output relationships. Here we use an actor-critic reinforcement learning architecture to provide an adaptive BMI controller that can successfully adapt to dramatic neural reorganizations, can maintain its performance over long time periods, and which does not require the user to produce specific kinetic or kinematic activities to calibrate the BMI. Two marmoset monkeys used the Reinforcement Learning BMI (RLBMI) to successfully control a robotic arm during a two-target reaching task. The RLBMI was initialized using random initial conditions, and it quickly learned to control the robot from brain states using only a binary evaluative feedback regarding whether previously chosen robot actions were good or bad. The RLBMI was able to maintain control over the system throughout sessions spanning multiple weeks. Furthermore, the RLBMI was able to quickly adapt and maintain control of the robot despite dramatic perturbations to the neural inputs, including a series of tests in which the neuron input space was deliberately halved or doubled. PMID:24498055
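
    The core idea of decoding driven only by binary evaluative feedback can be sketched as a simple preference-learning step; the linear action scores and learning rate below are illustrative assumptions, not the RLBMI architecture itself.

```python
import numpy as np

def choose_and_update(W, features, evaluate, lr=0.05):
    """Pick a robot action from neural features, then learn from binary feedback.

    W        : action-preference weights, shape (n_actions, n_features)
    features : neural feature vector for the current brain state
    evaluate : callable returning +1 (good action) or -1 (bad action)
    """
    scores = W @ features
    action = int(np.argmax(scores))
    reward = evaluate(action)                  # binary evaluative feedback only
    W[action] += lr * reward * features        # reinforce or suppress that mapping
    return action, reward
```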

  2. Batch-mode Reinforcement Learning for improved hydro-environmental systems management

    NASA Astrophysics Data System (ADS)

    Castelletti, A.; Galelli, S.; Restelli, M.; Soncini-Sessa, R.

    2010-12-01

    Despite the great progress made in recent decades, the optimal management of hydro-environmental systems still remains a very active and challenging research area. The combination of multiple, often conflicting interests, strong non-linearities in the physical processes and the management objectives, strong uncertainties in the inputs, and a high-dimensional state makes the problem challenging and intriguing. Stochastic Dynamic Programming (SDP) is one of the most suitable methods for designing (Pareto) optimal management policies while preserving the original problem complexity. However, it suffers from a dual curse, which, de facto, prevents its practical application to even reasonably complex water systems. (i) Computational requirements grow exponentially with the state and control dimensions (Bellman's curse of dimensionality), so that SDP cannot be used for water systems whose state vector includes more than a few (2-3) units. (ii) An explicit model of each system component is required (curse of modelling) to anticipate the effects of the system transitions, i.e. any information included in the SDP framework can only be either a state variable described by a dynamic model or a stochastic disturbance, independent in time, with an associated pdf. Any exogenous information that could effectively improve the system operation cannot be explicitly considered in taking the management decision, unless a dynamic model is identified for each additional piece of information, thus adding to the problem complexity through the curse of dimensionality (additional state variables). To mitigate this dual curse, the combined use of batch-mode Reinforcement Learning (bRL) and Dynamic Model Reduction (DMR) techniques is explored in this study. bRL overcomes the curse of modelling by replacing explicit modelling with an external simulator and/or historical observations. The curse of dimensionality is averted using a functional approximation of the SDP value function based on proper non
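
    One common batch-mode RL scheme consistent with this description is fitted Q-iteration, sketched below; the tree-based regressor (scikit-learn), the transition-tuple format, and the discrete action set are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=30, gamma=0.99):
    """Batch-mode RL from a fixed set of (state, action, reward, next_state) tuples.

    Returns a regressor estimating Q(s, a) from the concatenated [state, action] input.
    """
    X = np.array([np.append(s, a) for s, a, r, s2 in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    next_states = np.array([s2 for _, _, _, s2 in transitions])

    q = None
    targets = rewards.copy()                  # first iteration: Q is the immediate reward
    for _ in range(n_iters):
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        # bootstrap new targets from the current Q estimate
        next_q = np.column_stack([
            q.predict(np.column_stack([next_states, np.full(len(next_states), a)]))
            for a in range(n_actions)])
        targets = rewards + gamma * next_q.max(axis=1)
    return q
```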

  3. Development and Evaluation of Mechatronics Learning System in a Web-Based Environment

    ERIC Educational Resources Information Center

    Shyr, Wen-Jye

    2011-01-01

    The development of a remote laboratory suitable for reinforcing undergraduate-level teaching of mechatronics is important. For this reason, a Web-based mechatronics learning system, called RECOLAB (REmote COntrol LABoratory), for remote learning in engineering education has been developed in this study. The web-based environment is an…

  4. A comparison of differential reinforcement procedures with children with autism.

    PubMed

    Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N

    2015-12-01

    The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations. PMID:26174019

  5. Learning-Based Curriculum Development

    ERIC Educational Resources Information Center

    Nygaard, Claus; Hojlt, Thomas; Hermansen, Mads

    2008-01-01

    This article is written to inspire curriculum developers to centre their efforts on the learning processes of students. It presents a learning-based paradigm for higher education and demonstrates the close relationship between curriculum development and students' learning processes. The article has three sections: Section "The role of higher…

  6. Suboptimal choice in pigeons: Choice is primarily based on the value of the conditioned reinforcer rather than overall reinforcement rate.

    PubMed

    Smith, Aaron P; Zentall, Thomas R

    2016-04-01

    Pigeons have sometimes shown a preference for a signaled 50% reinforcement alternative (leading half of the time to a stimulus that signaled 100% reinforcement and otherwise to a stimulus that signaled 0% reinforcement) over a 100% reinforcement alternative. We hypothesized that pigeons may actually be indifferent between the 2 alternatives with previous inconsistent preferences resulting in part from an artifact of the use of a spatial discrimination. In the present experiments, we tested the hypothesis that pigeons would be indifferent between alternatives that provide conditioned reinforcers of equal value. In Experiment 1, we used the signaled 50% reinforcement versus 100% reinforcement procedure, but cued the alternatives with shapes that varied in their spatial location from trial to trial. Consistent with the stimulus value hypothesis, the pigeons showed indifference between the alternatives. In Experiment 2, to confirm that the pigeons could discriminate between the shapes, we removed the discriminative function from the 50% reinforcement alternative and found a clear preference for the 100% reinforcement alternative. Finally, in Experiment 3, when we returned the discriminative function to the 50% reinforcement alternative and reduced the 100% reinforcement alternative to 50% reinforcement, we found a clear preference for the discriminative stimulus alternative. These results support the hypothesis that pigeons prefer the alternative with the conditioned reinforcer that best predicts reinforcement, whereas its frequency may be relatively unimportant. (PsycINFO Database Record) PMID:26881902

  7. Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning

    PubMed Central

    Gustafson, Nicholas J.; Daw, Nathaniel D.

    2011-01-01

    Reinforcement learning (RL) provides an influential characterization of the brain's mechanisms for learning to make advantageous choices. An important problem, though, is how complex tasks can be represented in a way that enables efficient learning. We consider this problem through the lens of spatial navigation, examining how two of the brain's location representations—hippocampal place cells and entorhinal grid cells—are adapted to serve as basis functions for approximating value over space for RL. Although much previous work has focused on these systems' roles in combining upstream sensory cues to track location, revisiting these representations with a focus on how they support this downstream decision function offers complementary insights into their characteristics. Rather than localization, the key problem in learning is generalization between past and present situations, which may not match perfectly. Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks verify the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences. However, work on generalization in RL suggests the underlying representation should respect the environment's layout. In particular, although it is often assumed that neurons track location in Euclidean coordinates (that a place cell's activity declines “as the crow flies” away from its peak), the relevant metric for value is geodesic: the distance along a path, around any obstacles. We formalize this intuition and present simulations showing how Euclidean, but not geodesic, representations can interfere with RL by generalizing inappropriately across barriers. Our proposal that place and grid responses should be modulated by geodesic distances suggests novel predictions about how obstacles should affect spatial firing
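
    The basis-function view can be sketched with Gaussian place-cell-like features feeding a linear TD(0) value learner; note that the Euclidean distance used here is exactly the simplification the authors argue against, and swapping in a geodesic (along-path) distance is the proposed remedy. All parameters are illustrative.

```python
import numpy as np

def place_cell_features(pos, centres, sigma=0.3):
    """Activity of Gaussian 'place cells' used as basis functions for value learning."""
    d = np.linalg.norm(centres - pos, axis=1)      # straight-line (Euclidean) distance
    return np.exp(-(d ** 2) / (2 * sigma ** 2))    # a geodesic distance would respect walls

def td0_update(w, pos, reward, next_pos, centres, alpha=0.1, gamma=0.95):
    """Linear TD(0) value update over the place-cell features."""
    phi = place_cell_features(pos, centres)
    phi_next = place_cell_features(next_pos, centres)
    delta = reward + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * delta * phi
```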

  8. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.

    PubMed

    Glimcher, Paul W

    2011-09-13

    A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn. PMID:21389268
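
    The reward prediction error at the heart of this hypothesis is usually written in its textbook temporal-difference form (stated here for reference, not quoted from the review):

```latex
\delta_t = r_t + \gamma \, V(s_{t+1}) - V(s_t), \qquad
V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t
```

    where \delta_t is the phasic dopamine-like error signal, \gamma the discount factor, and \alpha the learning rate.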

  9. FROM REINFORCEMENT LEARNING MODELS OF THE BASAL GANGLIA TO THE PATHOPHYSIOLOGY OF PSYCHIATRIC AND NEUROLOGICAL DISORDERS

    PubMed Central

    Maia, Tiago V.; Frank, Michael J.

    2013-01-01

    Over the last decade and a half, reinforcement learning models have fostered an increasingly sophisticated understanding of the functions of dopamine and cortico-basal ganglia-thalamo-cortical (CBGTC) circuits. More recently, these models, and the insights that they afford, have started to be used to understand key aspects of several psychiatric and neurological disorders that involve disturbances of the dopaminergic system and CBGTC circuits. We review this approach and its existing and potential applications to Parkinson’s disease, Tourette’s syndrome, attention-deficit/hyperactivity disorder, addiction, schizophrenia, and preclinical animal models used to screen novel antipsychotic drugs. The approach’s proven explanatory and predictive power bodes well for the continued growth of computational psychiatry and computational neurology. PMID:21270784

  10. Development of Flax Fibre based Textile Reinforcements for Composite Applications

    NASA Astrophysics Data System (ADS)

    Goutianos, S.; Peijs, T.; Nystrom, B.; Skrifvars, M.

    2006-07-01

    Most developments in the area of natural fibre reinforced composites have focused on random discontinuous fibre composite systems. The development of continuous fibre reinforced composites is, however, essential for manufacturing materials, which can be used in load-bearing/structural applications. The current work aims to develop high-performance natural fibre composite systems for structural applications using continuous textile reinforcements like UD-tapes or woven fabrics. One of the main problems in this case is the optimisation of the yarn to be used to manufacture the textile reinforcement. Low twisted yarns display a very low strength when tested dry in air and therefore they cannot be used in processes such as pultrusion or textile manufacturing routes. On the other hand, by increasing the level of twist, a degradation of the mechanical properties is observed in impregnated yarns (e.g., unidirectional composites) similar to off-axis composites. Therefore, an optimum twist should be used to balance processability and mechanical properties. Subsequently, different types of fabrics (i.e., biaxial plain weaves, unidirectional fabrics and non-crimp fabrics) were produced and evaluated as reinforcement in composites manufactured by well established manufacturing techniques such as hand lay-up, vacuum infusion, pultrusion and resin transfer moulding (RTM). Clearly, as expected, the developed materials cannot directly compete in terms of strength with glass fibre composites. However, they are clearly able to compete with these materials in terms of stiffness, especially if the low density of flax is taken into account. Their properties are however very favourable when compared with non-woven glass composites.

  11. Lessons Learned in Computer-Based Learning.

    ERIC Educational Resources Information Center

    Modesitt, Kenneth L.

    This personal account of the development of computer-based learning from the 1960s to the present argues that the 1960s were a period of gestation. Instructional applications of computers at that time included efforts to simulate physics experiments and the debut of the PLATO system, which already had the ability to deliver interactive instruction…

  12. Examining Organizational Learning in Schools: The Role of Psychological Safety, Experimentation, and Leadership that Reinforces Learning

    ERIC Educational Resources Information Center

    Higgins, Monica; Ishimaru, Ann; Holcombe, Rebecca; Fowler, Amy

    2012-01-01

    This study draws upon theory and methods from the field of organizational behavior to examine organizational learning (OL) in the context of a large urban US school district. We build upon prior literature on OL from the field of organizational behavior to introduce and validate three subscales that assess key dimensions of organizational learning…

  13. Work-Based Learning: Learning To Work; Working To Learn; Learning To Learn.

    ERIC Educational Resources Information Center

    Strumpf, Lori; Mains, Kristine

    This document describes a work-based learning approach designed to integrate work and learning at the workplace and thereby help young people develop the skills required for changing workplaces. The following considerations in designing work-based programs are discussed: the trend toward high performance workplaces and changes in the way work is…

  14. Effect of silica nanoparticles on reinforcement of poly(phenylene ether) based thermoplastic elastomer.

    PubMed

    Gupta, Samik; Maiti, Parnasree; Krishnamoorthy, Kumar; Krishnamurthy, Raja; Menon, Ashok; Bhowmick, Anil K

    2008-04-01

    Reinforcement of a novel poly(phenylene ether) (PPE) based thermoplastic elastomer (TPE), i.e., styrene-ethylene-butylene-styrene (SEBS)/ethylene vinyl acetate (EVA) and PPE-polystyrene (PS), was studied to develop a reinforced thermoplastic elastomer or thermoplastic vulcanizate (TPV). An effort was made to reinforce selectively the elastomeric dispersed phase of EVA by silica nanoparticles and silica sol-gel precursors, like alkoxy orthosilanes, using twin-screw extrusion and injection molding processes. Improvement of tensile strength and percent elongation at break was observed both with silica nanoparticles and tetraethoxy orthosilane (TEOS). Addition of TEOS transformed the dispersed EVA lamellar morphology into semispherical domains as a consequence of possible crosslinking. Soxhlet extraction was done on the silica and TEOS reinforced materials. The insoluble residues collected from both the silica and TEOS reinforced samples were analyzed in detail using both morphological and spectroscopic studies. This extensive study also provided an in-depth conceptual understanding of the PPE based TPE behavior upon reinforcement with silica nanoparticles and silica sol-gel precursors and the effect of reinforcement on recycling behavior. PMID:18572622

  15. Mesoscale simulations of particle reinforced epoxy-based composites

    NASA Astrophysics Data System (ADS)

    White, Bradley W.; Springer, Harry Keo; Jordan, Jennifer L.; Spowart, Jonathan E.; Thadhani, Naresh

    2012-03-01

    Polymer matrix composites reinforced with metal powders have complex microstructures that vary greatly from differences in particle size, morphology, loading fractions, etc. The effects of the underlying microstructure on the mechanical and wave propagation behavior of these composites during dynamic loading conditions are not well understood. To better understand these effects, epoxy (Epon826/DEA) reinforced with different particle sizes of Al and loading fractions of Al and Ni were prepared by casting. Microstructures from the composites were then used in 2D plane strain mesoscale simulations. The effect of varying velocity loading conditions on the wave velocity was then examined to determine the Us-Up and particle deformation response as a function of composite configuration.

  16. Al-based metal matrix composites reinforced with nanocrystalline Al-Ti-Ni particles

    NASA Astrophysics Data System (ADS)

    Scudino, S.; Ali, F.; Surreddi, K. B.; Prashanth, K. G.; Sakaliyska, M.; Eckert, J.

    2010-07-01

    Al-based metal matrix composites containing different volume fractions of nanocrystalline Al70Ti20Ni10 reinforcing particles have been produced by powder metallurgy and the effect of the volume fraction of reinforcement on the mechanical properties of the composites has been studied. Room temperature compression tests reveal a considerable improvement of the mechanical properties as compared to pure aluminum. The compressive strength increases from 155 MPa for pure Al to about 200 and 240 MPa for the samples with 20 and 40 vol.% of reinforcement, respectively, while retaining appreciable plastic deformation with a fracture strain ranging between 43 and 28%.

  17. Dose Dependent Dopaminergic Modulation of Reward-Based Learning in Parkinson's Disease

    ERIC Educational Resources Information Center

    van Wouwe, N. C.; Ridderinkhof, K. R.; Band, G. P. H.; van den Wildenberg, W. P. M.; Wylie, S. A.

    2012-01-01

    Learning to select optimal behavior in new and uncertain situations is a crucial aspect of living and requires the ability to quickly associate stimuli with actions that lead to rewarding outcomes. Mathematical models of reinforcement-based learning to select rewarding actions distinguish between (1) the formation of stimulus-action-reward…

  18. Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

    PubMed Central

    Yau, Kok-Lim Alvin; Poh, Geong-Sen; Chien, Su Fong; Al-Rawi, Hasan A. A.

    2014-01-01

    Cognitive radio (CR) enables unlicensed users to exploit the underutilized spectrum in licensed spectrum whilst minimizing interference to licensed users. Reinforcement learning (RL), which is an artificial intelligence approach, has been applied to enable each unlicensed user to observe and carry out optimal actions for performance enhancement in a wide range of schemes in CR, such as dynamic channel selection and channel sensing. This paper presents new discussions of RL in the context of CR networks. It provides an extensive review on how most schemes have been approached using the traditional and enhanced RL algorithms through state, action, and reward representations. Examples of the enhancements on RL, which do not appear in the traditional RL approach, are rules and cooperative learning. This paper also reviews performance enhancements brought about by the RL algorithms and open issues. This paper aims to establish a foundation in order to spark new research interests in this area. Our discussion has been presented in a tutorial manner so that it is comprehensive to readers outside the specialty of RL and CR. PMID:24995352
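
    As a hedged illustration of the kind of traditional RL formulation surveyed here (not code from the paper), the sketch below casts dynamic channel selection as tabular Q-learning; the channel occupancy model, the +1/-1 reward, and the epsilon-greedy exploration are all assumptions.

      import random

      # Toy dynamic channel selection for an unlicensed (secondary) user.
      # State: channel used on the previous slot; action: channel to use next.
      # Reward (assumed): +1 if the licensed user is absent, -1 otherwise.
      NUM_CHANNELS = 4
      BUSY_PROB = [0.8, 0.6, 0.3, 0.1]     # hypothetical licensed-user activity
      Q = [[0.0] * NUM_CHANNELS for _ in range(NUM_CHANNELS)]
      alpha, gamma, eps = 0.1, 0.9, 0.1

      def choose(state):
          if random.random() < eps:
              return random.randrange(NUM_CHANNELS)
          return Q[state].index(max(Q[state]))

      state = 0
      for _ in range(20000):
          action = choose(state)
          reward = 1.0 if random.random() > BUSY_PROB[action] else -1.0
          next_state = action
          Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
          state = next_state

      # The greedy policy settles on the least-occupied channel (index 3 here).
      print([Q[s].index(max(Q[s])) for s in range(NUM_CHANNELS)])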

  19. Application of reinforcement learning in cognitive radio networks: models and algorithms.

    PubMed

    Yau, Kok-Lim Alvin; Poh, Geong-Sen; Chien, Su Fong; Al-Rawi, Hasan A A

    2014-01-01

    Cognitive radio (CR) enables unlicensed users to exploit the underutilized spectrum in licensed spectrum whilst minimizing interference to licensed users. Reinforcement learning (RL), which is an artificial intelligence approach, has been applied to enable each unlicensed user to observe and carry out optimal actions for performance enhancement in a wide range of schemes in CR, such as dynamic channel selection and channel sensing. This paper presents new discussions of RL in the context of CR networks. It provides an extensive review on how most schemes have been approached using the traditional and enhanced RL algorithms through state, action, and reward representations. Examples of the enhancements on RL, which do not appear in the traditional RL approach, are rules and cooperative learning. This paper also reviews performance enhancements brought about by the RL algorithms and open issues. This paper aims to establish a foundation in order to spark new research interests in this area. Our discussion has been presented in a tutorial manner so that it is comprehensive to readers outside the specialty of RL and CR. PMID:24995352

  20. Incoherent control of quantum systems with wavefunction-controllable subspaces via quantum reinforcement learning.

    PubMed

    Dong, Daoyi; Chen, Chunlin; Tarn, Tzyh-Jong; Pechen, Alexander; Rabitz, Herschel

    2008-08-01

    In this paper, an incoherent control scheme for accomplishing the state control of a class of quantum systems which have wavefunction-controllable subspaces is proposed. This scheme includes the following two steps: projective measurement on the initial state and learning control in the wavefunction-controllable subspace. The first step probabilistically projects the initial state into the wavefunction-controllable subspace. The probability of success is sensitive to the initial state; however, it can be greatly improved through multiple experiments on several identical initial states even in the case with a small probability of success for an individual measurement. The second step finds a local optimal control sequence via quantum reinforcement learning and drives the controlled system to the objective state through a set of suitable controls. In this strategy, the initial states can be unknown identical states, the quantum measurement is used as an effective control, and the controlled system is not necessarily unitarily controllable. This incoherent control scheme provides an alternative quantum engineering strategy for locally controllable quantum systems. PMID:18632384
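
    A small numerical aside (standard probability, not taken from the paper): if a single projective measurement succeeds with probability p, then at least one of N identically prepared copies is projected into the controllable subspace with probability 1 - (1 - p)^N, which is why repetition overcomes a small per-trial success probability. The values of p and N below are hypothetical.

      # Probability that at least one of N projective measurements succeeds,
      # given an (assumed) per-trial success probability p.
      def success_probability(p, n):
          return 1.0 - (1.0 - p) ** n

      for n in (1, 10, 50, 100):
          print(n, round(success_probability(0.05, n), 3))
      # With p = 0.05: 1 -> 0.05, 10 -> 0.401, 50 -> 0.923, 100 -> 0.994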

  1. Toward an autonomous brain machine interface: integrating sensorimotor reward modulation and reinforcement learning.

    PubMed

    Marsh, Brandi T; Tarigoppula, Venkata S Aditya; Chen, Chen; Francis, Joseph T

    2015-05-13

    For decades, neurophysiologists have worked on elucidating the function of the cortical sensorimotor control system from the standpoint of kinematics or dynamics. Recently, computational neuroscientists have developed models that can emulate changes seen in the primary motor cortex during learning. However, these simulations rely on the existence of a reward-like signal in the primary sensorimotor cortex. Reward modulation of the primary sensorimotor cortex has yet to be characterized at the level of neural units. Here we demonstrate that single units/multiunits and local field potentials in the primary motor (M1) cortex of nonhuman primates (Macaca radiata) are modulated by reward expectation during reaching movements and that this modulation is present even while subjects passively view cursor motions that are predictive of either reward or nonreward. After establishing this reward modulation, we set out to determine whether we could correctly classify rewarding versus nonrewarding trials, on a moment-to-moment basis. This reward information could then be used in collaboration with reinforcement learning principles toward an autonomous brain-machine interface. The autonomous brain-machine interface would use M1 for both decoding movement intention and extraction of reward expectation information as evaluative feedback, which would then update the decoding algorithm as necessary. In the work presented here, we show that this, in theory, is possible. PMID:25972167
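
    To make the decoding idea concrete, here is a hedged sketch (not the authors' pipeline): rewarding versus nonrewarding trials are classified from simulated M1 firing-rate features with a cross-validated logistic regression. The trial counts, unit counts, and reward modulation strength are invented.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)

      # Simulated firing-rate features: 200 trials x 30 units (hypothetical sizes).
      n_trials, n_units = 200, 30
      rewarded = rng.integers(0, 2, size=n_trials)          # 1 = rewarding trial
      baseline = rng.normal(5.0, 1.0, size=(n_trials, n_units))
      modulated_units = rng.random(n_units) > 0.5           # subset of units carries reward info
      X = baseline + 0.8 * rewarded[:, None] * modulated_units

      # Cross-validated accuracy of a linear decoder for reward expectation.
      clf = LogisticRegression(max_iter=1000)
      scores = cross_val_score(clf, X, rewarded, cv=5)
      print("decoding accuracy:", round(scores.mean(), 2))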

  2. Reinforcement learning signals in the anterior cingulate cortex code for others' false beliefs.

    PubMed

    Apps, M A J; Green, R; Ramnani, N

    2013-01-01

    The ability to recognise that another's belief is false is a hallmark of our capacity to understand others' mental states. It has been suggested that the computational and neural mechanisms that underpin learning about others' mental states may be similar to those that underpin first-person Reinforcement Learning (RL). In RL, unexpected decision-making outcomes constitute prediction errors (PE), which are coded for by neurons in the Anterior Cingulate Cortex (ACC). Does the ACC signal the PEs (false beliefs) of others about the outcomes of their decisions? We scanned subjects using fMRI while they monitored a third-person's decisions and similar responses made by a computer. The outcomes of the trials were manipulated, such that the actual outcome was unexpectedly different from the predicted outcome on 1/3 of trials. We examined activity time-locked to privileged information which indicated the actual outcomes only to subjects. Activity in the gyral ACC was found when the outcomes of the third-person's decisions were unexpectedly positive. Activity in the sulcal ACC was found when the third-person's or computer's outcomes were unexpectedly positive. We suggest that a property of the ACC is that it codes PEs, with a portion of the gyral ACC specialised for processing the PEs of others. PMID:22982355

  3. Applying Learning Design to Work-Based Learning

    ERIC Educational Resources Information Center

    Miao, Yongwu; Hoppe, Heinz Ulrich

    2011-01-01

    Learning design is currently slanted to reflect a course-based approach to learning. This article explores whether the concept of learning design could be applied to support the informal aspects of work-based learning (WBL). It also discusses the characteristics of WBL and presents a WBL-specific learning design that highlights the key features…

  4. The Effects of Observation of Learn Units during Reinforcement and Correction Conditions on the Rate of Learning Math Algorithms by Fifth Grade Students

    ERIC Educational Resources Information Center

    Neu, Jessica Adele

    2013-01-01

    I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…

  5. Ballistic Impact Properties of Zr-Based Amorphous Alloy Composites Reinforced with Woven Continuous Fibers

    NASA Astrophysics Data System (ADS)

    Kim, Gyeong Su; Son, Chang-Young; Lee, Sang-Bok; Lee, Sang-Kwan; Song, Young Buem; Lee, Sunghak

    2012-03-01

    This study aims at investigating ballistic impact properties of Zr-based amorphous alloy (LM1 alloy) matrix composites reinforced with woven stainless steel or glass continuous fibers. The fiber-reinforced composites with excellent fiber/matrix interfaces were fabricated without pores and misinfiltration by liquid pressing process, and contained 35 to 41 vol pct of woven continuous fibers homogeneously distributed in the amorphous matrix. The woven-STS-continuous-fiber-reinforced composite consisted of the LM1 alloy layer of 1.0 mm in thickness in the upper region and the fiber-reinforced composite layer in the lower region. The hard LM1 alloy layer absorbed the ballistic impact energy by forming many cracks, and the fiber-reinforced composite layer interrupted the crack propagation and blocked the impact and traveling of the projectile, thereby resulting in the improvement of ballistic performance by about 20 pct over the LM1 alloy. According to the ballistic impact test data of the woven-glass-continuous-fiber-reinforced composite, glass fibers were preferentially fragmented to form a number of cracks, and the amorphous matrix accelerated the fragmentation of glass fibers and the initiation of cracks. Because of the absorption process of ballistic impact energy by forming very large amounts of cracks, fragments, and debris, the glass-fiber-reinforced composite showed better ballistic performance than the LM1 alloy.

  6. Relationship between Reinforcement and Eye Movements during Ocular Motor Training with Learning Disabled Children.

    ERIC Educational Resources Information Center

    Punnett, Audrey F.; Steinhauer, Gene D.

    1984-01-01

    Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…

  7. Brain-based Learning.

    ERIC Educational Resources Information Center

    Weiss, Ruth Palombo

    2000-01-01

    Discusses brain research and how new imaging technologies allow scientists to explore how human brains process memory, emotion, attention, patterning, motivation, and context. Explains how brain research is being used to revise learning theories. (JOW)

  8. Does Artificial Tutoring Foster Inquiry Based Learning?

    ERIC Educational Resources Information Center

    Schmoelz, Alexander; Swertz, Christian; Forstner, Alexandra; Barberi, Alessandro

    2014-01-01

    This contribution looks at the Intelligent Tutoring Interface for Technology Enhanced Learning, which integrates multistage-learning and inquiry-based learning in an adaptive e-learning system. Based on a common pedagogical ontology, adaptive e-learning systems can be enabled to recommend learning objects and activities, which follow inquiry-based…

  9. Problem Based Learning: Mystery Disease.

    ERIC Educational Resources Information Center

    Bohland, Mark A.

    This guide features a problem-based learning (PBL) unit specifically designed for student-centered learning of new and meaningful content on diseases. Students grapple with a complex and changing problem that requires higher level thinking skills in an environment in which students work both individually and in collaboration with others. Includes…

  10. The Reinforcement Effect of Nano-Zirconia on the Transverse Strength of Repaired Acrylic Denture Base

    PubMed Central

    Gad, Mohammed; ArRejaie, Aws S.; Abdel-Halim, Mohamed Saber; Rahoma, Ahmed

    2016-01-01

    Objective. The aim of this study was to evaluate the effect of incorporation of glass fiber, zirconia, and nano-zirconia on the transverse strength of repaired denture base. Materials and Methods. Eighty specimens of heat polymerized acrylic resin were prepared and randomly divided into eight groups (n = 10): one intact group (control) and seven repaired groups. One group was repaired with autopolymerized resin while the other six groups were repaired using autopolymerized resin reinforced with 2 wt% or 5 wt% glass fiber, zirconia, or nano-zirconia particles. A three-point bending test was used to measure the transverse strength. The results were analyzed using SPSS with repeated measures ANOVA and a post hoc least significant difference (LSD) test (P ≤ 0.05). Results. Among repaired groups it was found that autopolymerized resin reinforced with 2 or 5 wt% nano-zirconia showed the highest transverse strength (P ≤ 0.05). Repairs with autopolymerized acrylic resin reinforced with 5 wt% zirconia showed the lowest transverse strength value. There was no significant difference between the groups repaired with repair resin without reinforcement, 2 wt% zirconia, and glass fiber reinforced resin. Conclusion. Reinforcing of repair material with nano-zirconia may significantly improve the transverse strength of some fractured denture base polymers. PMID:27366150
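
    For reference (standard beam mechanics, not spelled out in the abstract), the transverse strength measured in a three-point bending test is sigma = 3FL / (2bd^2), where F is the fracture load, L the support span, and b and d the specimen width and thickness. A minimal calculation with hypothetical specimen dimensions:

      # Flexural (transverse) strength from a three-point bending test:
      # sigma = 3 * F * L / (2 * b * d**2), in MPa when N and mm are used.
      def flexural_strength(load_n, span_mm, width_mm, thickness_mm):
          return 3.0 * load_n * span_mm / (2.0 * width_mm * thickness_mm ** 2)

      # Hypothetical acrylic specimen: 50 mm span, 10 mm wide, 2.5 mm thick, 80 N fracture load.
      print(round(flexural_strength(80.0, 50.0, 10.0, 2.5), 1), "MPa")   # 96.0 MPa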

  11. The Reinforcement Effect of Nano-Zirconia on the Transverse Strength of Repaired Acrylic Denture Base.

    PubMed

    Gad, Mohammed; ArRejaie, Aws S; Abdel-Halim, Mohamed Saber; Rahoma, Ahmed

    2016-01-01

    Objective. The aim of this study was to evaluate the effect of incorporation of glass fiber, zirconia, and nano-zirconia on the transverse strength of repaired denture base. Materials and Methods. Eighty specimens of heat polymerized acrylic resin were prepared and randomly divided into eight groups (n = 10): one intact group (control) and seven repaired groups. One group was repaired with autopolymerized resin while the other six groups were repaired using autopolymerized resin reinforced with 2 wt% or 5 wt% glass fiber, zirconia, or nano-zirconia particles. A three-point bending test was used to measure the transverse strength. The results were analyzed using SPSS with repeated measures ANOVA and a post hoc least significant difference (LSD) test (P ≤ 0.05). Results. Among repaired groups it was found that autopolymerized resin reinforced with 2 or 5 wt% nano-zirconia showed the highest transverse strength (P ≤ 0.05). Repairs with autopolymerized acrylic resin reinforced with 5 wt% zirconia showed the lowest transverse strength value. There was no significant difference between the groups repaired with repair resin without reinforcement, 2 wt% zirconia, and glass fiber reinforced resin. Conclusion. Reinforcing of repair material with nano-zirconia may significantly improve the transverse strength of some fractured denture base polymers. PMID:27366150

  12. Optimal reinforcement of training datasets in semi-supervised landmark-based segmentation

    NASA Astrophysics Data System (ADS)

    Ibragimov, Bulat; Likar, Boštjan; Pernuš, Franjo; Vrtovec, Tomaž

    2015-03-01

    During the last couple of decades, the development of computerized image segmentation shifted from unsupervised to supervised methods, which made segmentation results more accurate and robust. However, the main disadvantage of supervised segmentation is the need for manual image annotation, which is time-consuming and subject to human error. To reduce the need for manual annotation, we propose a novel learning approach for training dataset reinforcement in the area of landmark-based segmentation, where newly detected landmarks are optimally combined with reference landmarks from the training dataset, thereby enriching the training process. The approach is formulated as a nonlinear optimization problem, where the solution is a vector of weighting factors that measures how reliable the detected landmarks are. The detected landmarks that are found to be more reliable are included in the training procedure with higher weighting factors, whereas the detected landmarks that are found to be less reliable are included with lower weighting factors. The approach is integrated into the landmark-based game-theoretic segmentation framework and validated against the problem of lung field segmentation from chest radiographs.

  13. Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning.

    PubMed

    Collins, Anne Gabrielle Eva; Frank, Michael Joshua

    2016-07-01

    Often the world is structured such that distinct sensory contexts signify the same abstract rule set. Learning from feedback thus informs us not only about the value of stimulus-action associations but also about which rule set applies. Hierarchical clustering models suggest that learners discover structure in the environment, clustering distinct sensory events into a single latent rule set. Such structure enables a learner to transfer any newly acquired information to other contexts linked to the same rule set, and facilitates re-use of learned knowledge in novel contexts. Here, we show that humans exhibit this transfer, generalization and clustering during learning. Trial-by-trial model-based analysis of EEG signals revealed that subjects' reward expectations incorporated this hierarchical structure; these structured neural signals were predictive of behavioral transfer and clustering. These results further our understanding of how humans learn and generalize flexibly by building abstract, behaviorally relevant representations of the complex, high-dimensional sensory environment. PMID:27082659

  14. Reinforcing outpatient medical student learning using brief computer tutorials: the Patient-Teacher-Tutorial sequence

    PubMed Central

    2012-01-01

    Background At present, what students read after an outpatient encounter is largely left up to them. Our objective was to evaluate the education efficacy of a clinical education model in which the student moves through a sequence that includes immediately reinforcing their learning using a specifically designed computer tutorial. Methods Prior to a 14-day Pediatric Emergency rotation, medical students completed pre-tests for two common pediatric topics: Oral Rehydration Solutions (ORS) and Fever Without Source (FWS). After encountering a patient with either FWS or a patient needing ORS, the student logged into a computer that randomly assigned them to either a) completing a relevant computer tutorial (e.g. FWS patient + FWS tutorial = “in sequence”) or b) completing the non-relevant tutorial (e.g. FWS patient + ORS tutorial = “out of sequence”). At the end of their rotation, they were tested again on both topics. Our main outcome was post-test scores on a given tutorial topic, contrasted by whether done in- or out-of-sequence. Results Ninety-two students completed the study protocol with 41 in the ‘in sequence’ group. Pre-test scores did not differ significantly. Overall, doing a computer tutorial in sequence resulted in significantly greater post-test scores (z-score 1.1 (SD 0.70) in sequence vs. 0.52 (1.1) out-of-sequence; 95% CI for difference +0.16, +0.93). Students spent longer on the tutorials when they were done in sequence (12.1 min (SD 7.3) vs. 10.5 (6.5)) though the difference was not statistically significant (95% CI diff: -1.2 min, +4.5). Conclusions Outpatient learning frameworks could be structured to take best advantage of the heightened learning potential created by patient encounters. We propose the Patient-Teacher-Tutorial sequence as a framework for organizing learning in outpatient clinical settings. PMID:22873635

  15. Adaptive internal state space construction method for reinforcement learning of a real-world agent.

    PubMed

    Samejima, K; Omori, T

    1999-10-01

    One of the difficulties encountered in the application of reinforcement learning to real-world problems is the construction of a discrete state space from a continuous sensory input signal. In the absence of a priori knowledge about the task, a straightforward approach to this problem is to discretize the input space into a grid, and to use a lookup table. However, this method suffers from the curse of dimensionality. Some studies use continuous function approximators such as neural networks instead of lookup tables. However, when global basis functions such as sigmoid functions are used, convergence cannot be guaranteed. To overcome this problem, we propose a method in which local basis functions are incrementally assigned depending on the task requirement. Initially, only one basis function is allocated over the entire space. The basis function is divided according to the statistical property of locally weighted temporal difference error (TD error) of the value function. We applied this method to an autonomous robot collision avoidance problem, and evaluated the validity of the algorithm in simulation. The proposed algorithm, which we call the adaptive basis division (ABD) algorithm, achieved the task using a smaller number of basis functions than the conventional methods. Moreover, we applied the method to a goal-directed navigation problem of a real mobile robot. The action strategy was learned using a database of sensor data, and it was then used for navigation of a real machine. The robot reached the goal using a smaller number of internal states than with the conventional methods. PMID:12662650
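
    The sketch below is a loose illustration of the splitting idea (it is not the authors' ABD algorithm): a local region of a one-dimensional state space is divided when the TD errors collected in it remain large and inconsistent. The state space, target function, thresholds, and splitting rule are all assumptions.

      import random, statistics

      # Piecewise-constant value regions over [0, 1] that split when local TD errors stay noisy.
      regions = [{"lo": 0.0, "hi": 1.0, "v": 0.0, "errs": []}]
      alpha = 0.2
      SPLIT_STD, MIN_SAMPLES = 0.15, 30            # assumed splitting criteria

      def region_of(x):
          return next(r for r in regions if r["lo"] <= x <= r["hi"])

      def target(x):                                # unknown function sampled noisily
          return 1.0 if x > 0.7 else 0.0

      for _ in range(5000):
          x = random.random()
          r = region_of(x)
          delta = (target(x) + random.gauss(0.0, 0.05)) - r["v"]   # TD error (terminal case)
          r["v"] += alpha * delta
          r["errs"].append(delta)
          if len(r["errs"]) >= MIN_SAMPLES and statistics.pstdev(r["errs"]) > SPLIT_STD:
              mid = 0.5 * (r["lo"] + r["hi"])
              regions.remove(r)
              regions.append({"lo": r["lo"], "hi": mid, "v": r["v"], "errs": []})
              regions.append({"lo": mid, "hi": r["hi"], "v": r["v"], "errs": []})

      print(len(regions), "regions; refinement concentrates around the step at x = 0.7")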

  16. Formable woven preforms based on in situ reinforced thermoplastic fibers

    SciTech Connect

    Robertson, C.G.; Souza, J.P. de; Baird, D.G.

    1995-12-01

    Blends of Vectra B950 (VB) and polypropylene (PP) were spun into fibers utilizing a dual extrusion process for use in formable fabric prepregs. Fibers of 50/50 weight composition were processed up to fiber draw ratios of 106. The tensile modulus of these fibers showed positive deviation from the rule of mixtures for draw ratios greater than 40, and the tensile modulus and strength properties did not level off within the range of draw ratios investigated. The fibers, pre-wetted with polypropylene, were woven into fabrics that were subsequently impregnated with polypropylene sheet to form composites. The tensile mechanical properties of these composites were nearly equivalent to those of long glass fiber reinforced polypropylene. At temperatures between 240 and 280 °C, composites of 6.3 wt.% VB proved formable with elongation to break values in excess of 20%. Impregnated fabric composites were successfully thermoformed without noticeable fiber damage, and a combined fabric impregnation / thermoforming process was developed.

  17. Principal components analysis of reward prediction errors in a reinforcement learning task.

    PubMed

    Sambrook, Thomas D; Goslin, Jeremy

    2016-01-01

    Models of reinforcement learning represent reward and punishment in terms of reward prediction errors (RPEs), quantitative signed terms describing the degree to which outcomes are better than expected (positive RPEs) or worse (negative RPEs). An electrophysiological component known as feedback related negativity (FRN) occurs at frontocentral sites 240-340 ms after feedback on whether a reward or punishment is obtained, and has been claimed to neurally encode an RPE. An outstanding question, however, is whether the FRN is sensitive to the size of both positive RPEs and negative RPEs. Previous attempts to answer this question have examined the simple effects of RPE size for positive RPEs and negative RPEs separately. However, this methodology can be compromised by overlap from components coding for unsigned prediction error size, or "salience", which are sensitive to the absolute size of a prediction error but not its valence. In our study, positive and negative RPEs were parametrically modulated using both reward likelihood and magnitude, with principal components analysis used to separate out overlying components. This revealed a single RPE encoding component responsive to the size of positive RPEs, peaking at ~330 ms, and occupying the delta frequency band. Other components responsive to unsigned prediction error size were shown, but no component sensitive to negative RPE size was found. PMID:26196667
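
    A hedged sketch of the analysis logic (not the authors' EEG pipeline): simulate feedback-locked epochs that mix a signed-RPE component with an unsigned "salience" component, then use PCA/SVD to pull the overlapping components apart before relating them to RPE size. Trial counts, latencies, and waveform shapes are invented.

      import numpy as np

      rng = np.random.default_rng(1)

      # Simulated single-trial epochs built from two overlapping components:
      # one tracking the signed RPE, one tracking unsigned prediction error size.
      n_trials, n_samples = 400, 200
      t = np.linspace(0.0, 0.8, n_samples)                  # 0-800 ms after feedback (assumed)

      prob = rng.uniform(0.1, 0.9, n_trials)                # reward likelihood
      mag = rng.choice([0.25, 0.5, 1.0], n_trials)          # reward magnitude
      outcome = np.where(rng.random(n_trials) < prob, mag, 0.0)
      rpe = outcome - prob * mag                            # signed reward prediction error
      salience = np.abs(rpe)                                # unsigned prediction error

      rpe_shape = np.exp(-((t - 0.33) ** 2) / 0.004)        # component peaking near 330 ms
      sal_shape = np.exp(-((t - 0.25) ** 2) / 0.004)        # earlier "salience" component
      epochs = (np.outer(rpe, rpe_shape) + np.outer(salience, sal_shape)
                + rng.normal(0.0, 0.2, (n_trials, n_samples)))

      # PCA via SVD; trial scores on the leading components can then be regressed
      # on RPE size separately for positive and negative RPEs.
      epochs -= epochs.mean(axis=0)
      _, _, vt = np.linalg.svd(epochs, full_matrices=False)
      scores = epochs @ vt[:2].T
      print("|corr(PC1, signed RPE)| =", round(abs(np.corrcoef(scores[:, 0], rpe)[0, 1]), 2))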

  18. CNTRICS imaging biomarkers final task selection: Long-term memory and reinforcement learning.

    PubMed

    Ragland, John D; Cohen, Neal J; Cools, Roshan; Frank, Michael J; Hannula, Deborah E; Ranganath, Charan

    2012-01-01

    Functional imaging paradigms hold great promise as biomarkers for schizophrenia research as they can detect altered neural activity associated with the cognitive and emotional processing deficits that are so disabling to this patient population. In an attempt to identify the most promising functional imaging biomarkers for research on long-term memory (LTM), the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS) initiative selected "item encoding and retrieval," "relational encoding and retrieval," and "reinforcement learning" as key LTM constructs to guide the nomination process. This manuscript reports on the outcome of the third CNTRICS biomarkers meeting in which nominated paradigms in each of these domains were discussed by a review panel to arrive at a consensus on which of the nominated paradigms could be recommended for immediate translational development. After briefly describing this decision process, information is presented from the nominating authors describing the 4 functional imaging paradigms that were selected for immediate development. In addition to describing the tasks, information is provided on cognitive and neural construct validity, sensitivity to behavioral or pharmacological manipulations, availability of animal models, psychometric characteristics, effects of schizophrenia, and avenues for future development. PMID:22102094

  19. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    SciTech Connect

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; Ukkusuri, Satish V.

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.
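
    As a toy illustration of why coordinated action selection matters (far simpler than the junction-tree inference used in the study), two neighboring signals pick the joint action maximizing the sum of their local Q-values plus a pairwise coordination term. All Q-values below are made up.

      from itertools import product

      # Two neighboring intersections, each choosing a phase (0 = NS green, 1 = EW green).
      # Local terms Q1, Q2 plus a pairwise term Q12 that rewards coordinated phases.
      Q1 = {0: 2.0, 1: 1.5}
      Q2 = {0: 1.0, 1: 1.8}
      Q12 = {(0, 0): 1.0, (0, 1): -0.5, (1, 0): -0.5, (1, 1): 1.2}   # assumed values

      def best_joint_action():
          return max(product([0, 1], repeat=2),
                     key=lambda a: Q1[a[0]] + Q2[a[1]] + Q12[a])

      # (1, 1) scores 4.5 jointly, beating the independent greedy choice (0, 1) at 3.3.
      print(best_joint_action())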

  20. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    DOE PAGES Beta

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; Ukkusuri, Satish V.

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.

  1. Reinforcement Learning of Linking and Tracing Contours in Recurrent Neural Networks

    PubMed Central

    Brosch, Tobias; Neumann, Heiko; Roelfsema, Pieter R.

    2015-01-01

    The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal and feedback connections. We illustrate the efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task is called contour-integration and demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve-tracing where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies
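
    A generic reward-modulated Hebbian update of the kind such models build on (not the specific rule derived in the paper): connections between co-active units are changed in proportion to a reward prediction error. The network size, activity model, reward criterion, and rates below are simplified assumptions.

      import numpy as np

      rng = np.random.default_rng(0)

      # Reward-modulated Hebbian sketch: dW = lr * (reward - expected_reward) * post x pre
      n_in, n_out = 20, 5
      W = rng.normal(0.0, 0.1, (n_out, n_in))
      expected_reward, lr, beta = 0.0, 0.05, 0.1

      for _ in range(1000):
          pre = rng.random(n_in)                     # presynaptic (stimulus-driven) activity
          post = np.tanh(W @ pre)                    # postsynaptic activity
          reward = 1.0 if post.sum() > 0 else 0.0    # toy reward criterion (assumed)
          delta = reward - expected_reward           # reward prediction error gates plasticity
          W += lr * delta * np.outer(post, pre)
          expected_reward += beta * delta            # running estimate of expected reward

      print("mean weight after learning:", round(float(W.mean()), 3))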

  2. Reinforcement Learning of Linking and Tracing Contours in Recurrent Neural Networks.

    PubMed

    Brosch, Tobias; Neumann, Heiko; Roelfsema, Pieter R

    2015-10-01

    The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal and feedback connections. We illustrate the efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task is called contour-integration and demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve-tracing where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies

  3. Model-based machine learning

    PubMed Central

    Bishop, Christopher M.

    2013-01-01

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612
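
    A minimal sketch of the model-based style (in plain Python rather than Infer.NET, and not taken from the paper): write down a small generative model, then let a generic inference step, here an exact conjugate Beta-Binomial update, answer queries about it. The observations are invented.

      # Generative model (assumed): skill ~ Beta(1, 1); each answer is correct
      # with probability equal to skill. Inference is exact thanks to conjugacy.
      prior_a, prior_b = 1.0, 1.0
      answers = [1, 1, 0, 1, 1, 1, 0, 1]            # hypothetical observed outcomes

      post_a = prior_a + sum(answers)
      post_b = prior_b + sum(1 - a for a in answers)
      posterior_mean = post_a / (post_a + post_b)
      print("posterior mean skill:", round(posterior_mean, 2))   # 0.70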

  4. Model-based machine learning.

    PubMed

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612

  5. Comparison of denture base resin reinforced with polyaromatic polyamide fibers of different orientations.

    PubMed

    Yu, Sang-Hui; Ahn, Dae-Hyung; Park, Ji-Su; Chung, Yong Sik; Han, In-Sik; Lim, Jung-Seop; Oh, Seunghan; Oda, Yutaka; Bae, Ji-Myung

    2013-01-01

    The aim of this study was to evaluate the effect of reinforcement with polyaromatic polyamide (aramid) fibers of various orientations on the flexural properties of denture base resin. Aramid fibers with four orientations of unidirectional, woven, non-woven and paper-type were pre-impregnated and placed at the bottom of a specimen mold. Heat-polymerized denture base resin was packed over the fibers and polymerized. A three-point bending test was performed using a universal testing machine at a crosshead speed of 5 mm/min. The flexural strengths and flexural moduli of the unidirectional and woven groups were significantly higher than those of the control and other experimental groups. For the flexural moduli, all experimental groups showed significantly higher reinforcing effects than the control group. In conclusion, the unidirectional group located perpendicular to the direction of the load was most effective in reinforcing the denture base resin, followed by the woven group. PMID:23538771

  6. Characterization of VC-VB Particles Reinforced Fe-Based Composite Coatings Produced by Laser Cladding

    NASA Astrophysics Data System (ADS)

    Qu, K. L.; Wang, X. H.; Wang, Z. K.

    2016-03-01

    In situ synthesized VC-VB particles reinforced Fe-based composite coatings were produced by laser beam melting a mixture of ferrovanadium (Fe-V) alloy, boron carbide (B4C), CaF2 and Fe-based self-melting powders. The results showed that VB particles with black regular and irregular blocky shape and VC with black flower-like shape were uniformly distributed in the coatings. The type, amount, and size of the reinforcements were influenced by the content of FeV40 and B4C powders. Compared to the substrate, the hardness and wear resistance of the composite coatings were greatly improved.

  7. Fun While Learning and Earning. A Look Into Chattanooga Public Schools' Token Reinforcement Program.

    ERIC Educational Resources Information Center

    Smith, William F.; Sanders, Frank J.

    A token reinforcement program was used by the Piney Woods Research and Demonstration Center in Chattanooga, Tennessee. Children who were from economically deprived homes received tokens for positive behavior. The tokens were redeemable for recess privileges, ice cream, candy, and other such reinforcers. All tokens were spent on the day earned so…

  8. Context-Outcome Associations Underlie Context-Switch Effects after Partial Reinforcement in Human Predictive Learning

    ERIC Educational Resources Information Center

    Moreno-Fernandez, Maria M.; Abad, Maria J. F.; Ramos-Alvarez, Manuel M.; Rosas, Juan M.

    2011-01-01

    Predictive value for continuously reinforced cues is affected by context changes when they are trained within a context in which a different cue undergoes partial reinforcement. An experiment was conducted with the goal of exploring the mechanisms underlying this context-switch effect. Human participants were trained in a predictive learning…

  9. CTMP-based cellulose fibers modified with core-shell latex for reinforcing biocomposites.

    PubMed

    Pan, Yuanfeng; Xiao, Huining; Zhao, Yi; Wang, Zhuang

    2013-06-01

    The toughening of cellulose fiber reinforced polypropylene (PP) was performed via adsorbing the cationic latex with core-shell structure onto chemithermomechanical pulp (CTMP) fibers as reinforcements, which is a novel approach for rendering the surface of cellulose fibers elastomeric. The mechanical, morphological and thermal properties of the resulting biocomposites, containing 40% (wt) of the modified fibers, were investigated. The results showed that with the increasing of the latex dosage up to 2% (wt on dry CTMP fibers), the impact, tensile and flexural strengths of the modified CTMP/PP biocomposites were significantly increased. The toughening mechanism was discussed based on the retarding of crack propagation and the promoting of crystallization of PP matrix (as revealed by DSC characterization). The overall performance of the biocomposite demonstrated that cationic latex-modified CTMP fiber is very effective in reinforcing thermoplastic-based biocomposites along with the synergetic effect on enhancing crystallinity of polymer matrix. PMID:23618289

  10. Multi-scale modeling of fiber and fabric reinforced cement based composites

    NASA Astrophysics Data System (ADS)

    Soranakom, Chote

    With an increased use of fiber reinforced concrete in structural applications, proper characterization techniques and development of design guides are needed. This dissertation presents a multi-scale modeling approach for fiber and fabric reinforced cement-based composites. A micromechanics-based model of the yarn pullout mechanism due to the failure of the interfacial zone is presented. The effect of mechanical anchorage of transverse yarns is simulated using nonlinear spring elements. The yarn pullout mechanism was used in a meso-scale modeling approach to simulate the yarn bridging force in the crack evolution process. The tensile stress-strain response of a tension specimen that experiences distributed cracking can be simulated using a generalized finite difference approach. The stiffness degradation, tension stiffening, crack spacing evolution, and crack width characteristics of cement composites can be derived using matrix, interface and fiber properties. The theoretical models developed for fabric reinforced cement composites were then extended to cover other types of fiber reinforced concrete such as shotcrete, glass fiber reinforced concrete (GFRC), steel fiber reinforced concrete (SFRC), ferrocement and other conventional composite systems. The uniaxial tensile stress-strain response was used to formulate a generalized parametric closed-form solution for predicting flexural behavior of various composites at the macro-structural level. The flexural behaviors of these composites were modeled in a unified manner by means of a moment-curvature relationship based on the uniaxial material models. A variety of theoretical models were developed to address the various mechanisms including: an analytical yarn pullout model; a nonlinear finite difference fabric pullout model; a nonlinear finite difference tension model; closed-form solutions for strain-softening materials; closed-form solutions for strain-softening/hardening materials; and closed-form solutions for

  11. Problem-Based Learning Tools

    ERIC Educational Resources Information Center

    Chin, Christine; Chia, Li-Gek

    2008-01-01

    One way of implementing project-based science (PBS) is to use problem-based learning (PBL), in which students formulate their own problems. These problems are often ill-structured, mirroring complex real-life problems where data are often messy and inclusive. In this article, the authors describe how they used PBL in a ninth-grade biology class in…

  12. A Machine Learning Based Framework for Adaptive Mobile Learning

    NASA Astrophysics Data System (ADS)

    Al-Hmouz, Ahmed; Shen, Jun; Yan, Jun

    Advances in wireless technology and handheld devices have created significant interest in mobile learning (m-learning) in recent years. Students nowadays are able to learn anywhere and at any time. Mobile learning environments must also cater for different user preferences and various devices with limited capability, where not all of the information is relevant and critical to each learning environment. To address this issue, this paper presents a framework that depicts the process of adapting learning content to satisfy individual learner characteristics by taking into consideration his/her learning style. We use a machine learning based algorithm for acquiring, representing, storing, reasoning about, and updating each learner's acquired profile.

  13. The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.

    PubMed

    Dunne, Simon; D'Souza, Arun; O'Doherty, John P

    2016-06-01

    A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning. PMID:27052578
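
    For orientation (illustrative only, not the authors' computational model): a model-free learner caches action values directly from observed outcomes, whereas a model-based learner updates an outcome model and computes values by planning over it; note that neither update requires the observer to make choices. The bandit parameters below are hypothetical.

      import random

      # Two-armed bandit with assumed reward probabilities; outcomes are merely observed.
      p_reward = {"left": 0.2, "right": 0.8}
      alpha = 0.1

      q_mf = {"left": 0.0, "right": 0.0}            # model-free: cached action values
      counts = {"left": [0, 0], "right": [0, 0]}    # model-based: outcome counts [miss, hit]

      for _ in range(2000):
          arm = random.choice(["left", "right"])    # observed choice (no agency needed)
          r = 1 if random.random() < p_reward[arm] else 0
          q_mf[arm] += alpha * (r - q_mf[arm])      # model-free prediction-error update
          counts[arm][r] += 1                       # model-based: update the outcome model

      # Model-based "planning" here is just reading out the learned outcome probabilities.
      q_mb = {a: counts[a][1] / max(1, sum(counts[a])) for a in counts}
      print("model-free:", {a: round(v, 2) for a, v in q_mf.items()})
      print("model-based:", {a: round(v, 2) for a, v in q_mb.items()})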

  14. The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice

    PubMed Central

    Dunne, Simon; D’Souza, Arun; O’Doherty, John P.

    2016-01-01

    A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement-learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike in experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement-learning system may be less flexible with regard to its involvement in observational learning. PMID:27052578

  15. A self-learning rule base for command following in dynamical systems

    NASA Technical Reports Server (NTRS)

    Tsai, Wei K.; Lee, Hon-Mun; Parlos, Alexander

    1992-01-01

    In this paper, a self-learning Rule Base for command following in dynamical systems is presented. The learning is accomplished through reinforcement learning using an associative memory called SAM. The main advantage of SAM is that it is a function approximator with explicit storage of training samples. A learning algorithm patterned after dynamic programming is proposed. Two artificially created, unstable dynamical systems are used for testing, and the Rule Base was used to generate a feedback control to improve the command following ability of the otherwise uncontrolled systems. The numerical results are very encouraging. The controlled systems exhibit a more stable behavior and a better capability to follow reference commands. The rules resulting from the reinforcement learning are explicitly stored and they can be modified or augmented by human experts. Due to the overlapping storage scheme of SAM, the stored rules are similar to fuzzy rules.
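
    A rough sketch of a memory-based function approximator with explicit sample storage, in the spirit of the SAM described above though not its actual implementation: stored (state, value) samples are combined by distance-weighted averaging, and the stored entries can be inspected or edited much like rules. The bandwidth and training samples are assumptions.

      import math

      memory = []            # explicit (state, value) samples -- inspectable like rules

      def store(state, value):
          memory.append((state, value))

      def query(state, bandwidth=0.5):
          """Distance-weighted average over stored samples (1-D state for simplicity)."""
          if not memory:
              return 0.0
          weights = [math.exp(-((state - s) ** 2) / bandwidth ** 2) for s, _ in memory]
          return sum(w * v for w, (_, v) in zip(weights, memory)) / max(sum(weights), 1e-9)

      # Hypothetical samples gathered from reinforcement feedback on a 1-D state.
      for s, v in [(-2.0, -1.0), (-1.0, -0.5), (0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]:
          store(s, v)

      print(round(query(0.5), 2))    # interpolates between neighboring stored samples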

  16. Strengthening of reinforced concrete beams with basalt-based FRP sheets: An analytical assessment

    NASA Astrophysics Data System (ADS)

    Nerilli, Francesca; Vairo, Giuseppe

    2016-06-01

    In this paper the effectiveness of the flexural strengthening of RC beams through basalt fiber-reinforced sheets is investigated. The non-linear flexural response of RC beams strengthened with FRP composites applied at the traction side is described via an analytical formulation. Validation results and some comparative analyses confirm soundness and consistency of the proposed approach, and highlight the good mechanical performances (in terms of strength and ductility enhancement of the beam) produced by basalt-based reinforcements in comparison with traditional glass or carbon FRPs.

  17. Furfural resin-based bio-nanocomposites reinforced by reactive nanocrystalline cellulose

    NASA Astrophysics Data System (ADS)

    Wang, C.; Sun, S.; Zhao, G.; He, B.; Xiao, H.

    2009-07-01

    The work presented herein has been focused on reinforcing furfural resin (FA) with reactive-modified nanocrystalline cellulose (NCC) in an attempt to create a bio-nanocomposite completely based on natural resources. FA prepolymers were synthesized with an acid catalyst, and NCC was rendered reactive via the grafting of maleic anhydride (MAH). The resulting NCC and nanocomposites were characterized using TEM, SEM and FT-IR. It was found that NCC appeared to be spherical in shape with diameters under 100 nm. FT-IR confirmed that there was hydrogen bonding and esterification bonding between MAH and NCC or the FA prepolymer. After curing with para-toluenesulfonic acid, the NCC-reinforced FA resin composites showed a granular cross-section, whereas the neat FA resin showed a layered structure. Mechanical property tests indicated that the NCC-reinforced FA resin composites possessed improved tensile and flexural strengths in comparison with the neat FA resin.

  18. Developing and implementing a positive behavioral reinforcement intervention in prison-based drug treatment: Project BRITE.

    PubMed

    Burdon, William M; St De Lore, Jef; Prendergast, Michael L

    2011-09-01

    Within prison settings, the reliance on punishment for controlling inappropriate or noncompliant behavior is self-evident. What is not so evident is the similarity between this reliance on punishment and the use of positive reinforcements to increase desired behaviors. However, seldom do inmates receive positive reinforcement for engaging in prosocial behaviors or, for inmates receiving drug treatment, behaviors that are consistent with or support their recovery. This study provides an overview of the development and implementation of a positive behavioral reinforcement intervention in male and female prison-based drug treatment programs. The active involvement of institutional staff, treatment staff, and inmates enrolled in the treatment programs in the development of the intervention along with the successful branding of the intervention were effective at promoting support and participation. However, these factors may also have ultimately impacted the ability of the randomized design to reliably demonstrate the effectiveness of the intervention. PMID:22185038

  19. Improving the mechanical performance of wood fiber reinforced bio-based polyurethane foam

    NASA Astrophysics Data System (ADS)

    Chang, Li-Chi

    Because of the environmental impact of fossil fuel consumption, soybean-based polyurethane (PU) foam has been developed as an alternative to be used as the core in structural insulated panels (SIPs). Wood fibers can be added to enhance the resistance of the foam against bending and buckling in compression. The goal of this work is to study the effects of three modifications (fiber surface treatment, catalyst choice, and mixing method) on the compression performance of wood fiber-reinforced PU foam. Foams were made with a free-rising process. The compression performance of the foams was measured, and the foams were characterized using Fourier transform infrared spectroscopy (FTIR), scanning electron microscopy (SEM), and X-ray computed tomography (CT). The foam reinforced with alkali-treated fibers had improved compression performance. The foams made with various catalysts shared similar performance. The foam made using a mechanical stirrer contained well-dispersed fibers, but the reinforcing capability of the fibers was reduced.

  20. Effects of Internet-Based Voucher Reinforcement and a Transdermal Nicotine Patch on Cigarette Smoking

    ERIC Educational Resources Information Center

    Glenn, Irene M.; Dallery, Jesse

    2007-01-01

    Nicotine replacement products are commonly used to promote smoking cessation, but alternative and complementary methods may increase cessation rates. The current experiment compared the short-term effects of a transdermal nicotine patch to voucher-based reinforcement of smoking abstinence on cigarette smoking. Fourteen heavy smokers (7 men and 7…

  1. Modified flax fibers reinforced soy-based composites: mechanical properties and water absorption behavior

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Flax fibers are often used in reinforced composites which have exhibited numerous advantages such as high mechanical properties, low density and biodegradability. On the other hand, the hydrophilic nature of flax fiber is a major problem. In this study, we prepare the soybean oil based composites ...

  2. Web-Based Instruction, Learning Effectiveness and Learning Behavior: The Impact of Relatedness

    ERIC Educational Resources Information Center

    Shieh, Chich-Jen; Liao, Ying; Hu, Ridong

    2013-01-01

    This study aims to discuss the effects of Web-based Instruction and Learning Behavior on Learning Effectiveness. Web-based Instruction contains the dimensions of Active Learning, Simulation-based Learning, Interactive Learning, and Accumulative Learning; and, Learning Behavior covers Learning Approach, Learning Habit, and Learning Attitude. The…

  3. Incremental learning of skill collections based on intrinsic motivation

    PubMed Central

    Metzen, Jan H.; Kirchner, Frank

    2013-01-01

    Life-long learning of reusable, versatile skills is a key prerequisite for embodied agents that act in a complex, dynamic environment and are faced with different tasks over their lifetime. We address the question of how an agent can learn useful skills efficiently during a developmental period, i.e., when no task is imposed on it and no external reward signal is provided. Learning of skills in a developmental period needs to be incremental and self-motivated. We propose a new incremental, task-independent skill discovery approach that is suited for continuous domains. Furthermore, the agent learns specific skills based on intrinsic motivation mechanisms that determine on which skills learning is focused at a given point in time. We evaluate the approach in a reinforcement learning setup in two continuous domains with complex dynamics. We show that an intrinsically motivated, skill learning agent outperforms an agent which learns task solutions from scratch. Furthermore, we compare different intrinsic motivation mechanisms and how efficiently they make use of the agent's developmental period. PMID:23898265

  4. Adaptive Device Context Based Mobile Learning Systems

    ERIC Educational Resources Information Center

    Pu, Haitao; Lin, Jinjiao; Song, Yanwei; Liu, Fasheng

    2011-01-01

    Mobile learning is e-learning delivered through mobile computing devices, and it represents the next stage of computer-aided, multimedia-based learning. Mobile learning is therefore transforming traditional education. However, as most current e-learning systems and their contents are not suitable for mobile devices, an approach for…

  5. Problem Based Learning in Metaverse

    ERIC Educational Resources Information Center

    Barry, Dana M.; Kanematsu, Hideyuki; Fukumura, Yoshimi

    2010-01-01

    Problem Based Learning (PBL) is a powerful tool for both science and engineering education in the real world. Therefore, Japanese educators/researchers (with assistance from a US educator) carried out a pilot study to determine the effectiveness of using PBL activities in Metaverse. Their project was carried out by student teams from the US and…

  6. Beliefs and Computer-Based Learning.

    ERIC Educational Resources Information Center

    Chiou, Guey-Fa

    1995-01-01

    Discusses the use of beliefs to guide researchers in the development of computer-based learning. Topics include properties of beliefs; beliefs about learning; beliefs about computer technologies; directions for computer-based learning, including multimedia technology, virtual reality, and groupware; and learning rationales, including…

  7. Learning diphone-based segmentation.

    PubMed

    Daland, Robert; Pierrehumbert, Janet B

    2011-01-01

    This paper reconsiders the diphone-based word segmentation model of Cairns, Shillcock, Chater, and Levy (1997) and Hockema (2006), previously thought to be unlearnable. A statistically principled learning model is developed using Bayes' theorem and reasonable assumptions about infants' implicit knowledge. The ability to recover phrase-medial word boundaries is tested using phonetic corpora derived from spontaneous interactions with children and adults. The (unsupervised and semi-supervised) learning models are shown to exhibit several crucial properties. First, only a small amount of language exposure is required to achieve the model's ceiling performance, equivalent to between 1 day and 1 month of caregiver input. Second, the models are robust to variation, both in the free parameter and the input representation. Finally, both the learning and baseline models exhibit undersegmentation, argued to have significant ramifications for speech processing as a whole. PMID:21428994

  8. Reinforcing loose foundation stones in trait-based plant ecology.

    PubMed

    Shipley, Bill; De Bello, Francesco; Cornelissen, J Hans C; Laliberté, Etienne; Laughlin, Daniel C; Reich, Peter B

    2016-04-01

    The promise of "trait-based" plant ecology is one of generalized prediction across organizational and spatial scales, independent of taxonomy. This promise is a major reason for the increased popularity of this approach. Here, we argue that some important foundational assumptions of trait-based ecology have not received sufficient empirical evaluation. We identify three such assumptions and, where possible, suggest methods of improvement: (i) traits are functional to the degree that they determine individual fitness, (ii) intraspecific variation in functional traits can be largely ignored, and (iii) functional traits show general predictive relationships to measurable environmental gradients. PMID:26796410

  9. Work-Based Learning: A Manual.

    ERIC Educational Resources Information Center

    Idaho State Div. of Vocational Education, Boise.

    This manual is a guide to local partnership councils as they plan and design work-based learning experiences for credit. Chapter 1 provides an overview of work-based learning as part of vocational education. Chapter 2 describes a variety of work-based learning experiences, including established secondary vocational program work-based learning…

  10. Segmentation of neuronal structures using SARSA (λ)-based boundary amendment with reinforced gradient-descent curve shape fitting.

    PubMed

    Zhu, Fei; Liu, Quan; Fu, Yuchen; Shen, Bairong

    2014-01-01

    The segmentation of structures in electron microscopy (EM) images is very important for neurobiological research. Low-resolution neuronal EM images contain noise, and generally few features are available for segmentation, so conventional approaches fail to identify the neuron structure from EM images. We therefore present a multi-scale fused structure boundary detection algorithm in this study. In the algorithm, we first generate an EM image Gaussian pyramid; at each level of the pyramid we apply the Laplacian of Gaussian (LoG) function to obtain structure boundaries; and we finally assemble the detected boundaries with a fusion algorithm to obtain a combined neuron structure image. Since the obtained neuron structures usually have gaps, we put forward a reinforcement learning-based boundary amendment method to connect the gaps in the detected boundaries. We use a SARSA (λ)-based curve traveling and amendment approach derived from reinforcement learning to repair the incomplete curves. Using this algorithm, a moving point starts from one end of the incomplete curve and walks through the image, where its decisions are supervised by the approximated curve model, with the aim of minimizing the connection cost until the gap is closed. Our approach provided stable and efficient structure segmentation. The test results using 30 EM images from ISBI 2012 indicated that both of our approaches, i.e., with or without boundary amendment, performed better than six conventional boundary detection approaches. In particular, after amendment, the Rand error and warping error, which are the most important performance measurements during structure segmentation, were reduced to very low values. The comparison with the benchmark method of ISBI 2012 and recently developed methods also indicates that our method performs better for the accurate identification of substructures in EM images and is therefore useful for the identification of imaging
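
    For readers unfamiliar with the learning rule named above, the following is a minimal tabular SARSA(λ) sketch with eligibility traces; the environment interface (a moving point whose per-step reward reflects the negative connection cost) and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from collections import defaultdict

def sarsa_lambda(env, episodes=200, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, n_actions=8):
    """Tabular SARSA(lambda) with accumulating eligibility traces."""
    Q = defaultdict(lambda: np.zeros(n_actions))

    def policy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        E = defaultdict(lambda: np.zeros(n_actions))   # eligibility traces
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)                  # r ~ negative connection cost
            a2 = policy(s2)
            delta = r + gamma * Q[s2][a2] * (not done) - Q[s][a]
            E[s][a] += 1.0
            for k in list(E):                          # update and decay all traces
                Q[k] += alpha * delta * E[k]
                E[k] *= gamma * lam
            s, a = s2, a2
    return Q
```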

  11. Segmentation of Neuronal Structures Using SARSA (λ)-Based Boundary Amendment with Reinforced Gradient-Descent Curve Shape Fitting

    PubMed Central

    Zhu, Fei; Liu, Quan; Fu, Yuchen; Shen, Bairong

    2014-01-01

    The segmentation of structures in electron microscopy (EM) images is very important for neurobiological research. Low-resolution neuronal EM images contain noise, and generally few features are available for segmentation, so conventional approaches fail to identify the neuron structure from EM images. We therefore present a multi-scale fused structure boundary detection algorithm in this study. In the algorithm, we first generate an EM image Gaussian pyramid; at each level of the pyramid we apply the Laplacian of Gaussian (LoG) function to obtain structure boundaries; and we finally assemble the detected boundaries with a fusion algorithm to obtain a combined neuron structure image. Since the obtained neuron structures usually have gaps, we put forward a reinforcement learning-based boundary amendment method to connect the gaps in the detected boundaries. We use a SARSA (λ)-based curve traveling and amendment approach derived from reinforcement learning to repair the incomplete curves. Using this algorithm, a moving point starts from one end of the incomplete curve and walks through the image, where its decisions are supervised by the approximated curve model, with the aim of minimizing the connection cost until the gap is closed. Our approach provided stable and efficient structure segmentation. The test results using 30 EM images from ISBI 2012 indicated that both of our approaches, i.e., with or without boundary amendment, performed better than six conventional boundary detection approaches. In particular, after amendment, the Rand error and warping error, which are the most important performance measurements during structure segmentation, were reduced to very low values. The comparison with the benchmark method of ISBI 2012 and recently developed methods also indicates that our method performs better for the accurate identification of substructures in EM images and is therefore useful for the identification of imaging

  12. Twelve-Day Reinforcement-Based Memory Retention in African Cichlids (Labidochromis caeruleus)

    PubMed Central

    Ingraham, Erica; Anderson, Nicole D.; Hurd, Peter L.; Hamilton, Trevor J.

    2016-01-01

    The formation of long-term memories for food sources is essential for the survival of most animals. Long-term memory formation in mammalian species has been demonstrated through a variety of conditioning tasks; however, the nature of long-term memory in fish is less well known. In the current study, we explored whether African cichlids (Labidochromis caeruleus) could form memories for food-reinforced stimuli that last for 12 days. During the training sessions, fish were reinforced for approaching an upward drifting line grating. After a rest period of 12 days, fish demonstrated a significant preference for the upward drifting grating. To determine whether this preference could also be reversed, fish were then reinforced for approaching a downward drifting line grating after a 20-day rest period. When tested 12 days later, there were no significant differences in preference for either stimulus; however, following a second training period for the downward stimulus, there was a significant preference for the downward drifting grating. This suggests that cichlids are able to form reversible discrimination-based memories for food-reinforced stimuli that remain consolidated for at least 12 days. PMID:27582695

  13. Reinforcement of denture base PMMA with ZrO(2) nanotubes.

    PubMed

    Yu, Wei; Wang, Xixin; Tang, Qingguo; Guo, Mei; Zhao, Jianling

    2014-04-01

    In the research described, ZrO2 nanotubes were prepared by anodization. Their morphology and crystal structure were characterised by scanning electron microscopy (SEM), transmission electron microscopy (TEM), X-ray diffraction (XRD), and Fourier transform infrared spectroscopy (FTIR). The ZrO2 nanotubes were pre-stirred with the denture base PMMA powder in a mechanical blender and mixed with MMA liquid to fabricate reinforced composites. The composites were tested with an electromechanical universal testing machine to study the influence of content and surface treatment on the reinforcement. ZrO2 nanoparticles were also investigated for comparative purposes. Results indicated that the ZrO2 nanotubes had a better reinforcement effect than the ZrO2 nanoparticles, and that surface treatment lowered the reinforcement effect of the ZrO2 nanotubes, a response significantly different from that of the ZrO2 nanoparticles. The flexural strength of the composite was maximised when 2.0 wt% untreated ZrO2 nanotubes were added. PMID:24487077

  14. Twelve-Day Reinforcement-Based Memory Retention in African Cichlids (Labidochromis caeruleus).

    PubMed

    Ingraham, Erica; Anderson, Nicole D; Hurd, Peter L; Hamilton, Trevor J

    2016-01-01

    The formation of long-term memories for food sources is essential for the survival of most animals. Long-term memory formation in mammalian species has been demonstrated through a variety of conditioning tasks; however, the nature of long-term memory in fish is less well known. In the current study, we explored whether African cichlids (Labidochromis caeruleus) could form memories for food-reinforced stimuli that last for 12 days. During the training sessions, fish were reinforced for approaching an upward drifting line grating. After a rest period of 12 days, fish demonstrated a significant preference for the upward drifting grating. To determine whether this preference could also be reversed, fish were then reinforced for approaching a downward drifting line grating after a 20-day rest period. When tested 12 days later, there were no significant differences in preference for either stimulus; however, following a second training period for the downward stimulus, there was a significant preference for the downward drifting grating. This suggests that cichlids are able to form reversible discrimination-based memories for food-reinforced stimuli that remain consolidated for at least 12 days. PMID:27582695

  15. Does Feedback-Related Brain Response during Reinforcement Learning Predict Socio-motivational (In-)dependence in Adolescence?

    PubMed

    Raufelder, Diana; Boehme, Rebecca; Romund, Lydia; Golde, Sabrina; Lorenz, Robert C; Gleich, Tobias; Beck, Anne

    2016-01-01

    This multi-methodological study applied functional magnetic resonance imaging to investigate neural activation in a group of adolescent students (N = 88) during a probabilistic reinforcement learning task. We related patterns of emerging brain activity and individual learning rates to socio-motivational (in-)dependence manifested in four different motivation types (MTs): (1) peer-dependent MT, (2) teacher-dependent MT, (3) peer-and-teacher-dependent MT, (4) peer-and-teacher-independent MT. A multinomial regression analysis revealed that the individual learning rate predicts students' membership in the independent MT or the peer-and-teacher-dependent MT. Additionally, the striatum, a brain region associated with behavioral adaptation and flexibility, showed increased learning-related activation in students with motivational independence. Moreover, the prefrontal cortex, which is involved in behavioral control, was more active in students of the peer-and-teacher-dependent MT. Overall, this study offers new insights into the interplay of motivation and learning with (1) a focus on inter-individual differences in the role of peers and teachers as sources of students' individual motivation and (2) its potential neurobiological basis. PMID:27199873

  16. Does Feedback-Related Brain Response during Reinforcement Learning Predict Socio-motivational (In-)dependence in Adolescence?

    PubMed Central

    Raufelder, Diana; Boehme, Rebecca; Romund, Lydia; Golde, Sabrina; Lorenz, Robert C.; Gleich, Tobias; Beck, Anne

    2016-01-01

    This multi-methodological study applied functional magnetic resonance imaging to investigate neural activation in a group of adolescent students (N = 88) during a probabilistic reinforcement learning task. We related patterns of emerging brain activity and individual learning rates to socio-motivational (in-)dependence manifested in four different motivation types (MTs): (1) peer-dependent MT, (2) teacher-dependent MT, (3) peer-and-teacher-dependent MT, (4) peer-and-teacher-independent MT. A multinomial regression analysis revealed that the individual learning rate predicts students’ membership in the independent MT or the peer-and-teacher-dependent MT. Additionally, the striatum, a brain region associated with behavioral adaptation and flexibility, showed increased learning-related activation in students with motivational independence. Moreover, the prefrontal cortex, which is involved in behavioral control, was more active in students of the peer-and-teacher-dependent MT. Overall, this study offers new insights into the interplay of motivation and learning with (1) a focus on inter-individual differences in the role of peers and teachers as sources of students’ individual motivation and (2) its potential neurobiological basis. PMID:27199873

  17. Lack of effect of Pitressin on the learning ability of Brattleboro rats with diabetes insipidus using positively reinforced operant conditioning.

    PubMed

    Laycock, J F; Gartside, I B

    1985-08-01

    Brattleboro rats with hereditary hypothalamic diabetes insipidus (BDI) received daily subcutaneous injections of vasopressin in the form of Pitressin tannate (0.5 IU/24 hr). They were initially deprived of food and then trained to work for food reward in a Skinner box to a fixed ratio of ten presses for each pellet received. Once this schedule had been learned the rats were given a discrimination task daily for seven days. The performances of these BDI rats were compared with those of rats of the parent Long Evans (LE) strain receiving daily subcutaneous injections of vehicle (arachis oil). Comparisons were also made between these two groups of treated animals and untreated BDI and LE rats studied under similar conditions. In the initial learning trial, both control and Pitressin-treated BDI rats performed significantly better, and manifested less fear initially, than the control or vehicle-injected LE rats when first placed in the Skinner box. Once the initial task had been learned there was no marked difference in the discrimination learning between control or treated BDI and LE animals. These results support the view that vasopressin is not directly involved in all types of learning behaviour, particularly those involving positively reinforced operant conditioning. PMID:4070391

  18. Online Behavior Acquisition of an Agent Based on Coaching as Learning Assistance

    NASA Astrophysics Data System (ADS)

    Hirokawa, Masakazu; Suzuki, Kenji

    This paper describes a novel methodology, namely ``Coaching'', which allows humans to give subjective evaluations to an agent in an iterative manner. It is an interactive learning method that improves reinforcement learning by dynamically modifying the reward function according to the evaluations given by a trainer and the learning situation of the agent. Through several experiments, we demonstrate that the agent can learn different reward functions from instructions such as ``good'' or ``bad'' given by a human observer, and can also acquire a set of behaviors based on the learned reward functions.
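
    As a rough illustration of the coaching idea (the class name, the update rule, and the per-state offset are assumptions, since the abstract does not specify them), the trainer's ``good''/``bad'' labels can be folded into the reward signal that an ordinary reinforcement learner then optimises:

```python
from collections import defaultdict

class CoachedReward:
    """Hypothetical sketch: a per-state reward offset is nudged up or down
    whenever the human trainer labels the agent's recent behaviour 'good'
    (+1) or 'bad' (-1); the shaped reward is what the learner optimises."""

    def __init__(self, eta=0.5):
        self.offset = defaultdict(float)
        self.eta = eta

    def feedback(self, visited_states, label):
        # label: +1 for "good", -1 for "bad", applied to the recent trajectory
        for s in visited_states:
            self.offset[s] += self.eta * label

    def reward(self, s, base_reward):
        # shaped reward seen by the underlying reinforcement learner
        return base_reward + self.offset[s]
```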

  19. Knockout crickets for the study of learning and memory: Dopamine receptor Dop1 mediates aversive but not appetitive reinforcement in crickets.

    PubMed

    Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto

    2015-01-01

    Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory. PMID:26521965

  20. Knockout crickets for the study of learning and memory: Dopamine receptor Dop1 mediates aversive but not appetitive reinforcement in crickets

    PubMed Central

    Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto

    2015-01-01

    Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory. PMID:26521965

  1. Efficient reinforcement learning of a reservoir network model of parametric working memory achieved with a cluster population winner-take-all readout mechanism.

    PubMed

    Cheng, Zhenbo; Deng, Zhidong; Hu, Xiaolin; Zhang, Bo; Yang, Tianming

    2015-12-01

    The brain often has to make decisions based on information stored in working memory, but the neural circuitry underlying working memory is not fully understood. Many theoretical efforts have been focused on modeling the persistent delay period activity in the prefrontal areas that is believed to represent working memory. Recent experiments reveal that the delay period activity in the prefrontal cortex is neither static nor homogeneous as previously assumed. Models based on reservoir networks have been proposed to model such a dynamical activity pattern. The connections between neurons within a reservoir are random and do not require explicit tuning. Information storage does not depend on the stable states of the network. However, it is not clear how the encoded information can be retrieved for decision making with a biologically realistic algorithm. We therefore built a reservoir-based neural network to model the neuronal responses of the prefrontal cortex in a somatosensory delayed discrimination task. We first illustrate that the neurons in the reservoir exhibit a heterogeneous and dynamical delay period activity observed in previous experiments. Then we show that a cluster population circuit decodes the information from the reservoir with a winner-take-all mechanism and contributes to the decision making. Finally, we show that the model achieves a good performance rapidly by shaping only the readout with reinforcement learning. Our model reproduces important features of previous behavior and neurophysiology data. We illustrate for the first time how task-specific information stored in a reservoir network can be retrieved with a biologically plausible reinforcement learning training scheme. PMID:26445865
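
    A compact sketch of the overall architecture under stated assumptions: a fixed random reservoir (an echo-state-style network standing in for the prefrontal model) generates heterogeneous dynamics, and only a readout is shaped by a reward-modulated, REINFORCE-style rule. The paper's cluster-population winner-take-all decoder is simplified here to a softmax choice, and the trial format is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random reservoir; only the readout weights are trained.
N, n_in, n_out = 200, 2, 2
W_in  = rng.uniform(-1, 1, (N, n_in))
W     = rng.normal(0, 1, (N, N))
W    *= 0.9 / max(abs(np.linalg.eigvals(W)))     # keep spectral radius < 1
W_out = np.zeros((n_out, N))

def run_reservoir(inputs):
    """Drive the reservoir with an input sequence and return its states."""
    x = np.zeros(N)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
        states.append(x.copy())
    return np.array(states)

def train_readout(trials, lr=0.01):
    """Reward-modulated update of the readout only; `trials` is assumed to be
    a list of (input_sequence, index_of_correct_choice) pairs."""
    global W_out
    for inputs, correct in trials:
        x = run_reservoir(inputs)[-1]               # reservoir state at decision time
        z = W_out @ x
        p = np.exp(z - z.max()); p /= p.sum()       # softmax over choices
        choice = rng.choice(n_out, p=p)
        r = 1.0 if choice == correct else 0.0
        grad = -p[:, None] * x[None, :]              # d log p(choice) / d W_out
        grad[choice] += x
        W_out += lr * r * grad                       # reinforce rewarded choices
```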

  2. Technology and Problem-Based Learning

    ERIC Educational Resources Information Center

    Uden, Lorna; Beaumont, Chris

    2006-01-01

    Problem-based learning (PBL) has been the focus of many developments in teaching and learning facilitation in recent years. It has been claimed that PBL produces independent learners who are motivated, engaged in deep learning, work as a team, and develop effective strategies, skills and knowledge for life-long learning and professional work.…

  3. The Credentials of Brain-Based Learning

    ERIC Educational Resources Information Center

    Davis, Andrew

    2004-01-01

    This paper discusses the current fashion for brain-based learning, in which value-laden claims about learning are grounded in neurophysiology. It argues that brain science cannot have the authority about learning that some seek to give it. It goes on to discuss whether the claim that brain science is relevant to learning involves a category…

  4. Foundations of Game-Based Learning

    ERIC Educational Resources Information Center

    Plass, Jan L.; Homer, Bruce D.; Kinzer, Charles K.

    2015-01-01

    In this article we argue that to study or apply games as learning environments, multiple perspectives have to be taken into account. We first define game-based learning and gamification, and then discuss theoretical models that describe learning with games, arguing that playfulness is orthogonal to learning theory. We then review design elements…

  5. The Reinforcement of First Grade Science Concepts with the Use of Motor Learning Activities.

    ERIC Educational Resources Information Center

    Prager, Iris J.

    In order to test the theory that selected first-grade science concepts could be successfully reinforced with the use of motor activities, 52 first-graders were exposed to certain experimental procedures. Two separate classes of 25 students (group A) and 27 students (group B) underwent a pretest. Both classes were then taught through traditional…

  6. Delay of Reinforcement and Knowledge of Response Contingencies in Human Learning.

    ERIC Educational Resources Information Center

    Taylor, James S.; Holen, Michael C.

    Eighty undergraduate students participated in an investigation of the effects on acquisition of immediate versus 30-second delay of reinforcement under either known or unknown response contingency conditions. The students in the four experimental conditions were required to depress three buttons in proper senquence. Following a correct response,…

  7. The Perceived Impacts of Supervisor Reinforcement and Learning Objectives Importance on Transfer of Training.

    ERIC Educational Resources Information Center

    Lee, Kisung; Pucel, David J.

    1998-01-01

    This study, conducted within a Managerial Leadership Program in an oil refinery and chemical company in Korea, investigates the relationships between: perceived importance of training objectives and perceived transfer of training relative to those objectives, and types of supervisor reinforcement which trainees perceive to be most motivating and…

  8. Team Teaching Verbal, Mathematics, and Learning Skills. Howard University. The Center for Academic Reinforcement.

    ERIC Educational Resources Information Center

    Bartlett, Joan; Byrd, Roland

    Team teaching was used in three undergraduate courses to explore its potential for enhancing students' academic development. The courses were part of a program offered to freshmen with unrealized academic potential through the Howard University (District of Columbia) Center for Academic Reinforcement (CAR). A three-hour block of time was set aside…

  9. Learning Deficit in the Ability to Self-Reinforce as Related to Negative Self-Concept.

    ERIC Educational Resources Information Center

    Felker, Donald W.; Bahlke, Susan

    The study tests four hypotheses derived from the proposition that positive self-concept is partly due to an ability to utilize self-initiated verbal reinforcement. Subjects were 131 (66 boys and 65 girls) white fourth grade students from a suburban middle class school. The Piers-Harris self-concept measure was administered to all students. The…

  10. Zirconia-Nanoparticle-Reinforced Morphology-Engineered Graphene-Based Foams.

    PubMed

    Chakravarty, Dibyendu; Tiwary, Chandra Sekhar; Machado, Leonardo Dantas; Brunetto, Gustavo; Vinod, Soumya; Yadav, Ram Manohar; Galvao, Douglas S; Joshi, Shrikant V; Sundararajan, Govindan; Ajayan, Pulickel M

    2015-08-19

    The morphology of graphene-based foams can be engineered by reinforcing them with nanocrystalline zirconia, thus improving their oil-adsorption capacity; this is observed experimentally and explained theoretically. Low zirconia fractions yield flaky microstructures in which zirconia nanoparticles arrest propagating cracks. Higher zirconia concentrations produce a mesh-like interconnected structure in which the degree of coiling is dependent on the local zirconia content. PMID:26171602

  11. Particle-Based Geometric and Mechanical Modelling of Woven Technical Textiles and Reinforcements for Composites

    NASA Astrophysics Data System (ADS)

    Samadi, Reza

    Technical textiles are increasingly being engineered and used in challenging applications, in areas such as safety, biomedical devices, architecture and others, where they must meet stringent demands including excellent and predictable load bearing capabilities. They also form the basis for one of the most widespread groups of composite materials, fibre reinforced polymer-matrix composites (PMCs), which comprise materials made of stiff and strong fibres generally available in textile form and selected for their structural potential, combined with a polymer matrix that gives parts their shape. Manufacturing processes for PMCs and technical textiles, as well as parts and advanced textile structures, must be engineered, ideally through simulation, and therefore diverse properties of the textiles, textile reinforcements and PMC materials must be available for predictive simulation. Knowing the detailed geometry of technical textiles is essential to predicting accurately the processing and performance properties of textiles and PMC parts. In turn, the geometry taken by a textile or a reinforcement textile is linked in an intricate manner to its constitutive behaviour. This thesis proposes, investigates and validates a general numerical tool for the integrated and comprehensive analysis of textile geometry and constitutive behaviour as required toward engineering applications featuring technical textiles and textile reinforcements. The tool shall be general with regard to the textiles modelled and the loading cases applied. Specifically, the work aims at fulfilling the following objectives: 1) developing and implementing dedicated simulation software for modelling textiles subjected to various load cases; 2) providing, through simulation, geometric descriptions for different textiles subjected to different load cases namely compaction, relaxation and shear; 3) predicting the constitutive behaviour of the textiles undergoing said load cases; 4) identifying parameters

  12. Fluid Intelligence and Discriminative Operant Learning of Reinforcement Contingencies in a Fixed Ratio 3 Schedule

    ERIC Educational Resources Information Center

    Lozano, J. H.; Hernandez, J. M.; Rubio, V. J.; Santacreu, J.

    2011-01-01

    Although intelligence has traditionally been identified as "the ability to learn" (Peterson, 1925), this relationship has been questioned in simple operant learning tasks (Spielberger, 1962). Nevertheless, recent pieces of research have demonstrated a strong and significant correlation between associative learning measures and intelligence…

  13. Argumentation Based Joint Learning: A Novel Ensemble Learning Approach

    PubMed Central

    Xu, Junyi; Yao, Li; Li, Le

    2015-01-01

    Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high quality knowledge for ensemble classifier and improve the performance of classification. PMID:25966359

  14. A game theory-reinforcement learning (GT-RL) method to develop optimal operation policies for multi-operator reservoir systems

    NASA Astrophysics Data System (ADS)

    Madani, Kaveh; Hooshyar, Milad

    2014-11-01

    Reservoir systems with multiple operators can benefit from coordination of operation policies. To maximize the total benefit of these systems the literature has normally used the social planner's approach. Based on this approach operation decisions are optimized using a multi-objective optimization model with a compound system's objective. While the utility of the system can be increased this way, fair allocation of benefits among the operators remains challenging for the social planner who has to assign controversial weights to the system's beneficiaries and their objectives. Cooperative game theory provides an alternative framework for fair and efficient allocation of the incremental benefits of cooperation. To determine the fair and efficient utility shares of the beneficiaries, cooperative game theory solution methods consider the gains of each party in the status quo (non-cooperation) as well as what can be gained through the grand coalition (social planner's solution or full cooperation) and partial coalitions. Nevertheless, estimation of the benefits of different coalitions can be challenging in complex multi-beneficiary systems. Reinforcement learning can be used to address this challenge and determine the gains of the beneficiaries for different levels of cooperation, i.e., non-cooperation, partial cooperation, and full cooperation, providing the essential input for allocation based on cooperative game theory. This paper develops a game theory-reinforcement learning (GT-RL) method for determining the optimal operation policies in multi-operator multi-reservoir systems with respect to fairness and efficiency criteria. As the first step to underline the utility of the GT-RL method in solving complex multi-agent multi-reservoir problems without a need for developing compound objectives and weight assignment, the proposed method is applied to a hypothetical three-agent three-reservoir system.
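
    The allocation step of the GT-RL idea can be sketched as follows; the coalition values below are toy placeholders (in the method itself they would be estimated by reinforcement-learning runs for each coalition of reservoir operators), and the function and operator names are assumptions, not data from the paper.

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Shapley-value allocation of cooperative gains; `value` maps a coalition
    (frozenset of players) to its estimated benefit."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += w * (value(S | {p}) - value(S))   # marginal contribution
    return phi

# Toy, super-additive coalition values for a hypothetical three-operator system.
v = {frozenset(): 0, frozenset('A'): 10, frozenset('B'): 12, frozenset('C'): 8,
     frozenset('AB'): 30, frozenset('AC'): 25, frozenset('BC'): 27,
     frozenset('ABC'): 50}
print(shapley(['A', 'B', 'C'], lambda S: v[frozenset(S)]))
```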

  15. Shape memory composites based on glass-fibre-reinforced poly(ethylene)-like polymers

    NASA Astrophysics Data System (ADS)

    Cuevas, J. M.; Rubio, R.; Laza, J. M.; Vilas, J. L.; Rodriguez, M.; León, L. M.

    2012-03-01

    The mechanical response of a series of semicrystalline shape memory polymers was considerably enhanced by incorporating short glass fibres without modifying the thermo-responsive actuation based on balanced crystallinity and elasticity. The effect of different fractions of inorganic reinforcement on thermo-mechanical properties was evaluated using different instrumental techniques such as differential scanning calorimetry (DSC), thermogravimetry (TGA), dynamic mechanical thermal analysis (DMTA) and three-point flexural tests. Moreover, we studied the influence of the inorganic reinforcement on the shape memory actuation capabilities through thermo-mechanical bending cycle experiments. As demonstrated, the manufactured polymer composites showed excellent shape memory capacities, similar to neat active polymer matrices, but with outstanding improvements in static and recovery mechanical performance.

  16. Evolution of damage and plasticity in titanium-based, fiber-reinforced composites

    SciTech Connect

    Majumdar, B.S.; Newaz, G.M.; Ellis, J.R. (Fatigue and Failure Branch)

    1993-07-01

    The inelastic deformation mechanisms were evaluated for a model titanium-based, fiber-reinforced composite: a beta titanium alloy (Ti-15V-3Al-3Cr-3Sn) reinforced with SiC (SCS-6) fibers. The primary emphasis of this article is to illustrate the sequence in which damage and plasticity evolved for this system. The mechanical responses and the results of detailed microstructural evaluations for the [0]₈, [90]₈, and [±45]₂ₛ laminates are provided. It is shown that the characteristics of the reaction zone around the fiber play a very important role in the way damage and plasticity evolve, particularly in the microyield regime of deformation, and must be included in any realistic constitutive model. Fiber-matrix debonding was a major damage mode for the off-axis systems. The tension test results are also compared with the predictions of a few constitutive models.

  17. Evolution of damage and plasticity in titanium-based, fiber-reinforced composites

    NASA Technical Reports Server (NTRS)

    Majumdar, B. S.; Newaz, G. M.; Ellis, J. R.

    1993-01-01

    The inelastic deformation mechanisms were evaluated for a model titanium-based, fiber-reinforced composite: a beta titanium alloy (Ti-15V-3Al-3Cr-3Sn) reinforced with SiC (SCS-6) fibers. The primary emphasis of this article is to illustrate the sequence in which damage and plasticity evolved for this system. The mechanical responses and the results of detailed microstructural evaluations for the [0]₈, [90]₈, and [±45]₂ₛ laminates are provided. It is shown that the characteristics of the reaction zone around the fiber play a very important role in the way damage and plasticity evolve, particularly in the microyield regime of deformation, and must be included in any realistic constitutive model. Fiber-matrix debonding was a major damage mode for the off-axis systems. The tension test results are also compared with the predictions of a few constitutive models.

  18. Low Cost Fabrication of Silicon Carbide Based Ceramics and Fiber Reinforced Composites

    NASA Technical Reports Server (NTRS)

    Singh, M.; Levine, S. R.

    1995-01-01

    A low cost processing technique called reaction forming for the fabrication of near-net and complex shaped components of silicon carbide based ceramics and composites is presented. This process consists of the production of a microporous carbon preform and subsequent infiltration with liquid silicon or silicon-refractory metal alloys. The microporous preforms are made by the pyrolysis of a polymerized resin mixture with very good control of pore volume and pore size, thereby yielding materials with tailorable microstructure and composition. Mechanical properties (elastic modulus, flexural strength, and fracture toughness) of reaction-formed silicon carbide ceramics are presented. This processing approach is suitable for various kinds of reinforcements such as whiskers, particulates, fibers (tows, weaves, and filaments), and 3-D architectures. This approach has also been used to fabricate continuous silicon carbide fiber reinforced ceramic composites (CFCCs) with silicon carbide based matrices. Strong and tough composites with tailorable matrix microstructure and composition have been obtained. Microstructure and thermomechanical properties of silicon carbide (SCS-6) fiber-reinforced reaction-formed silicon carbide matrix composites are discussed.

  19. Low cost fabrication of silicon carbide based ceramics and fiber reinforced composites

    SciTech Connect

    Singh, M.; Levine, S.R.

    1995-07-01

    A low cost processing technique called reaction forming for the fabrication of near-net and complex shaped components of silicon carbide based ceramics and composites is presented. This process consists of the production of a microporous carbon preform and subsequent infiltration with liquid silicon or silicon-refractory metal alloys. The microporous preforms are made by the pyrolysis of a polymerized resin mixture with very good control of pore volume and pore size, thereby yielding materials with tailorable microstructure and composition. Mechanical properties (elastic modulus, flexural strength, and fracture toughness) of reaction-formed silicon carbide ceramics are presented. This processing approach is suitable for various kinds of reinforcements such as whiskers, particulates, fibers (tows, weaves, and filaments), and 3-D architectures. This approach has also been used to fabricate continuous silicon carbide fiber reinforced ceramic composites (CFCCs) with silicon carbide based matrices. Strong and tough composites with tailorable matrix microstructure and composition have been obtained. Microstructure and thermomechanical properties of silicon carbide (SCS-6) fiber-reinforced reaction-formed silicon carbide matrix composites are discussed.

  20. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    PubMed

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-01

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. PMID:27163379

  1. Video Game Based Learning in English Grammar

    ERIC Educational Resources Information Center

    Singaravelu, G.

    2008-01-01

    The study examines the effectiveness of Video Game Based Learning in English grammar at standard VI. A video game package was prepared, consisting of self-learning activities presented in a play-way manner that attracted the minds of the young learners. Chief objective: find out the effectiveness of Video-Game based learning in English grammar.…

  2. Project-Based Learning around the World

    ERIC Educational Resources Information Center

    Weatherby, Kristen

    2007-01-01

    This paper, the first of a two-part article, addresses ways that project-based learning is being used in countries around the world. It introduces Microsoft's worldwide K-12 education initiative, Partners in Learning, and provides some background as to why Microsoft is interested in developing project-based learning curricula for teachers to help…

  3. Local History and Problem-Based Learning

    ERIC Educational Resources Information Center

    Wieseman, Katherine C.; Cadwell, Doni

    2005-01-01

    The combination of students, local history, researching, and problem-based learning creates a powerful opportunity for learning to all involved. This article provides one example of how an elementary teacher and a teacher educator have used local resources and problem-based learning to teach a fourth grade unit about human communities and the…

  4. Global reinforcement training of CrossNets

    NASA Astrophysics Data System (ADS)

    Ma, Xiaolong

    2007-10-01

    Hybrid "CMOL" integrated circuits, incorporating advanced CMOS devices for neural cell bodies, nanowires as axons and dendrites, and latching switches as synapses, may be used for the hardware implementation of extremely dense (107 cells and 1012 synapses per cm2) neuromorphic networks, operating up to 10 6 times faster than their biological prototypes. We are exploring several "Cross- Net" architectures that accommodate the limitations imposed by CMOL hardware and should allow effective training of the networks without a direct external access to individual synapses. Our studies have show that CrossNets based on simple (two-terminal) crosspoint devices can work well in at least two modes: as Hop-field networks for associative memory and multilayer perceptrons for classification tasks. For more intelligent tasks (such as robot motion control or complex games), which do not have "examples" for supervised learning, more advanced training methods such as the global reinforcement learning are necessary. For application of global reinforcement training algorithms to CrossNets, we have extended Williams's REINFORCE learning principle to a more general framework and derived several learning rules that are more suitable for CrossNet hardware implementation. The results of numerical experiments have shown that these new learning rules can work well for both classification tasks and reinforcement tasks such as the cartpole balancing control problem. Some limitations imposed by the CMOL hardware need to be carefully addressed for the the successful application of in situ reinforcement training to CrossNets.

  5. Acquisition of Robotic Giant-swing Motion Using Reinforcement Learning and Its Consideration of Motion Forms

    NASA Astrophysics Data System (ADS)

    Sakai, Naoki; Kawabe, Naoto; Hara, Masayuki; Toyoda, Nozomi; Yabuta, Tetsuro

    This paper describes how a compact humanoid robot can acquire a giant-swing motion without any robotic models by using the Q-Learning method. It is widely held that Q-Learning is not appropriate for learning dynamic motions, because the Markov property is not necessarily guaranteed during a dynamic task. We address this problem by embedding the angular velocity in the state definition and by using an averaging Q-Learning method to reduce dynamic effects, although some non-Markov effects remain in the learning results. The results show how the robot can acquire a giant-swing motion using the Q-Learning algorithm. The successfully acquired motions are analyzed from the viewpoint of dynamics in order to realize a functional giant-swing motion. Finally, the results show how this method can avoid the stagnant action loop around the bottom of the horizontal bar during the early stage of the giant-swing motion.
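
    The underlying update is standard one-step Q-learning over a discretized state that includes angular velocity, roughly as sketched below; the robot/simulator interface, the reward, and the bin ranges are illustrative assumptions (the paper's averaging variant is not reproduced here).

```python
import numpy as np
from collections import defaultdict

def discretize(angle, ang_vel, n_bins=12):
    """Include angular velocity in the state so the swing dynamics look
    (approximately) Markovian to the learner."""
    a = int((angle % (2 * np.pi)) / (2 * np.pi) * n_bins)
    v = int(np.clip((ang_vel + 10) / 20 * n_bins, 0, n_bins - 1))
    return (a, v)

def q_learning(env, n_actions=3, episodes=1000, alpha=0.1, gamma=0.98, epsilon=0.1):
    """Plain tabular Q-learning; `env` and its reward (e.g. swing amplitude
    gained per step) are placeholders, not the paper's setup."""
    Q = defaultdict(lambda: np.zeros(n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (np.random.randint(n_actions) if np.random.rand() < epsilon
                 else int(np.argmax(Q[s])))
            s2, r, done = env.step(a)
            target = r + gamma * np.max(Q[s2]) * (not done)
            Q[s][a] += alpha * (target - Q[s][a])    # one-step Q update
            s = s2
    return Q
```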

  6. Problem-based learning case writing by students based on early years clinical attachments: a focus group evaluation

    PubMed Central

    Idowu, Yewande; Easton, Graham

    2016-01-01

    Objectives To evaluate the perception of medical students of the new approach to problem-based learning which involves students writing their own problem-based learning cases based on their recent clinical attachment, and team assessment. Design Focus group interviews with students using purposive sampling. Transcripts of the audio recordings were analysed using thematic analysis. Setting Imperial College School of Medicine, London. Participants Medical students in the second year of the MBBS course, who attended the problem-based learning case writing session. Main outcome measures To elicit the students’ views about problem-based learning case writing and team assessment. Results The following broad themes emerged: effect of group dynamics on the process; importance of defining the tutor’s role; role of summative assessment; feedback as a learning tool and the skills developed during the process. Conclusions Overall the students found the new approach, writing problem-based learning cases based on patients seen during their clinical attachments, useful in helping them to gain a better understanding about the problem-based learning process, promoting creativity and reinforcing the importance of team work and peer assessment which are vital professional skills. Further tutor development and guidance for students about the new approach was found to be important in ensuring it is a good learning experience. We hope this evaluation will be of use to other institutions considering introducing students’ case writing to problem-based learning. PMID:26981255

  7. Bond mechanisms in fiber-reinforced cement-based composites. Final report, 1 July 1987-30 August 1989

    SciTech Connect

    Naaman, A.E.; Namur, G.; Najm, H.; Alwan, J.

    1989-08-01

    This report presents a comprehensive investigation of the mechanisms of bond in steel-fiber-reinforced-cement-based composites. Following a state-of-the-art review on bond in reinforced and prestressed concrete as well as fiber reinforced concrete, the results of an experimental and an analytical program are described. The experimental program focuses primarily on the behavior of fibers under pull-out conditions. Pull-out load versus end-slip behavior and bond shear stress versus slip relationship are studied extensively.

  8. A reinforcement learning trained fuzzy neural network controller for maintaining wireless communication connections in multi-robot systems

    NASA Astrophysics Data System (ADS)

    Zhong, Xu; Zhou, Yu

    2014-05-01

    This paper presents a decentralized multi-robot motion control strategy to facilitate a multi-robot system, comprised of collaborative mobile robots coordinated through wireless communications, to form and maintain desired wireless communication coverage in a realistic environment with unstable wireless signaling condition. A fuzzy neural network controller is proposed for each robot to maintain the wireless link quality with its neighbors. The controller is trained through reinforcement learning to establish the relationship between the wireless link quality and robot motion decision, via consecutive interactions between the controller and environment. The tuned fuzzy neural network controller is applied to a multi-robot deployment process to form and maintain desired wireless communication coverage. The effectiveness of the proposed control scheme is verified through simulations under different wireless signal propagation conditions.

  9. Evolution of cooperation in the snowdrift game among mobile players with random-pairing and reinforcement learning

    NASA Astrophysics Data System (ADS)

    Jia, Ning; Ma, Shoufeng

    2013-11-01

    Evolutionary spatial games in mobile populations have attracted many researchers in the biological, social and economic sciences. Motivated by observations of the real world, this paper proposes a novel spatial evolutionary snowdrift game model with movable players. In this model, each player interacts only with its nearest neighbor in each turn and makes decisions through reinforcement learning. Over a wide range of parameters, mobility enhances cooperation, but under certain conditions high velocity strongly depresses cooperation. Explanations are provided by investigating the strategy-change behavior of players. The findings may be helpful in understanding cooperative behavior in natural and social systems consisting of mobile agents.

  10. A new release device based on styrene-based SMP reinforced by carbon fiber

    NASA Astrophysics Data System (ADS)

    Wei, Hanqing; Guan, Chunyang; Du, Haiyang; Liu, Liwu; Leng, Jinsong

    2013-08-01

    A shape memory polymer composite (SMPC) release device can be fabricated to overcome the disadvantages of traditional explosive release devices, such as high weight, poor stability, and the strong impact force and damage caused by explosion. The release device is made up of two thin-walled tubes: the first is responsible for the torsion, and the second fits the first tube. The tubes are made from carbon-fiber-reinforced styrene-based shape memory polymer (SMP). A resistive heater is used to heat the device and actuate the shape recovery process. This SMPC release device can connect the main device and the device to be released, and on command it can separate the two immediately. First, the first tube is heated by the resistive heater and a twisting and stretching force is exerted on the heated part of the tube; after cooling and unloading, the two thin-walled tubes of the release device are connected. Second, the twisted part of the first tube is heated so that it returns to its original angle, and the stretched part then recovers its original shape on heating; the working part thus pulls its claws out of the second tube automatically, separating the release device into two parts and completing the release. Optimal solutions are designed to achieve high driving efficiency. This paper evaluates the strength and verifies the feasibility of the SMPC release device, measures the tensile strength and the reverse effect, and compares theoretical and experimental results. Finite element analysis is used to simulate the deformation.

  11. Neuromuscular control of the point to point and oscillatory movements of a sagittal arm with the actor-critic reinforcement learning method.

    PubMed

    Golkhou, Vahid; Parnianpour, Mohamad; Lucas, Caro

    2005-04-01

    In this study, we have used a single link system with a pair of muscles that are excited with alpha and gamma signals to achieve both point to point and oscillatory movements with variable amplitude and frequency. The system is highly nonlinear in all its physical and physiological attributes. The major physiological characteristics of this system are the simultaneous activation of a pair of nonlinear muscle-like actuators for control purposes, the existence of nonlinear spindle-like sensors and a Golgi tendon organ-like sensor, and the actions of gravity and external loading. Transmission delays are included in the afferent and efferent neural paths to account for a more accurate representation of the reflex loops. A reinforcement learning method with an actor-critic (AC) architecture, instead of the middle and low levels of the central nervous system (CNS), is used to track a desired trajectory. The actor in this structure is a two layer feedforward neural network and the critic is a model of the cerebellum. The critic is trained by the state-action-reward-state-action (SARSA) method. The critic then trains the actor by supervised learning based on prior experiences. Simulation studies of oscillatory movements based on the proposed algorithm demonstrate excellent tracking capability; after 280 epochs the RMS errors for the position and velocity profiles were 0.02 rad and 0.04 rad/s, respectively. PMID:16154874
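
    The architecture above combines an actor network with a SARSA-trained critic. The skeleton below is a generic tabular stand-in for that arrangement (the discretized state/action space, learning rates and toy tracking reward are assumptions for illustration; the paper's actor is a neural network and its critic a cerebellum model).

        import numpy as np

        rng = np.random.default_rng(0)
        n_states, n_actions = 20, 3          # discretized position, {-1, 0, +1} command
        alpha_c, alpha_a, gamma = 0.1, 0.05, 0.95

        Q = np.zeros((n_states, n_actions))   # critic: state-action values (SARSA-trained)
        H = np.zeros((n_states, n_actions))   # actor: action preferences

        def policy(s):
            p = np.exp(H[s] - H[s].max()); p /= p.sum()      # softmax over preferences
            return rng.choice(n_actions, p=p)

        target = 12                                           # desired (discretized) position
        for episode in range(500):
            s = rng.integers(n_states)
            a = policy(s)
            for _ in range(50):
                s2 = int(np.clip(s + (a - 1), 0, n_states - 1))   # move -1/0/+1
                r = -abs(s2 - target) / n_states                  # reward: negative tracking error
                a2 = policy(s2)
                td = r + gamma * Q[s2, a2] - Q[s, a]              # SARSA temporal-difference error
                Q[s, a] += alpha_c * td                           # critic update
                H[s, a] += alpha_a * td                           # actor reinforced by the critic's TD error
                s, a = s2, a2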

  12. Empirical Evidence of Priming, Transfer, Reinforcement, and Learning in the Real and Virtual Trillium Trails

    ERIC Educational Resources Information Center

    Harrington, M. C. R.

    2011-01-01

    Over the past 20 years, there has been a debate on the effectiveness of virtual reality used for learning with young children, producing many ideas but little empirical proof. This empirical study compared learning activity in situ of a real environment (Real) and a desktop virtual reality (Virtual) environment, built with video game technology,…

  13. Reinforcing Comprehensive Business Learning through an Undergraduate Retailing Course: A Prospectus

    ERIC Educational Resources Information Center

    Ahmed, Irfan

    2009-01-01

    Undergraduate programs in business are expected to provide a comprehensive learning for their students in order to prepare them to be able to deal with complex business problems in their jobs. Business schools attempt to provide this learning through various curricular design strategies. This paper proposes the use of an undergraduate course in…

  14. Learning Outcomes of Project-Based and Inquiry-Based Learning Activities

    ERIC Educational Resources Information Center

    Panasan, Mookdaporn; Nuangchalerm, Prasart

    2010-01-01

    Problem statement: Organization of science learning activities is necessary to rely on various methods of organization of learning and to be appropriate to learners. Organization of project-based learning activities and inquiry-based learning activities are teaching methods which can help students understand scientific knowledge. It would be more…

  15. Conditioned reinforcement and information theory reconsidered.

    PubMed

    Shahan, Timothy A; Cunningham, Paul

    2015-03-01

    The idea that stimuli might function as conditioned reinforcers because of the information they convey about primary reinforcers has a long history in the study of learning. However, formal application of information theory to conditioned reinforcement has been largely abandoned in modern theorizing because of its failures with respect to observing behavior. In this paper we show how recent advances in the application of information theory to Pavlovian conditioning offer a novel approach to conditioned reinforcement. The critical feature of this approach is that calculations of information are based on reductions of uncertainty about expected time to primary reinforcement signaled by a conditioned reinforcer. Using this approach, we show that previous failures of information theory with observing behavior can be remedied, and that the resulting framework produces predictions similar to Delay Reduction Theory in both observing-response and concurrent-chains procedures. We suggest that the similarity of these predictions might offer an analytically grounded reason for why Delay Reduction Theory has been a successful theory of conditioned reinforcement. Finally, we suggest that the approach provides a formal basis for the assertion that conditioned reinforcement results from Pavlovian conditioning and may provide an integrative approach encompassing both domains. PMID:25766452
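
    A common way to make "reduction of uncertainty about expected time to primary reinforcement" concrete (following the Pavlovian information-theoretic account this work builds on; the paper's exact expressions may differ) is to compare the expected wait without and with the signal:

        $$ I \;\approx\; \log_2 \frac{C}{T} \quad \text{bits}, $$

    where C is the average interval between primary reinforcers (the cycle time) and T is the expected delay to primary reinforcement once the conditioned reinforcer appears. Delay Reduction Theory's closely related quantity is the proportional reduction in delay, (C - T)/C, which is one reason the two frameworks make similar predictions in observing-response and concurrent-chains procedures.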

  16. Web-Based Learning Design Tool

    ERIC Educational Resources Information Center

    Bruno, F. B.; Silva, T. L. K.; Silva, R. P.; Teixeira, F. G.

    2012-01-01

    Purpose: The purpose of this paper is to propose a web-based tool that enables the development and provision of learning designs and its reuse and re-contextualization as generative learning objects, aimed at developing educational materials. Design/methodology/approach: The use of learning objects can facilitate the process of production and…

  17. Adaptive Learning for ESL Based on Computation

    ERIC Educational Resources Information Center

    Wang, Ya-huei; Liao, Hung-Chang

    2011-01-01

    In the conventional English as a Second Language (ESL) class-based learning environment, teachers use a fixed learning sequence and content for all students without considering the diverse needs of each individual. There is a great deal of diversity within and between classes. Hence, if students' learning outcomes are to be maximised, it is…

  18. Accountability for Project-Based Collaborative Learning

    ERIC Educational Resources Information Center

    Jamal, Abu-Hussain; Essawi, Mohammad; Tilchin, Oleg

    2014-01-01

    One perspective model for the creation of the learning environment and engendering students' thinking development is the Project-Based Collaborative Learning (PBCL) model. This model organizes learning by collaborative performance of various projects. In this paper we describe an approach to enhancing the PBCL model through the creation of…

  19. Proposing Community-Based Learning in the Marketing Curriculum

    ERIC Educational Resources Information Center

    Cadwallader, Susan; Atwong, Catherine; Lebard, Aubrey

    2013-01-01

    Community service and service learning (CS&SL) exposes students to the business practice of giving back to society while reinforcing classroom learning in an applied real-world setting. However, does the CS&SL format provide a better means of instilling the benefits of community service among marketing students than community-based…

  20. A Lamb waves based statistical approach to structural health monitoring of carbon fibre reinforced polymer composites.

    PubMed

    Carboni, Michele; Gianneo, Andrea; Giglio, Marco

    2015-07-01

    This research investigates a Lamb-wave based structural health monitoring approach using an out-of-phase actuation of a pair of piezoceramic transducers at low frequency. The target is a typical quasi-isotropic carbon fibre reinforced polymer aeronautical laminate subjected to artificial delaminations (via Teflon patches) and natural ones (via suitable low-velocity drop weight impact tests). The performance and main influencing factors of such an approach are studied through a Design of Experiments statistical method, considering both Pulse Echo and Pitch Catch configurations of the PZT sensors. Results show that some factors and their interactions can effectively influence the detection of a delamination-like damage. PMID:25746761

  1. Optical-Based Sensors for Monitoring Corrosion of Reinforcement Rebar via an Etched Cladding Bragg Grating

    PubMed Central

    Hassan, Muhammad Rosdi Abu; Bakar, Muhammad Hafiz Abu; Dambul, Katrina; Adikan, Faisal Rafiq Mahamd

    2012-01-01

    In this paper, we present the development and testing of an optical-based sensor for monitoring the corrosion of reinforcement rebar. The testing was carried out using an 80% etched-cladding fibre Bragg grating (FBG) sensor to monitor the production of corrosion waste in a localized region of the rebar. Progression of corrosion can be sensed by observing the shift of the reflected wavelength of the FBG sensor. With the presence of corrosion, the etched-FBG reflected spectrum was shifted by 1.0 nm. In addition, an increase in the fringe pattern and a continuous, step-like drop in the power of the Bragg reflected spectrum were also observed. PMID:23202233
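
    For context, these are the standard FBG relations (not specific to this paper): the reflected Bragg wavelength is set by the effective refractive index and the grating period, so any strain or refractive-index change around the etched grating shifts the reflected peak:

        $$ \lambda_B = 2\, n_{\mathrm{eff}} \, \Lambda, \qquad \frac{\Delta \lambda_B}{\lambda_B} = \frac{\Delta n_{\mathrm{eff}}}{n_{\mathrm{eff}}} + \frac{\Delta \Lambda}{\Lambda}. $$

    Because the cladding is largely etched away, corrosion products near the grating can alter the refractive index seen by the evanescent field, which is presumably what the reported 1.0 nm shift reflects.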

  2. Work-Based Learning for All! Work-Based Learning Development Handbook.

    ERIC Educational Resources Information Center

    Nachtrieb, Paula; Vore, Stacey

    This handbook provides information, strategies, and techniques for developing and implementing successful work-based learning experiences. It is divided into five sections, each with 1-5 chapters that cover specific topics in detail. Section I, "What is Work-Based Learning?," considers what work-based learning is; presents a work-based learning…

  3. Utah Work-Based Learning Manual.

    ERIC Educational Resources Information Center

    Utah State Office of Education, Salt Lake City.

    This document presents materials to assist Utah school personnel who are initiating, implementing, or improving work-based learning opportunities for students. The document presents detailed guidelines for creating and maintaining work-based learning systems in schools and resource materials for improving existing work-based opportunities. Formal…

  4. Adaptive learning based heartbeat classification.

    PubMed

    Srinivas, M; Basil, Tony; Mohan, C Krishna

    2015-01-01

    Cardiovascular diseases (CVD) are a leading cause of unnecessary hospital admissions as well as fatalities, placing an immense burden on the healthcare industry. A process that provides timely intervention can reduce the morbidity rate as well as control rising costs. Patients with cardiovascular diseases require quick intervention. Towards that end, automated detection of abnormal heartbeats captured by electrocardiogram (ECG) signals is vital. While cardiologists can identify different heartbeat morphologies quite accurately among different patients, manual evaluation is tedious and time consuming. In this chapter, we propose new features from the time and frequency domains and, furthermore, feature normalization techniques to reduce inter-patient and intra-patient variations in heartbeat cycles. Our results using the adaptive learning based classifier emulate those reported in the existing literature and in most cases deliver improved performance, while eliminating the need for labeling of signals by domain experts. PMID:26484555
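
    The abstract does not specify the normalization scheme; the sketch below only illustrates the general idea of per-patient z-scoring of time- and frequency-domain heartbeat features to suppress inter-patient variation. The feature choices are assumptions, not the authors' features.

        import numpy as np

        def heartbeat_features(beat, rr_prev, rr_next, fs=360.0):
            """Toy time/frequency features for one beat window (illustrative only)."""
            spectrum = np.abs(np.fft.rfft(beat * np.hanning(len(beat))))
            freqs = np.fft.rfftfreq(len(beat), d=1.0 / fs)
            band = lambda lo, hi: spectrum[(freqs >= lo) & (freqs < hi)].sum()
            return np.array([rr_prev, rr_next, np.ptp(beat), band(0.5, 10), band(10, 40)])

        def normalize_per_patient(feature_matrix):
            """Z-score each feature within one patient's record to reduce inter-patient variation."""
            mu = feature_matrix.mean(axis=0)
            sd = feature_matrix.std(axis=0) + 1e-8
            return (feature_matrix - mu) / sd

        # toy usage: random "beats" standing in for one patient's record
        rng = np.random.default_rng(0)
        beats = [rng.standard_normal(180) for _ in range(20)]
        X = np.vstack([heartbeat_features(bt, 0.80, 0.82) for bt in beats])
        X_norm = normalize_per_patient(X)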

  5. Reinforcing effects of different fibers on denture base resin based on the fiber type, concentration, and combination.

    PubMed

    Yu, Sang-Hui; Lee, Yoon; Oh, Seunghan; Cho, Hye-Won; Oda, Yutaka; Bae, Ji-Myung

    2012-01-01

    The aim of this study was to evaluate the reinforcing effects of three types of fibers at various concentrations and in different combinations on the flexural properties of denture base resin. Glass (GL), polyaromatic polyamide (PA) and ultra-high molecular weight polyethylene (PE) fibers were added to heat-polymerized denture base resin at volume concentrations of 2.6%, 5.3%, and 7.9%. In addition, hybrid fiber-reinforced composites (FRC) combining either two or three types of fibers were fabricated. The flexural strength, modulus and toughness of each group were measured with a universal testing machine at a crosshead speed of 5 mm/min. In the single fiber-reinforced composite groups, the 5.3% GL and 7.9% GL had the highest flexural strength and modulus; 5.3% PE had the highest toughness. Hybrid FRCs such as GL/PE, which showed the highest toughness and flexural strength, were considered useful for preventing denture fractures clinically. PMID:23207213

  6. Aberrant Salience Is Related to Reduced Reinforcement Learning Signals and Elevated Dopamine Synthesis Capacity in Healthy Adults.

    PubMed

    Boehme, Rebecca; Deserno, Lorenz; Gleich, Tobias; Katthagen, Teresa; Pankow, Anne; Behr, Joachim; Buchert, Ralph; Roiser, Jonathan P; Heinz, Andreas; Schlagenhauf, Florian

    2015-07-15

    The striatum is known to play a key role in reinforcement learning, specifically in the encoding of teaching signals such as reward prediction errors (RPEs). It has been proposed that aberrant salience attribution is associated with impaired coding of RPE and heightened dopamine turnover in the striatum, and might be linked to the development of psychotic symptoms. However, the relationship of aberrant salience attribution, RPE coding, and dopamine synthesis capacity has not been directly investigated. Here we assessed the association between a behavioral measure of aberrant salience attribution, the salience attribution test, to neural correlates of RPEs measured via functional magnetic resonance imaging while healthy participants (n = 58) performed an instrumental learning task. A subset of participants (n = 27) also underwent positron emission tomography with the radiotracer [(18)F]fluoro-l-DOPA to quantify striatal presynaptic dopamine synthesis capacity. Individual variability in aberrant salience measures related negatively to ventral striatal and prefrontal RPE signals and in an exploratory analysis was found to be positively associated with ventral striatal presynaptic dopamine levels. These data provide the first evidence for a specific link between the constructs of aberrant salience attribution, reduced RPE processing, and potentially increased presynaptic dopamine function. PMID:26180188
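
    The reward prediction error (RPE) referred to here is the standard temporal-difference teaching signal; in the usual generic notation (not this study's specific model),

        $$ \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), $$

    where r_t is the obtained reward, V the learned value estimate, and gamma the discount factor; striatal and prefrontal BOLD responses are typically regressed against this trial-by-trial delta when localizing RPE signals.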

  7. H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning.

    PubMed

    Modares, Hamidreza; Lewis, Frank L; Jiang, Zhong-Ping

    2015-10-01

    This paper deals with the design of an H∞ tracking controller for nonlinear continuous-time systems with completely unknown dynamics. A general bounded L2-gain tracking problem with a discounted performance function is introduced for the H∞ tracking. A tracking Hamilton-Jacobi-Isaacs (HJI) equation is then developed that gives a Nash equilibrium solution to the associated min-max optimization problem. A rigorous analysis of the bounded L2-gain and stability of the control solution obtained by solving the tracking HJI equation is provided. An upper bound is found for the discount factor to assure local asymptotic stability of the tracking error dynamics. An off-policy reinforcement learning algorithm is used to learn the solution to the tracking HJI equation online without requiring any knowledge of the system dynamics. Convergence of the proposed algorithm to the solution of the tracking HJI equation is shown. Simulation examples are provided to verify the effectiveness of the proposed method. PMID:26111401
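
    Although the abstract does not reproduce the formulation, the discounted bounded-L2-gain tracking performance it refers to is typically of the form below (a generic sketch; the symbols are assumptions rather than the paper's exact notation):

        $$ J(u, d) = \int_t^{\infty} e^{-\alpha(\tau - t)} \left( e^{\top} Q e + u^{\top} R u - \gamma^{2} d^{\top} d \right) \, d\tau, $$

    with e the tracking error, u the control, d the disturbance, alpha > 0 the discount factor, and gamma the prescribed attenuation level; the tracking HJI equation characterizes the min-max (Nash) solution of this zero-sum game.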

  8. Using a Search Engine-Based Mutually Reinforcing Approach to Assess the Semantic Relatedness of Biomedical Terms

    PubMed Central

    Hsu, Yi-Yu; Chen, Hung-Yu; Kao, Hung-Yu

    2013-01-01

    Background Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms. Methodology/Principal Findings In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms. Conclusions/Significance The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods. PMID:24348899
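
    The "mutually reinforcing" step resembles HITS-style iteration between lexical patterns and their containers: a pattern is ranked highly if it occurs in highly ranked containers, and vice versa. The sketch below only conveys that generic iteration; the actual ReLPR scoring details are not reproduced here.

        import numpy as np

        def mutual_reinforcement_ranking(M, iters=50):
            """M[i, j] = 1 if lexical pattern i occurs in container (sentence structure) j.
            Returns pattern scores and container scores, HITS-style."""
            p = np.ones(M.shape[0])
            c = np.ones(M.shape[1])
            for _ in range(iters):
                p = M @ c                      # a pattern gains weight from its containers
                c = M.T @ p                    # a container gains weight from its patterns
                p /= np.linalg.norm(p)         # normalize to keep the scores bounded
                c /= np.linalg.norm(c)
            return p, c

        # toy usage: 3 patterns observed in 4 containers
        M = np.array([[1, 1, 0, 0],
                      [0, 1, 1, 1],
                      [1, 0, 0, 1]], dtype=float)
        pattern_scores, container_scores = mutual_reinforcement_ranking(M)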

  9. Stability of Portland cement-based binders reinforced with natural wollastonite micro-fibers

    SciTech Connect

    Low, N.M.P. (Dept. of Civil Engineering); Beaudoin, J.J. (Inst. for Research In Construction)

    1994-01-01

    The stability of Portland cement-based binders reinforced with natural wollastonite micro-fibers was investigated for hydration periods up to one year. The wollastonite micro-fibers imbedded in the hydrated cement paste were examined employing a scanning electron microscopy technique. Composite specimens were also periodically evaluated by flexural strength testing and microstructural characterization including mercury intrusion porosimetry, helium gas pycnometry, and isopropyl alcohol saturation measurement. The amount of Ca(OH)2 in the hydrated matrices was also determined by differential scanning calorimetry. Wollastonite micro-fibers imbedded in hydrated cement-silica fume matrices remained stable after prolonged hydration and exhibited no surface or bulk deterioration. The flexural strength and overall pore structure of the Portland cement-based binders reinforced with wollastonite micro-fibers also remained essentially unchanged and unaffected. Flexural toughness and the post peak deflection, however, were observed to decrease with hydration time. The amount of Ca(OH)2 in the hydrated matrices decreased slightly at advanced hydration times. The observed behavior is discussed.

  10. Circular Functions Based Comprehensive Analysis of Plastic Creep Deformations in the Fiber Reinforced Composites

    NASA Astrophysics Data System (ADS)

    Monfared, Vahid

    2016-06-01

    An analytically based model is presented for the behavioral analysis of plastic deformations in reinforced materials using circular (trigonometric) functions. The analytical method is proposed to predict the creep behavior of fibrous composites, based on basic and constitutive equations, under a tensile axial stress. The new insight of the work is the prediction of some important behaviors of the creeping matrix. In the present model, predicting these behaviors is simpler than with the available methods. The principal creep strain rate behavior is very important for designing fibrous composites that creep. Analysis of this parameter in reinforced materials is necessary for failure, fracture, and fatigue studies in the creep of short fiber composites. Shuttles, spaceships, turbine blades and discs, and nozzle guide vanes are commonly subjected to creep effects. Predicting the creep behavior is also significant for designing optoelectronic and photonic advanced composites with optical fibers. As a result, a uniform behavior with constant gradient is seen in the principal creep strain rate, and creep rupture may happen at the fiber end. Finally, good agreement is found when comparing the obtained analytical and FEM results.

  11. BisGMA-polyvinylpyrrolidone blend based nanocomposites reinforced with chitosan grafted f-multiwalled carbon nanotubes

    NASA Astrophysics Data System (ADS)

    Praharaj, A.; Behera, D.; Rath, P.; Bastia, T. K.; Rout, A. K.

    In this work, initially a non-destroyable surface grafting of acid functionalized multiwalled carbon nanotubes (f-MWCNTs) with the biopolymer chitosan (CS) was carried out using glutaraldehyde as a cross-linking agent via the controlled covalent deposition method, and characterized by Fourier transform infrared spectroscopy (FTIR) and scanning electron microscopy (SEM). Then, a BisGMA (bisphenol-A glycidyldimethacrylate)-polyvinylpyrrolidone (PVP) blend was prepared (50:50 wt%) by a simple sonication method. The CS grafted f-MWCNTs (CS/f-MWCNTs) were finally dispersed in the BisGMA-PVP blend (BGP50) system in different compositions, i.e. 0, 2, 5 and 7 wt%, and pressed into molds for the fabrication of reinforced nanocomposites, which were characterized by SEM. Nanocomposites reinforced with 2 wt% raw MWCNTs and acid f-MWCNTs were also fabricated and their properties were studied in detail. The results of the comparative study show lower values of the investigated properties in the nanocomposites with 2 wt% raw and f-MWCNTs than in the one with 2 wt% CS/f-MWCNTs, proving the latter to be a better reinforcing nanofiller. Further, the mechanical behavior of the nanocomposites with various CS/f-MWCNTs contents showed a dramatic increase in Young's modulus, tensile strength, impact strength and hardness, along with improved dynamic mechanical, thermal and electrical properties, at 5 wt% content of CS/f-MWCNTs. The addition of CS/f-MWCNTs also resulted in reduced corrosion and swelling. Thus, the fabricated nanocomposites with an optimum nanofiller content could serve as low cost and light weight structural, thermal and electrical materials compatible with various corrosive and solvent based environments.

  12. Reinforcement of acrylic denture base resin by incorporation of various fibers.

    PubMed

    Chen, S Y; Liang, W M; Yen, P S

    2001-01-01

    This study was designed to evaluate improvements in the mechanical properties of acrylic resin following reinforcement with three types of fiber. Polyester fiber (PE), Kevlar fiber (KF), and glass fiber (GF) were cut into 2, 4, and 6 mm lengths and incorporated at concentrations of 1, 2, and 3% (w/w). The mixtures of resin and fiber were cured at 70 degrees C in a water bath for 13 h, then at 90 degrees C for 1 h, in 70 x 25 x 15 mm stone molds, which were enclosed by dental flasks. The cured resin blocks were cut to an appropriate size and tested for impact strength and bending strength following the methods of ASTM Specification No. 256 and ISO Specification No. 1567, respectively. Specimens used in the impact strength test were reused for the Knoop hardness test. The results showed that the impact strength tended to be enhanced with fiber length and concentration, particularly PE at 3% and 6 mm length, which was significantly stronger than other formulations. Bending strength did not change significantly with the various formulations when compared to a control without fiber. The assessment of Knoop hardness revealed a complex pattern for the various formulations. The Knoop hardness of 3%, 6 mm PE-reinforced resin was comparable to that of the other formulations except for the control without fiber, but for clinical usage this did not adversely affect the merit of acrylic denture base resin. It is concluded that, for improved strength the optimum formulation to reinforce acrylic resin is by incorporation of 3%, 6 mm length PE fibers. PMID:11241340

  13. Corrosion in Reinforced Concrete Panels: Wireless Monitoring and Wavelet-Based Analysis

    PubMed Central

    Qiao, Guofu; Sun, Guodong; Hong, Yi; Liu, Tiejun; Guan, Xinchun

    2014-01-01

    To realize efficient data capture and accurate analysis of pitting corrosion in reinforced concrete (RC) structures, we first design and implement a wireless sensor network (WSN) to monitor the pitting corrosion of RC panels, and then we propose a wavelet-based algorithm to analyze the corrosion state from the corrosion data collected by the wireless platform. We design a novel pitting corrosion-detecting mote and a communication protocol such that the monitoring platform can sample the electrochemical emission signals of the corrosion process with a configured period and send these signals to a central computer for analysis. The proposed algorithm, based on wavelet domain analysis, returns the energy distribution of the electrochemical emission data, from which close observation and understanding can be further achieved. We also conducted test-bed experiments based on RC panels. The results verify the feasibility and efficiency of the proposed WSN system and algorithms. PMID:24556673
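
    A minimal sketch of the wavelet-domain energy distribution step described above, using a generic discrete wavelet decomposition of the electrochemical noise record (the wavelet family, decomposition level and sampling details are assumptions, not the paper's choices):

        import numpy as np
        import pywt

        def wavelet_energy_distribution(signal, wavelet="db4", level=6):
            """Decompose the electrochemical emission signal and return the relative
            energy carried by each decomposition level (approximation + details)."""
            coeffs = pywt.wavedec(signal, wavelet, level=level)
            energies = np.array([np.sum(c ** 2) for c in coeffs])
            return energies / energies.sum()

        # toy usage on a synthetic noise record
        rng = np.random.default_rng(0)
        record = rng.standard_normal(4096)
        print(wavelet_energy_distribution(record))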

  14. Supporting Instance-Based Learning in Discovery Learning Environments.

    ERIC Educational Resources Information Center

    Reimann, Peter

    Recent research has demonstrated that knowledge about specific instances may be of more relevance to reasoning than has previously been assumed. Students can rely on principles they have learned, or they can recall something similar previously experienced, and base the new prediction on it in an instance-based approach. Instance-based (or…

  15. Carbon nanotube reinforced aluminum based nanocomposite fabricated by thermal spray forming

    NASA Astrophysics Data System (ADS)

    Laha, Tapas

    The present research concentrates on the fabrication of bulk aluminum matrix nanocomposite structures with carbon nanotube reinforcement. The objective of the work was to fabricate and characterize multi-walled carbon nanotube (MWCNT) reinforced hypereutectic Al-Si (23 wt% Si, 2 wt% Ni, 1 wt% Cu, rest Al) nanocomposite bulk structures with a nanocrystalline matrix through thermal spray forming techniques, viz. plasma spray forming (PSF) and high velocity oxy-fuel (HVOF) spray forming. This is the first research study to show that thermal spray forming can be successfully used to synthesize carbon nanotube reinforced nanocomposites. Microstructural characterization based on quantitative microscopy, scanning and transmission electron microscopy (SEM and TEM), energy dispersive spectroscopy (EDS), X-ray diffraction (XRD), Raman spectroscopy and X-ray photoelectron spectroscopy (XPS) confirms (i) retention and macro/sub-macro level homogeneous distribution of multiwalled carbon nanotubes in the Al-Si matrix and (ii) evolution of nanostructured grains in the matrix. The formation of an ultrathin beta-SiC layer on the MWCNT surface, due to the chemical reaction of Si atoms diffusing from the Al-Si alloy and C atoms from the outer walls of the MWCNTs, has been confirmed theoretically and experimentally. The presence of the SiC layer at the interface improves the wettability and the interfacial adhesion between the MWCNT reinforcement and the Al-Si matrix. Sintering of the as-sprayed nanocomposites was carried out in an inert environment for further densification. The as-sprayed PSF nanocomposite showed lower microhardness than the HVOF one, due to the higher porosity content and lower residual stress. The hardness of the nanocomposites increased with sintering time due to effective pore removal. A uniaxial tensile test on the CNT-bulk nanocomposite was carried out, which is the first ever study of such nature. The tensile test results showed inconsistency in the data attributed to inhomogeneous

  16. Effects of Social Reinforcement, Locus of Control, and Cognitive Style on Concept Learning among Retarded Children.

    ERIC Educational Resources Information Center

    Panda, Kailas C.

    To examine the effects of locus of control (the extent to which an individual feels he has control over his own behavior) and cognitive style variables on learning deficits among mentally handicapped children, 80 mentally retarded boys (IQ 50 to 83, age 160 to 196 months) were administered a battery of tests. Analyses of student performance…

  17. Anatomy of a Decision: Striato-Orbitofrontal Interactions in Reinforcement Learning, Decision Making, and Reversal

    ERIC Educational Resources Information Center

    Frank, Michael J.; Claus, Eric D.

    2006-01-01

    The authors explore the division of labor between the basal ganglia-dopamine (BG-DA) system and the orbitofrontal cortex (OFC) in decision making. They show that a primitive neural network model of the BG-DA system slowly learns to make decisions on the basis of the relative probability of rewards but is not as sensitive to (a) recency or (b) the…

  18. Reinforcing Math Knowledge by Immersing Students in a Simulated Learning-By-Teaching Experience

    ERIC Educational Resources Information Center

    Lenat, Douglas B.; Durlach, Paula J.

    2014-01-01

    We often understand something only after we've had to teach or explain it to someone else. Learning-by-teaching (LBT) systems exploit this phenomenon by playing the role of "tutee." BELLA, our sixth-grade mathematics LBT system, departs from other LBT systems in several ways: (1) It was built not from scratch but by very slightly…

  19. The Effects of Interspersal and Reinforcement on Math Fact Accuracy and Learning Rate

    ERIC Educational Resources Information Center

    Rumberger, Jessica L.

    2013-01-01

    Mathematics skill acquisition is a crucial component of education and ongoing research is needed to determine quality instructional techniques. A ubiquitous instructional question is how to manage time. This study investigated several flashcard presentation methods to determine the one that would provide the most learning in a set amount of time.…

  20. Engaging medical undergraduates in question making: a novel way to reinforcing learning in physiology.

    PubMed

    Mehta, Bharati; Bhandari, Bharti

    2016-09-01

    The monotony of conventional didactic lectures makes students less attentive toward learning, and they tend to memorize isolated facts without understanding, just for the sake of passing exams. Therefore, to promote a habit of gaining in-depth knowledge of the basic sciences in medical undergraduates, along with honing their communication and analytical skills, we introduced this more interactive way of learning. The present study was performed on 99 first-semester medical students. After conventional didactic lectures, students were asked to prepare small conceptual questions on the topic. They were divided into two teams, which were made to ask questions of each other. If a team failed to answer, the student who posed the question was supposed to answer it to the satisfaction of the other team's students. Data were then obtained by getting feedback from the students on a 10-item questionnaire, and statistical evaluation was done using MS Excel and SPSS. To draft questions, students went through the whole system comprehensively and made questions from every possible aspect of the topic. Some of the questions (30%) were of the recall type, but most assessed higher cognitive domains. Student feedback revealed that they were satisfied, motivated to read more, and confident of applying this learning and these communication skills in future clinical practice. Students also expressed their desire to implement this activity as a regular feature of the curriculum. The activity resulted in an increase in student perceptions of their knowledge of the topic as well as their communicative and analytical skills. This may eventually lead to better learning. PMID:27503900

  1. Using Social Media to Reinforce Environmental Learning and Action-Taking for School Students

    ERIC Educational Resources Information Center

    Warner, Alan; Eames, Chris; Irving, Robyn

    2014-01-01

    Environmental experiences often engage learners and create an intention to act, which is then not followed through once the learner is removed from the environment. This study utilized an exploratory, interpretive framework with younger primary school classes to investigate if transfer of learning from field trip experiences "in" and…

  2. Roles of Approval Motivation and Generalized Expectancy for Reinforcement in Children's Conceptual Discrimination Learning

    ERIC Educational Resources Information Center

    Nyce, Peggy A.; And Others

    1977-01-01

    Forty-four third graders were given a two-choice conceptual discrimination learning task. The two major factors were (1) four treatment groups varying at the extremes on two personality measures, approval motivation and locus of control and (2) sex. (MS)

  3. Adding Interactivity to Web Based Distance Learning.

    ERIC Educational Resources Information Center

    Cafolla, Ralph; Knee, Richard

    Web Based Distance Learning (WBDL) is a form of distance learning based on providing instruction mainly on the World Wide Web. This paradigm has limitations, especially the lack of interactivity inherent in the Web. The purpose of this paper is to discuss some of the technologies the authors have used in their courses at Florida Atlantic…

  4. Adventure-Based Learning across Domains.

    ERIC Educational Resources Information Center

    Garside, Colleen

    With "adventure-based" learning, instructors present activities in a way that allows the group to develop its own abilities, with guidance from the instructor when appropriate. Adventure-based learning activities (which emphasize the importance of play) lend themselves to inclusion in the basic speech communication course, particularly when…

  5. Version Control in Project-Based Learning

    ERIC Educational Resources Information Center

    Milentijevic, Ivan; Ciric, Vladimir; Vojinovic, Oliver

    2008-01-01

    This paper deals with the development of a generalized model for version control systems application as a support in a range of project-based learning methods. The model is given as UML sequence diagram and described in detail. The proposed model encompasses a wide range of different project-based learning approaches by assigning a supervisory…

  6. Project-Based Learning for Sustainable Development

    ERIC Educational Resources Information Center

    Nation, Marcia L.

    2008-01-01

    Project-based learning is a pedagogy that involves students in applying and developing theories, skills, and techniques to solve real world problems. Three faculty members used project-based learning to involve graduate students in an interdisciplinary seminar on sustainable development in Appalachian Ohio, which was convened under the auspices of…

  7. Composite edible films based on hydroxypropyl methylcellulose reinforced with microcrystalline cellulose nanoparticles.

    PubMed

    Bilbao-Sáinz, Cristina; Avena-Bustillos, Roberto J; Wood, Delilah F; Williams, Tina G; McHugh, Tara H

    2010-03-24

    It has been stated that hydroxypropyl methyl cellulose (HPMC) based films have promising applications in the food industry because of their environmental appeal, low cost, flexibility and transparency. Nevertheless, their mechanical and moisture barrier properties should be improved. The aim of this work was to enhance these properties by reinforcing the films with microcrystalline cellulose (MCC) at the nano scale level. Three sizes of MCC nanoparticles were incorporated into HPMC edible films at different concentrations. Identical MCC nanoparticles were lipid coated (LC) prior to casting into HPMC/LC-MCC composite films. The films were examined for mechanical and moisture barrier properties verifying how the addition of cellulose nanoparticles affected the water affinities (water adsorption/desorption isotherms) and the diffusion coefficients. The expected reinforcing effect of the MCC was observed: HPMC/MCC and HPMC/LC-MCC films showed up to 53% and 48% increase, respectively, in tensile strength values in comparison with unfilled HPMC films. Furthermore, addition of unmodified MCC nanoparticles reduced the moisture permeability up to 40% and use of LC-MCC reduced this value up to 50%. Water vapor permeability was mainly influenced by the differences in water solubility of different composite films since, in spite of the increase in water diffusivity values with the incorporation of MCC to HPMC films, better moisture barrier properties were achieved for HPMC/MCC and HPMC/LC-MCC composite films than for HPMC films. PMID:20187652

  8. Tannin-based flax fibre reinforced composites for structural applications in vehicles

    NASA Astrophysics Data System (ADS)

    Zhu, J.; Abhyankar, H.; Nassiopoulos, E.; Njuguna, J.

    2012-09-01

    Innovation is often driven by changes in the government policies regulating an industry, which is especially true of the automotive sector. Besides weight savings, the strict EU requirement that vehicles be made of 95% recyclable materials drives manufacturers and scientists to seek new 'green materials' for structural applications. To address two major drawbacks (production cost and safety), the EU-supported ECHOSHELL project develops and optimises structural solutions for superlight electric vehicles using bio-composites made of high-performance natural fibres and resins, providing enhanced strength and bio-degradability characteristics. Flax reinforced tannin-based composite is selected as one of the candidates and was first investigated with different fabric lay-up angles (non-woven flax mat, UD, [0, 90°]4 and [0, +45°, 90°, -45°]2) in the authors' work. Some of the obtained results, such as tensile properties and SEM micrographs, are shown in this conference paper. The UD flax reinforced composite exhibits the best tensile performance, with a tensile strength and modulus of 150 MPa and 9.6 MPa, respectively. It was observed that during tension the oriented-fabric composites showed some delamination, which is expected to be eliminated through surface treatment (alkali treatment, etc.) and nanotechnology, such as the use of nano-fibrils. The failure mechanisms of the tested samples were identified through the SEM results, indicating that the combination of fibre pull-out, fibre breakage and brittle resin failure mainly contributes to the fracture failure of the composites.

  9. Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning.

    PubMed

    Viejo, Guillaume; Khamassi, Mehdi; Brovelli, Andrea; Girard, Benoît

    2015-01-01

    Current learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behaviors. However, the computations supporting the interactions between goal-directed and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the brain hosts complementary computations that may differentially support goal-directed and habitual processes in the form of a dynamical interplay rather than a serial recruitment of strategies. To better elucidate the computations underlying flexible behavior, we develop a dual-system computational model that can predict both performance (i.e., participants' choices) and modulations in reaction times during learning of a stimulus-response association task. The habitual system is modeled with a simple Q-Learning algorithm (QL). For the goal-directed system, we propose a new Bayesian Working Memory (BWM) model that searches for information in the history of previous trials in order to minimize Shannon entropy. We propose a model for QL and BWM coordination such that the expensive memory manipulation is under control of, among others, the level of convergence of the habitual learning. We test the ability of QL or BWM alone to explain human behavior, and compare them with the performance of model combinations, to highlight the need for such combinations to explain behavior. Two of the tested combination models are derived from the literature, with the last being our new proposal. In conclusion, all subjects were better explained by model combinations, and the majority of them are explained by our new coordination proposal. PMID:26379518
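
    The habitual component described above is plain Q-learning over stimulus-response associations; a minimal version for a single-step association task is sketched below (the task layout, learning rate and softmax temperature are illustrative assumptions, and the Bayesian Working Memory component is not reproduced here).

        import numpy as np

        rng = np.random.default_rng(1)
        n_stimuli, n_responses = 3, 4
        alpha, beta = 0.2, 3.0                 # learning rate, softmax inverse temperature
        Q = np.zeros((n_stimuli, n_responses))
        correct = {0: 2, 1: 0, 2: 3}           # hidden stimulus -> correct response mapping

        def choose(s):
            p = np.exp(beta * Q[s]); p /= p.sum()     # softmax action selection
            return rng.choice(n_responses, p=p)

        for trial in range(300):
            s = rng.integers(n_stimuli)
            a = choose(s)
            r = 1.0 if a == correct[s] else 0.0
            Q[s, a] += alpha * (r - Q[s, a])          # Q-learning update (no next state in this task)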

  11. CNTRICS Imaging Biomarkers Final Task Selection: Long-Term Memory and Reinforcement Learning

    PubMed Central

    Ragland, John D.; Cohen, Neal J.; Cools, Roshan; Frank, Michael J.; Hannula, Deborah E.; Ranganath, Charan

    2012-01-01

    Functional imaging paradigms hold great promise as biomarkers for schizophrenia research as they can detect altered neural activity associated with the cognitive and emotional processing deficits that are so disabling to this patient population. In an attempt to identify the most promising functional imaging biomarkers for research on long-term memory (LTM), the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS) initiative selected “item encoding and retrieval,” “relational encoding and retrieval,” and “reinforcement learning” as key LTM constructs to guide the nomination process. This manuscript reports on the outcome of the third CNTRICS biomarkers meeting in which nominated paradigms in each of these domains were discussed by a review panel to arrive at a consensus on which of the nominated paradigms could be recommended for immediate translational development. After briefly describing this decision process, information is presented from the nominating authors describing the 4 functional imaging paradigms that were selected for immediate development. In addition to describing the tasks, information is provided on cognitive and neural construct validity, sensitivity to behavioral or pharmacological manipulations, availability of animal models, psychometric characteristics, effects of schizophrenia, and avenues for future development. PMID:22102094

  12. Web-Based Learning Support System

    NASA Astrophysics Data System (ADS)

    Fan, Lisa

    Web-based learning support system offers many benefits over traditional learning environments and has become very popular. The Web is a powerful environment for distributing information and delivering knowledge to an increasingly wide and diverse audience. Typical Web-based learning environments, such as Web-CT, Blackboard, include course content delivery tools, quiz modules, grade reporting systems, assignment submission components, etc. They are powerful integrated learning management systems (LMS) that support a number of activities performed by teachers and students during the learning process [1]. However, students who study a course on the Internet tend to be more heterogeneously distributed than those found in a traditional classroom situation. In order to achieve optimal efficiency in a learning process, an individual learner needs his or her own personalized assistance. For a web-based open and dynamic learning environment, personalized support for learners becomes more important. This chapter demonstrates how to realize personalized learning support in dynamic and heterogeneous learning environments by utilizing Adaptive Web technologies. It focuses on course personalization in terms of contents and teaching materials that is according to each student's needs and capabilities. An example of using Rough Set to analyze student personal information to assist students with effective learning and predict student performance is presented.

  13. Damage evaluation of reinforced concrete frame based on a combined fiber beam model

    NASA Astrophysics Data System (ADS)

    Shang, Bing; Liu, ZhanLi; Zhuang, Zhuo

    2014-04-01

    In order to analyze and simulate the impact collapse or seismic response of reinforced concrete (RC) structures, a combined fiber beam model is proposed by dividing the cross section of the RC beam into concrete fibers and steel fibers. The stress-strain relationship of the concrete fibers is based on the model specified in design codes for concrete structures. The stress-strain behavior of the steel fibers is based on a model suggested by others. These constitutive models are implemented into the general finite element program ABAQUS through user defined subroutines to provide effective computational tools for the inelastic analysis of RC frame structures. The fiber model proposed in this paper is validated by comparison with experimental data for an RC column under cyclic lateral loading. The damage evolution of a three-dimensional frame subjected to impact loading is also investigated.
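
    The essence of a fiber beam model is to discretize the cross section into concrete and steel fibers, evaluate each fiber's uniaxial stress from its constitutive law, and integrate over the section to obtain axial force and moment. The schematic below uses simple placeholder constitutive laws, not the code-based and literature models adopted in the paper; dimensions and material constants are illustrative.

        import numpy as np

        def concrete_stress(eps, fc=30.0, eps0=0.002):
            """Parabolic stress-strain law in compression, no tension (placeholder model, MPa)."""
            if eps >= 0.0:
                return 0.0
            x = eps / -eps0
            return -fc * (2 * x - x * x) if x < 1.0 else -fc

        def steel_stress(eps, Es=200000.0, fy=400.0):
            """Elastic-perfectly-plastic steel (placeholder model, MPa)."""
            return float(np.clip(Es * eps, -fy, fy))

        def section_forces(eps_axial, curvature, fibers):
            """fibers: list of (y, area, material). Plane sections: eps = eps_axial + curvature * y.
            Returns axial force N and bending moment M of the cross section."""
            N = M = 0.0
            for y, A, mat in fibers:
                eps = eps_axial + curvature * y
                sigma = concrete_stress(eps) if mat == "concrete" else steel_stress(eps)
                N += sigma * A
                M += sigma * A * y
            return N, M

        # toy usage: a 300 x 500 mm section, two steel fibers plus layered concrete fibers
        fibers = [(-230.0, 980.0, "steel"), (230.0, 980.0, "steel")]
        fibers += [(y, 300.0 * 25.0, "concrete") for y in np.linspace(-237.5, 237.5, 20)]
        print(section_forces(eps_axial=-0.0005, curvature=2e-6, fibers=fibers))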

  14. A self-sensing fiber reinforced polymer composite using mechanophore-based smart polymer

    NASA Astrophysics Data System (ADS)

    Zou, Jin; Liu, Yingtao; Chattopadhyay, Aditi; Dai, Lenore

    2015-04-01

    Polymer matrix composites (PMCs) are ubiquitous in engineering applications due to their superior mechanical properties at low weight. However, they are susceptible to damage due to their low interlaminar mechanical properties and poor heat and charge transport in the transverse direction to the laminate. Moreover, methods to inspect and ensure the reliability of composites are expensive and labor intensive. Recently, mechanophore-based smart polymer has attracted significant attention, especially for self-sensing of matrix damage in PMCs. A cyclobutane-based self-sensing approach using 1,1,1-tris (cinnamoyloxymethyl) ethane (TCE) and poly (vinyl cinnamate) (PVCi) has been studied in this paper. The self-sensing function was investigated at both the polymer level and composite laminate level. Fluorescence emissions were observed on PMC specimens subjected to low cycle fatigue load, indicating the presence of matrix cracks. Results are presented for graphite fiber reinforced composites.

  15. Investigation on the tensile behavior of fiber metal laminates based on self-reinforced polypropylene

    NASA Astrophysics Data System (ADS)

    Lee, Byoung-Eon; Park, Tom; Kim, Jeong; Kang, Beom-Soo; Song, Woo-Jin

    2013-12-01

    Mechanical tests have been carried out to accurately evaluate the tensile properties of fiber metal laminates (FMLs). The FMLs in this paper comprise a layer of self-reinforced polypropylene (SRPP) sandwiched between two layers of aluminum alloy 5052-H34. In this study, the nonlinear tensile and fracture behavior of FMLs under in-plane loading conditions has been investigated with numerical simulations and theoretical analysis. The numerical simulation, based on finite element modeling using ABAQUS/Explicit, and the theoretical constitutive model, based on a volume fraction approach and a modified classical lamination theory that incorporates the elastic-plastic behavior of the aluminum alloy, are used to predict mechanical properties such as the stress-strain response and deformation behavior of the FMLs. In addition, by comparing the numerical simulations and the theoretical analysis with experimental results, it was concluded that the adopted numerical simulation model describes the overall tensile stress-strain curve with sufficient accuracy.

  16. Stochastic Reinforcement Benefits Skill Acquisition

    ERIC Educational Resources Information Center

    Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.

    2014-01-01

    Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…

  17. Predicting psychosis across diagnostic boundaries: Behavioral and computational modeling evidence for impaired reinforcement learning in schizophrenia and bipolar disorder with a history of psychosis.

    PubMed

    Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N

    2015-08-01

    There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic
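
    A compressed illustration of the three model classes compared above, for a single two-alternative probabilistic selection state (learning rates, mixing weight, reward probabilities and other task details are assumptions for illustration only, not the fitted models):

        import numpy as np

        rng = np.random.default_rng(2)
        alpha, w = 0.1, 0.5        # learning rate and hybrid mixing weight (assumed)
        p_reward = [0.8, 0.2]      # reward probabilities of the two stimuli

        V = 0.0                    # critic: state value
        actor = np.zeros(2)        # actor: action weights
        Q = np.zeros(2)            # Q-learning: expected reward per action

        def softmax(x, beta=3.0):
            e = np.exp(beta * (x - x.max())); return e / e.sum()

        for trial in range(500):
            # (a) pure actor-critic uses `actor`, (b) pure Q-learning uses `Q`,
            # (c) the hybrid mixes both value signals when selecting an action
            hybrid_value = w * actor + (1 - w) * Q
            a = rng.choice(2, p=softmax(hybrid_value))
            r = float(rng.random() < p_reward[a])
            delta = r - V                   # prediction error (single-state task)
            V += alpha * delta              # critic update
            actor[a] += alpha * delta       # actor update, driven by the prediction error
            Q[a] += alpha * (r - Q[a])      # Q-learning update toward the obtained reward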

  18. A model for discriminating reinforcers in time and space.

    PubMed

    Cowie, Sarah; Davison, Michael; Elliffe, Douglas

    2016-06-01

    Both the response-reinforcer and stimulus-reinforcer relation are important in discrimination learning; differential responding requires a minimum of two discriminably-different stimuli and two discriminably-different associated contingencies of reinforcement. When elapsed time is a discriminative stimulus for the likely availability of a reinforcer, choice over time may be modeled by an extension of the Davison and Nevin (1999) model that assumes that local choice strictly matches the effective local reinforcer ratio. The effective local reinforcer ratio may differ from the obtained local reinforcer ratio for two reasons: Because the animal inaccurately estimates times associated with obtained reinforcers, and thus incorrectly discriminates the stimulus-reinforcer relation across time; and because of error in discriminating the response-reinforcer relation. In choice-based timing tasks, the two responses are usually highly discriminable, and so the larger contributor to differences between the effective and obtained reinforcer ratio is error in discriminating the stimulus-reinforcer relation. Such error may be modeled either by redistributing the numbers of reinforcers obtained at each time across surrounding times, or by redistributing the ratio of reinforcers obtained at each time in the same way. We assessed the extent to which these two approaches to modeling discrimination of the stimulus-reinforcer relation could account for choice in a range of temporal-discrimination procedures. The version of the model that redistributed numbers of reinforcers accounted for more variance in the data. Further, this version provides an explanation for shifts in the point of subjective equality that occur as a result of changes in the local reinforcer rate. The inclusion of a parameter reflecting error in discriminating the response-reinforcer relation enhanced the ability of each version of the model to describe data. The ability of this class of model to account for a

  19. Team-based Learning in Pharmacotherapeutics

    PubMed Central

    2011-01-01

    Objective. To compare student examination performance in pharmacotherapeutics before and after implementation of team-based learning. Design. After the traditional lecture and workshop method for teaching pharmacotherapeutics was replaced with team-based learning in January 2009, students were expected to come to class having read assigned chapters in order to successfully complete an individual quiz, a group quiz, and group application exercises. Assessment. Student learning was assessed using performance on individual quizzes, group quizzes, and the examination at the end of the psychiatry module. Students performed as well on the examination at the end of the module as they did prior to team-based learning implementation. Conclusion. Substituting team-based learning for traditional lecture ensured that students prepared for class and increased student participation in class discussions. PMID:21969722

  20. Learning-Based Compressive Subsampling

    NASA Astrophysics Data System (ADS)

    Baldassarre, Luca; Li, Yen-Huan; Scarlett, Jonathan; Gozcu, Baran; Bogunovic, Ilija; Cevher, Volkan

    2016-06-01

    The problem of recovering a structured signal $\mathbf{x} \in \mathbb{C}^p$ from a set of dimensionality-reduced linear measurements $\mathbf{b} = \mathbf{A}\mathbf{x}$ arises in a variety of applications, such as medical imaging, spectroscopy, Fourier optics, and computerized tomography. Due to computational and storage complexity or physical constraints imposed by the problem, the measurement matrix $\mathbf{A} \in \mathbb{C}^{n \times p}$ is often of the form $\mathbf{A} = \mathbf{P}_{\Omega}\boldsymbol{\Psi}$ for some orthonormal basis matrix $\boldsymbol{\Psi} \in \mathbb{C}^{p \times p}$ and subsampling operator $\mathbf{P}_{\Omega}: \mathbb{C}^{p} \rightarrow \mathbb{C}^{n}$ that selects the rows indexed by $\Omega$. This raises the fundamental question of how best to choose the index set $\Omega$ in order to optimize the recovery performance. Previous approaches to addressing this question rely on non-uniform random subsampling using application-specific knowledge of the structure of $\mathbf{x}$. In this paper, we instead take a principled learning-based approach in which a fixed index set is chosen based on a set of training signals $\mathbf{x}_1,\dotsc,\mathbf{x}_m$. We formulate combinatorial optimization problems seeking to maximize the energy captured in these signals in an average-case or worst-case sense, and we show that these can be efficiently solved either exactly or approximately via the identification of modularity and submodularity structures. We provide both deterministic and statistical theoretical guarantees showing how the resulting measurement matrices perform on signals differing from the training signals, and we provide numerical examples showing our approach to be effective on a variety of data sets.
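
    In the average-case setting described above, the index-selection problem reduces to ranking coefficients by their average energy over the training signals and keeping the top n. The sketch below illustrates that idea using an orthonormal FFT basis purely as an example (the basis choice, signal model and parameters are assumptions, not the paper's experimental setup):

        import numpy as np

        def learn_subsampling_indices(training_signals, n):
            """Pick the n basis indices with the largest average energy over the training signals.
            Here the basis Psi is an orthonormal FFT (norm='ortho'), as an illustrative choice."""
            coeffs = np.fft.fft(training_signals, axis=1, norm="ortho")
            avg_energy = np.mean(np.abs(coeffs) ** 2, axis=0)
            return np.argsort(avg_energy)[-n:]

        def measure_and_reconstruct(x, omega):
            """Subsample the transform coefficients on omega, then reconstruct by zero-filling."""
            c = np.fft.fft(x, norm="ortho")
            b = c[omega]                       # the n linear measurements P_Omega Psi x
            c_hat = np.zeros_like(c)
            c_hat[omega] = b
            return np.fft.ifft(c_hat, norm="ortho").real

        # toy usage: training signals with energy concentrated at low frequencies
        rng = np.random.default_rng(0)
        p, m, n = 256, 50, 32
        t = np.arange(p)
        train = np.array([np.cos(2 * np.pi * rng.integers(1, 6) * t / p) for _ in range(m)])
        omega = learn_subsampling_indices(train, n)
        x_test = np.cos(2 * np.pi * 3 * t / p)
        x_rec = measure_and_reconstruct(x_test, omega)
        print("relative error:", np.linalg.norm(x_rec - x_test) / np.linalg.norm(x_test))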

  1. Comparison of Example-Based Learning and Problem-Based Learning in Engineering Domain

    ERIC Educational Resources Information Center

    Sern, Lai Chee; Salleh, Kahirol Mohd; Sulaiman, Nor lisa; Mohamad, Mimi Mohaffyza; Yunos, Jailani Md

    2015-01-01

    The research was conducted to compare the impacts of problem-based learning (PBL) and example-based learning (EBL) on the learning performance in an engineering domain. The research was implemented by means of experimental design. Specifically, a two-group experiment with a pre- and post-test design was used in this research. A total of 37…

  2. Investigation on corrosion and wear behaviors of nanoparticles reinforced Ni-based composite alloying layer

    NASA Astrophysics Data System (ADS)

    Xu, Jiang; Tao, Jie; Jiang, Shuyun; Xu, Zhong

    2008-04-01

    In order to investigate the role of amorphous SiO2 particles in the corrosion and wear resistance of a Ni-based metal matrix composite alloying layer, an amorphous nano-SiO2 particle reinforced Ni-based composite alloying layer was prepared by double glow plasma alloying on an AISI 316L stainless steel surface, where Ni/amorphous nano-SiO2 was first predeposited by brush plating. The composition and microstructure of the nano-SiO2 particle reinforced Ni-based composite alloying layer were analyzed by SEM, TEM and XRD. The results indicated that the composite alloying layer consisted of γ-phase and amorphous nano-SiO2 particles, and that at the alloying temperature (1000 °C) the nano-SiO2 particles were uniformly distributed in the alloying layer and retained their amorphous structure. The corrosion resistance of the composite alloying layer was investigated by an electrochemical method in 3.5% NaCl solution. Compared with the single alloying layer, the amorphous nano-SiO2 particles slightly decreased the corrosion resistance of the Ni-Cr-Mo-Cu alloying layer. X-ray photoelectron spectroscopy (XPS) revealed that the passive films formed on the composite alloying layer consisted of Cr2O3, MoO3, SiO2 and metallic Ni and Mo. Dry wear tests showed that the composite alloying layer had excellent friction-reducing properties, and its wear weight loss was less than 60% of that of the Ni-Cr-Mo-Cu alloying layer.

  3. Ceramics reinforced metal base composite coatings produced by CO2 laser cladding

    NASA Astrophysics Data System (ADS)

    Yang, Xichen; Wang, Yu; Yang, Nan

    2008-03-01

    Due to their excellent high strength, temperature resistance and wear resistance, ceramic-reinforced metal-matrix composite materials are used in important fields such as aircraft, aerospace, automotive and defense applications. Traditional bulk metal-matrix composite materials are expensive, which limits their industrial application, so laser cladding of ceramic-reinforced metal-matrix composite coatings is economically attractive. This paper focuses on three laser-clad ceramic coatings: SiC particle/Al matrix, Al2O3 powder/Al matrix and WC + Co/mild steel matrix. Powder particle sizes are 10-60 μm. The chemical composition of the aluminum matrix is 3.8-4.0% Cu, 1.2-1.8% Mg, 0.3-0.99% Mn and balance Al. A 5 kW CO2 laser, a 5-axis CNC table, a JKF-6 type powder feeder and a coaxial feeding nozzle were used for laser cladding. The microstructure and performance of the laser composite coatings were examined with OM, SEM and X-ray diffraction. The results are as follows: microstructures of 3C-, 6H- and 5H-SiC particles + Al + Al4SiC4 + Si in the SiC/Al composite, hexagonal α-Al2O3 + cubic γ-Al2O3 + f.c.c. Al in the Al2O3 powder/Al composite, and original WC particles + separated WC particles + eutectic WC + γ-Co solid solution + W2C particles in the WC + Co/steel coating were respectively identified. New microstructures of 5H-SiC in the SiC/Al composite, cubic γ-Al2O3 in the Al2O3 composite and W2C in the WC + Co/steel composite produced by laser cladding were respectively observed.

  4. Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

    NASA Astrophysics Data System (ADS)

    Saito, Hiroshi; Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato

    2010-06-01

    In reward-based learning, reward is typically given with some delay after the behavior that causes it. In the machine learning literature, the eligibility trace has been used as one solution for handling delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is important for the difficult neuroscience problem known as the “distal reward problem”. Node perturbation is one of the stochastic gradient methods among the many implementations of reinforcement learning; it estimates the gradient by injecting perturbations into a network. Since stochastic gradient methods do not require the derivative of an objective function, they may account for the learning mechanism of a complex system such as the brain. We study node perturbation with an eligibility trace as a specific example of delayed reward-based learning and analyze it using a statistical mechanics approach. As a result, we show the optimal time constant of the eligibility trace with respect to the reward delay and the existence of unlearnable parameter configurations.
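
    A minimal sketch of node perturbation with an eligibility trace on a toy linear task is given below. The exponential trace, the running-average reward baseline, and all hyper-parameters are illustrative assumptions rather than the paper's exact formulation, but the structure (perturb the output, wait for the delayed reward, and correlate it with the trace) follows the scheme described in the abstract.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    dim, sigma, lr = 5, 0.3, 0.01
    delay, tau = 2, 3.0                   # reward delay (steps) and trace time constant
    w_true = rng.standard_normal(dim)     # target linear mapping to be learned
    w = np.zeros(dim)
    trace = np.zeros(dim)
    baseline = 0.0
    reward_queue = [0.0] * delay          # rewards are delivered `delay` steps late
    rewards = []

    for step in range(20000):
        x = rng.standard_normal(dim)
        xi = sigma * rng.standard_normal()       # node-perturbation noise on the output
        y = w @ x + xi
        reward_queue.append(-(y - w_true @ x) ** 2)
        r = reward_queue.pop(0)                  # delayed reward arrives now
        rewards.append(r)
        trace = (1 - 1 / tau) * trace + xi * x   # eligibility trace of perturbation-input products
        w += lr * (r - baseline) * trace         # correlate delayed reward with the trace
        baseline += 0.01 * (r - baseline)        # running-average reward baseline

    print(f"mean reward, first 1000 steps: {np.mean(rewards[:1000]):.3f}")
    print(f"mean reward, last 1000 steps:  {np.mean(rewards[-1000:]):.3f}")
    ```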

  5. Theoretical assumptions of Maffesoli's sensitivity and Problem-Based Learning in Nursing Education

    PubMed Central

    Rodríguez-Borrego, María-Aurora; Nitschke, Rosane Gonçalves; do Prado, Marta Lenise; Martini, Jussara Gue; Guerra-Martín, María-Dolores; González-Galán, Carmen

    2014-01-01

    Objective: to understand the everyday life and the imaginary of nursing students in their knowledge socialization process through the Problem-Based Learning (PBL) strategy. Method: action research involving 86 students from the second year of an undergraduate Nursing program in Spain. A critical incident questionnaire and a group interview were used, with thematic/categorical analysis and triangulation of researchers, subjects and techniques. Results: the students signal the need for a view from within, reinforcing the criticism of schematic dualism; PBL allows one to learn how to be with the other, with their mechanical and organic solidarity, and to feel together, with its emphasis on learning to work in a group and wanting to be close to the person receiving care. Conclusions: the great contradictions experienced by the protagonists of the process, that is, the students, seem to express that group learning is not seen as a way of gaining knowledge, as it makes them lose study time. Everyday life, execution time and the imaginary of how learning should take place do not seem to intersect in the use of Problem-Based Learning. The importance of focusing on everyday life and the imaginary should be reinforced when considering nursing education. PMID:25029064

  6. Language-Based Learning Disabilities

    MedlinePlus

    ... the details of a story's plot or a classroom lecture Reading and comprehending material Learning words to ... stories to the child? Observe the child during classroom activities. Evaluate the child's ability to understand verbal ...

  7. An introduction to stochastic control theory, path integrals and reinforcement learning

    NASA Astrophysics Data System (ADS)

    Kappen, Hilbert J.

    2007-02-01

    Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory and I give an overview of the possible application of control theory to the modeling of animal behavior and learning. I discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral or by MC sampling. In this control formalism the central concept of cost-to-go becomes a free energy and methods and concepts from statistical physics can be readily applied.
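
    The Monte Carlo route mentioned in the abstract can be sketched for a one-dimensional toy problem: sample uncontrolled diffusion trajectories, weight them by the exponentiated negative path cost, and read the optimal control off the weighted average of the initial noise increment. The dynamics, cost function, and parameters below are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Toy problem: 1-D dynamics dx = u dt + dxi, noise variance nu, quadratic end cost.
    nu, T, dt, target = 0.5, 1.0, 0.01, 1.0
    lam = nu                        # lambda = nu * R, with control-cost weight R = 1
    steps = int(T / dt)
    n_samples = 20000

    def end_cost(x):
        return 4.0 * (x - target) ** 2

    def path_integral_control(x0):
        """Monte Carlo estimate of the optimal control at state x0: sample
        uncontrolled trajectories, weight them by exp(-cost / lambda), and
        average their first noise increments."""
        noise = rng.normal(0.0, np.sqrt(nu * dt), size=(n_samples, steps))
        x_final = x0 + noise.sum(axis=1)          # uncontrolled (u = 0) end states
        weights = np.exp(-end_cost(x_final) / lam)
        weights /= weights.sum()
        return (weights @ noise[:, 0]) / dt       # optimal control ~ <dxi_0> / dt

    print(f"u*(x=0) ~ {path_integral_control(0.0):.2f}")   # pushes toward the target
    print(f"u*(x=2) ~ {path_integral_control(2.0):.2f}")   # pushes back toward the target
    ```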

  8. Manufacturing and Process-based Property Analysis of Textile-Reinforced Thermoplastic Spacer Composites

    NASA Astrophysics Data System (ADS)

    Hufenbach, Werner; Adam, Frank; Füßel, René; Krahl, Michael; Weck, Daniel

    2012-12-01

    Novel woven spacer fabrics based on hybrid yarns are suitable for the efficient fabrication of three-dimensional composite structures in high-volume production. In this paper, an innovative manufacturing process with short cycle times and a high degree of automation is introduced for textile-reinforced thermoplastic spacer structures suited to bending load cases. The individual process steps, namely hybrid yarn fabrication, weaving of three-dimensional textile preforms, and consolidation using unique kinematics and hot-pressing technology, are described in detail. The bending properties of the manufactured spacer structures are evaluated by means of experiments as well as finite element simulations. Numerical parametric studies are performed in order to assess the influence of manufacturing tolerances on the bending stiffness of the spacer structures.

  9. Physicochemical characterization of three fiber-reinforced epoxide-based composites for dental applications.

    PubMed

    Bonon, Anderson J; Weck, Marcus; Bonfante, Estevam A; Coelho, Paulo G

    2016-12-01

    Fiber-reinforced composite (FRC) biomedical materials are in contact with living tissues, raising biocompatibility questions regarding their chemical composition. The hazards of materials such as bisphenol A (BPA), phthalates, and other monomers and composites present in FRCs have drawn scrutiny because of their potential toxicity since their detection in food, blood, and saliva. This study characterized the physicochemical properties and degradation profiles of three different epoxide-based materials intended for restorative dental applications. Characterization was accomplished by several methods, including FTIR, Raman, Brunauer-Emmett-Teller (BET) analysis, X-ray fluorescence spectroscopy, and degradation experiments. Physicochemical characterization revealed that although the materials presented similar chemical compositions, the variations between them were accounted for more by differences in phase distribution than by chemical composition. PMID:27612785

  10. Woven glass fabric reinforced laminates based on polyolefin wastes: Thermal, mechanical and dynamic-mechanical properties

    NASA Astrophysics Data System (ADS)

    Russo, Pietro; Acierno, Domenico; Simeoli, Giorgio; Lopresto, Valentina

    2014-05-01

    The potential of polyolefin wastes to replace virgin polypropylene in composite laminates has been investigated. Plaques reinforced with a woven glass fabric were prepared by the film-stacking technique and systematically analyzed in terms of thermal, mechanical and dynamic-mechanical properties. In the case of PP matrices, the use of a typical compatibilizer to improve adhesion at the interface was considered. The thermal properties highlighted the chemical nature of the plastic wastes. Regarding mechanical properties, static tests showed an increase in flexural parameters for compatibilized systems due to the coupling between grafted maleic anhydride and silane groups on the surface of the glass fabric. These effects, maximized for composites based on car bumper wastes, are perfectly reflected in the storage modulus and damping ability of the products, as determined by single-cantilever bending dynamic tests.

  11. A simplified numerical simulation method of bending properties for glass fiber cloth reinforced denture base resin.

    PubMed

    Tanimoto, Yasuhiro; Nishiwaki, Tsuyoshi; Nishiyama, Norihiro; Nemoto, Kimiya; Maekawa, Zen-ichiro

    2002-06-01

    The purpose of this study was to propose a new numerical model of glass fiber cloth reinforced denture base resin (GFRP). The proposed model is constructed with isotropic shell, beam and orthotropic shell elements representing the outermost resin, the interlaminar resin and the glass fiber cloth, respectively. The model was applied to failure progress analysis under three-point bending conditions, and its validity was checked through comparisons with experimental results. Failure progress behaviors involving local failures, such as interlaminar delamination and resin failure, could be simulated with the model. It is concluded that the model is effective for the failure progress analysis of GFRP. PMID:12238780

  12. Performance based seismic qualification of reinforced concrete nuclear materials processing facilities

    SciTech Connect

    Mertz, G.E.; Loceff, F.; Houston, T.; Rauls, G.; Mulliken, J.

    1997-09-01

    A seismic qualification of a reinforced concrete nuclear materials processing facility using performance-based acceptance criteria is presented. Performance goals are defined in terms of a minimum annual seismic failure frequency. Pushover analyses are used to determine the building's ultimate capacity and to relate that capacity to roof drift and joint rotation. Nonlinear dynamic analyses are used to quantify the building's drift using a suite of ground motion intensities representing varying soil conditions and levels of seismic hazard. A correlation relating joint rotation and building drift to damage state is developed from experimental data. The damage state and seismic hazard are convolved to determine the annual seismic failure frequency. The results of this rigorous approach are compared to those obtained using equivalent force methods and pushover techniques recommended by ATC-19 and FEMA-273.
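
    The convolution of damage state (fragility) with the seismic hazard can be sketched numerically as follows; the power-law hazard curve and lognormal fragility parameters are illustrative assumptions, not values from the study.

    ```python
    import numpy as np
    from scipy.stats import lognorm

    def hazard_exceedance(a, k0=1e-3, k=2.5):
        """Assumed power-law hazard curve: annual frequency of exceeding
        spectral acceleration a (in g). Not site-specific."""
        return k0 * a ** (-k)

    def fragility(a, median=0.6, beta=0.4):
        """Assumed lognormal fragility: probability of reaching the damage
        state given spectral acceleration a."""
        return lognorm.cdf(a, s=beta, scale=median)

    a = np.linspace(0.01, 3.0, 3000)
    hazard_density = -np.gradient(hazard_exceedance(a), a)     # |d(lambda)/da|
    annual_failure_freq = np.sum(fragility(a) * hazard_density) * (a[1] - a[0])
    print(f"annual seismic failure frequency ~ {annual_failure_freq:.2e}")
    ```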

  13. Intelligent Web-Based Learning System with Personalized Learning Path Guidance

    ERIC Educational Resources Information Center

    Chen, C. M.

    2008-01-01

    Personalized curriculum sequencing is an important research issue for web-based learning systems because no fixed learning paths will be appropriate for all learners. Therefore, many researchers focused on developing e-learning systems with personalized learning mechanisms to assist on-line web-based learning and adaptively provide learning paths…

  14. Reinforcement learning of self-regulated sensorimotor β-oscillations improves motor performance.

    PubMed

    Naros, G; Naros, I; Grimm, F; Ziemann, U; Gharabaghi, A

    2016-07-01

    Self-regulation of sensorimotor oscillations is currently researched in neurorehabilitation, e.g. for priming subsequent physiotherapy in stroke patients, and may be modulated by neurofeedback or transcranial brain stimulation. It has still to be demonstrated, however, whether and under which training conditions such brain self-regulation could also result in motor gains. Thirty-two right-handed, healthy subjects participated in a three-day intervention during which they performed 462 trials of kinesthetic motor-imagery while a brain-robot interface (BRI) turned event-related β-band desynchronization of the left sensorimotor cortex into the opening of the right hand by a robotic orthosis. Different training conditions were compared in a parallel-group design: (i) adaptive classifier thresholding and contingent feedback, (ii) adaptive classifier thresholding and non-contingent feedback, (iii) non-adaptive classifier thresholding and contingent feedback, and (iv) non-adaptive classifier thresholding and non-contingent feedback. We studied the task-related cortical physiology with electroencephalography and the behavioral performance in a subsequent isometric motor task. Contingent neurofeedback and adaptive classifier thresholding were critical for learning brain self-regulation which, in turn, led to behavioral gains after the intervention. The acquired skill for sustained sensorimotor β-desynchronization correlated significantly with subsequent motor improvement. Operant learning of brain self-regulation with a BRI may offer a therapeutic perspective for severely affected stroke patients lacking residual hand function. PMID:27046109
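
    The training conditions can be caricatured with a toy adaptive-thresholding loop: the classifier threshold tracks a percentile of recent β-desynchronization values, and the orthosis opens only when the current value crosses it (contingent feedback). This is an illustrative sketch under assumed signal statistics, not the classifier used in the study.

    ```python
    import numpy as np
    from collections import deque

    class AdaptiveERDClassifier:
        """Toy adaptive threshold on beta-band event-related desynchronization
        (ERD). The threshold tracks a percentile of recent trials so task
        difficulty stays roughly constant; feedback (orthosis opening) is
        contingent on crossing it. Illustrative only."""

        def __init__(self, percentile=60, history=40):
            self.percentile = percentile
            self.recent = deque(maxlen=history)

        def update_and_classify(self, erd_value):
            self.recent.append(erd_value)
            threshold = np.percentile(self.recent, self.percentile)
            return erd_value >= threshold      # True -> open the orthosis

    rng = np.random.default_rng(3)
    clf = AdaptiveERDClassifier()
    # Simulated ERD values that slowly improve across trials (arbitrary units).
    erd_trials = rng.normal(loc=np.linspace(0.2, 0.6, 100), scale=0.1)
    opens = [clf.update_and_classify(v) for v in erd_trials]
    print(f"orthosis opened on {sum(opens)} of {len(opens)} trials")
    ```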

  15. Web-Based Learning in a Geometry Course

    ERIC Educational Resources Information Center

    Chan, Hsungrow; Tsai, Pengheng; Huang, Tien-Yu

    2006-01-01

    This study concerns applying Web-based learning with learner controlled instructional materials in a geometry course. The experimental group learned in a Web-based learning environment, and the control group learned in a classroom. We observed that the learning method accounted for a total variation in learning effect of 19.1% in the 3rd grade and…

  16. A Blog-Based Dynamic Learning Map

    ERIC Educational Resources Information Center

    Wang, Kun Te; Huang, Yueh-Min; Jeng, Yu-Lin; Wang, Tzone-I

    2008-01-01

    Problem-based learning is a goal-directed and constructive process for learners. When faced with problems, learners usually have to form work groups in order to find a solution. Currently, blogs are becoming more popular and have in fact formed communities wherein people can share their learning experiences with others. Many pedagogical…

  17. Scenario-Based E-Learning Design

    ERIC Educational Resources Information Center

    Iverson, Kathleen; Colkey, Deborah

    2004-01-01

    As it was initially implemented, e-learning did little other than supply facts and information, offering limited opportunity for interactivity and problem-solving. Designers need to find ways to address past limitations and bring the engagement of classroom training to the web. One method that merits attention is scenario-based learning. The…

  18. Evaluating Web-Based Learning Systems

    ERIC Educational Resources Information Center

    Pergola, Teresa M.; Walters, L. Melissa

    2011-01-01

    Accounting educators continuously seek ways to effectively integrate instructional technology into accounting coursework as a means to facilitate active learning environments and address the technology-driven learning preferences of the current generation of students. Most accounting textbook publishers now provide interactive, web-based learning…

  19. Problem-Based Learning in Accounting

    ERIC Educational Resources Information Center

    Dockter, DuWayne L.

    2012-01-01

    Seasoned educators use an assortment of student-centered methods and tools to enhance their students' learning environment. With respect to methodologies used in accounting, educators have utilized and created new forms of problem-based learning exercises, including case studies, simulations, and other projects, to help students become more active…

  20. Predicting Learned Helplessness Based on Personality

    ERIC Educational Resources Information Center

    Maadikhah, Elham; Erfani, Nasrollah

    2014-01-01

    Learned helplessness, as a negative motivational state, can latently underlie repeated failures and create negative feelings toward education, as well as depression, in students and other members of society. The purpose of this paper is to predict learned helplessness based on students' personality traits. The research is a predictive…