Sample records for batch-mode reinforcement learning

  1. Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories

    PubMed Central

    Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien

    2013-01-01

    In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
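
    A minimal sketch (not from the record) of the artificial-trajectory idea, assuming a dataset of one-step transitions (x, u, r, x') and a discrete action set; the names transitions, policy and synthesize_trajectory are illustrative:

    ```python
    # Stitch an "artificial trajectory" from stored one-step transitions: at each
    # step, pick the stored transition whose starting state is nearest to the
    # current state among those that used the policy's action, then jump to its
    # recorded successor state. A sketch of the idea only, not the paper's
    # estimators or bounds.
    import numpy as np

    def synthesize_trajectory(transitions, policy, x0, horizon):
        """transitions: list of (x, u, r, x_next); states are NumPy arrays,
        actions are discrete and hashable; policy(x) returns an action."""
        x, total_return, path = x0, 0.0, []
        for _ in range(horizon):
            u = policy(x)
            candidates = [tr for tr in transitions if tr[1] == u]
            if not candidates:
                break
            x_s, u_s, r_s, x_next = min(
                candidates, key=lambda tr: np.linalg.norm(tr[0] - x))
            total_return += r_s
            path.append((x_s, u_s, r_s))
            x = x_next  # continue from the recorded successor
        return total_return, path
    ```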

  2. Adaptive Batch Mode Active Learning.

    PubMed

    Chakraborty, Shayok; Balasubramanian, Vineeth; Panchanathan, Sethuraman

    2015-08-01

    Active learning techniques have gained popularity to reduce human effort in labeling data instances for inducing a classifier. When faced with large amounts of unlabeled data, such algorithms automatically identify the exemplar and representative instances to be selected for manual annotation. More recently, there have been attempts toward a batch mode form of active learning, where a batch of data points is simultaneously selected from an unlabeled set. Real-world applications require adaptive approaches for batch selection in active learning, depending on the complexity of the data stream in question. However, the existing work in this field has primarily focused on static or heuristic batch size selection. In this paper, we propose two novel optimization-based frameworks for adaptive batch mode active learning (BMAL), where the batch size as well as the selection criteria are combined in a single formulation. We exploit gradient-descent-based optimization strategies as well as properties of submodular functions to derive the adaptive BMAL algorithms. The solution procedures have the same computational complexity as existing state-of-the-art static BMAL techniques. Our empirical results on the widely used VidTIMIT and the mobile biometric (MOBIO) data sets portray the efficacy of the proposed frameworks and also certify the potential of these approaches in being used for real-world biometric recognition applications.
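
    A hedged sketch of the adaptive-batch idea: the batch grows greedily while the marginal gain of the best remaining point stays above a threshold, so the batch size adapts to the pool rather than being fixed. The entropy-minus-redundancy score below is an illustrative stand-in for the paper's submodular objective, not its exact formulation:

    ```python
    # Greedy adaptive batch selection: keep adding the point with the largest
    # marginal gain (informativeness minus redundancy with the batch so far)
    # until the gain drops below tau, so batch size is data-dependent.
    import numpy as np

    def entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return -np.sum(p * np.log(p), axis=1)

    def adaptive_batch(probs, features, tau=0.1, max_batch=50):
        """probs: (n, k) predicted class probabilities;
        features: (n, d), assumed row-normalized so dot products are similarities."""
        info = entropy(probs)                      # informativeness per point
        selected, remaining = [], list(range(len(info)))
        while remaining and len(selected) < max_batch:
            gains = []
            for i in remaining:
                # Redundancy: similarity to the closest already-selected point.
                red = max((features[i] @ features[j] for j in selected), default=0.0)
                gains.append(info[i] - red)
            best = int(np.argmax(gains))
            if gains[best] < tau:                  # marginal gain too small: stop
                break
            selected.append(remaining.pop(best))
        return selected
    ```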

  3. Using MATLAB Software on the Peregrine System | High-Performance Computing

    Science.gov Websites

    Learn how to run MATLAB software in batch mode on the Peregrine system. An example MATLAB job in batch (non-interactive) mode is given, built around a submit script (matlabTest.sub); the working directory is also the directory into which MATLAB writes the output file x.dat.
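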

  4. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning

    PubMed Central

    Cashaback, Joshua G. A.; McGregor, Heather R.; Mohatarem, Ayman; Gribble, Paul L.

    2017-01-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634
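
    A quick numerical illustration (not from the record) of the dissociation described above: for a skewed shift distribution, a squared-error (error-based) loss is minimized by aiming at the mean, while a hit/miss (reinforcement-based) loss is maximized by aiming at the mode. The lognormal distribution and hit window are arbitrary stand-ins for the experiment's shift distribution and target:

    ```python
    # Compare the aim point that minimizes squared error with the one that
    # maximizes hit rate under a skewed distribution of lateral shifts.
    import numpy as np

    rng = np.random.default_rng(0)
    shifts = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)  # skewed: mean != mode

    aims = np.linspace(0.0, 4.0, 400)
    sq_loss = [np.mean((shifts - a) ** 2) for a in aims]          # error-based loss
    hit_rate = [np.mean(np.abs(shifts - a) < 0.1) for a in aims]  # reinforcement

    print("mean-optimal aim:", aims[int(np.argmin(sq_loss))])   # ~ mean of shifts
    print("mode-optimal aim:", aims[int(np.argmax(hit_rate))])  # ~ mode of shifts
    ```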

  5. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.

    PubMed

    Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L

    2017-07-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.

  6. Batch-mode Reinforcement Learning for improved hydro-environmental systems management

    NASA Astrophysics Data System (ADS)

    Castelletti, A.; Galelli, S.; Restelli, M.; Soncini-Sessa, R.

    2010-12-01

    Despite the great progress made in recent decades, the optimal management of hydro-environmental systems remains a very active and challenging research area. The combination of multiple, often conflicting interests, strong non-linearities in the physical processes and the management objectives, strong uncertainties in the inputs, and a high-dimensional state makes the problem challenging and intriguing. Stochastic Dynamic Programming (SDP) is one of the most suitable methods for designing (Pareto) optimal management policies while preserving the original problem complexity. However, it suffers from a dual curse which, de facto, prevents its practical application to even reasonably complex water systems. (i) The computational requirement grows exponentially with the state and control dimensions (Bellman's curse of dimensionality), so that SDP cannot be used with water systems whose state vector includes more than a few (2-3) units. (ii) An explicit model of each system component is required (curse of modelling) to anticipate the effects of the system transitions, i.e. any information included in the SDP framework can only be either a state variable described by a dynamic model or a stochastic disturbance, independent in time, with an associated pdf. Any exogenous information that could effectively improve the system operation cannot be explicitly considered in taking the management decision, unless a dynamic model is identified for each additional piece of information, thus adding to the problem complexity through the curse of dimensionality (additional state variables). To mitigate this dual curse, the combined use of batch-mode Reinforcement Learning (bRL) and Dynamic Model Reduction (DMR) techniques is explored in this study. bRL overcomes the curse of modelling by replacing explicit modelling with an external simulator and/or historical observations. The curse of dimensionality is averted by using a functional approximation of the SDP value function based on suitable non-linear regressors. DMR reduces the complexity, and the associated computational requirements, of non-linear distributed process-based models, making them suitable for inclusion in optimization schemes. Results from real-world applications of the approach are also presented, including reservoir operation with both quality and quantity targets.
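
    A minimal sketch of the value-function-approximation step in the spirit of batch-mode RL with non-linear regressors (fitted Q iteration with tree ensembles), assuming a fixed sample of transitions and a discrete action set; the data layout is illustrative:

    ```python
    # Fitted Q iteration: repeatedly regress Bellman-backup targets on (state,
    # action) pairs from a fixed batch of transitions; no explicit system model
    # is needed, only a simulator or historical observations that produced the
    # four-tuples.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def fitted_q_iteration(S, A, R, S_next, actions, gamma=0.95, n_iter=50):
        """S: (n, d) states, A: (n,) actions, R: (n,) rewards, S_next: (n, d)."""
        X = np.column_stack([S, A])
        q = None
        for _ in range(n_iter):
            if q is None:
                targets = R                      # first iteration: Q1 = reward
            else:
                # Bellman backup: r + gamma * max_a' Q(s', a')
                q_next = np.column_stack([
                    q.predict(np.column_stack([S_next, np.full(len(S_next), a)]))
                    for a in actions])
                targets = R + gamma * q_next.max(axis=1)
            q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        return q  # greedy policy: argmax over a of q.predict([s, a])
    ```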

  7. Diverse expected gradient active learning for relative attributes.

    PubMed

    You, Xinge; Wang, Ruxin; Tao, Dacheng

    2014-07-01

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in large amounts of data. One option is to incorporate active learning, so that the informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called diverse expected gradient active learning. This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis forces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multiclass distributions. Empirical evaluations on three different databases demonstrate the effectiveness and efficiency of the proposed approach.

  8. Diverse Expected Gradient Active Learning for Relative Attributes.

    PubMed

    You, Xinge; Wang, Ruxin; Tao, Dacheng

    2014-06-02

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in large amounts of data. One option is to incorporate active learning, so that the informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called Diverse Expected Gradient Active Learning (DEGAL). This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis forces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multi-class distributions. Empirical evaluations on three different databases demonstrate the effectiveness and efficiency of the proposed approach.
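
    An illustrative two-step batch construction mirroring the procedure described in this record (and the previous one): rank pool points by an informativeness score standing in for the expected gradient length, then fill the batch subject to a pairwise-angle diversity constraint. The scores and angle threshold are assumptions:

    ```python
    # Step 1: sort by informativeness. Step 2: admit a point only if its
    # gradient direction differs from every selected point by at least the
    # minimum angle, which keeps the batch diverse.
    import numpy as np

    def diverse_batch(grads, scores, batch_size, min_angle_deg=30.0):
        """grads: (n, d) per-point gradient estimates; scores: (n,) informativeness."""
        cos_max = np.cos(np.deg2rad(min_angle_deg))
        unit = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
        order = np.argsort(-scores)              # step 1: most informative first
        batch = []
        for i in order:                          # step 2: enforce diversity
            if all(unit[i] @ unit[j] < cos_max for j in batch):
                batch.append(int(i))
            if len(batch) == batch_size:
                break
        return batch
    ```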

  9. Utilizing a Micro in the Accounting Classroom.

    ERIC Educational Resources Information Center

    Wolverton, L. Craig

    1982-01-01

    The author discusses how to select microcomputer software for an accounting program and what types of instructional modes to use. The following modes are examined: problem solving, decision making, automated accounting functions, learning new accounting concepts, reinforcing concepts already learned, developing independent learning skills, and…

  10. A Modified Edge Crack Torsion Test for Measurement of Mode III Fracture Toughness of Laminated Tape Composites

    NASA Technical Reports Server (NTRS)

    Czabaj, Michael W.; Davidson, Barry D.; Ratcliffe, James G.

    2016-01-01

    Modifications to the edge crack torsion (ECT) test are studied to improve the reliability of this test for measuring the mode-III fracture toughness, G (sub IIIc), of laminated tape fiber-reinforced polymeric (FRP) composites. First, the data reduction methods currently used in the ECT test are evaluated and deficiencies in their accuracy are discussed. An alternative data reduction technique, which uses a polynomial form to represent the ECT specimen compliance solution, is evaluated and compared to finite element analysis (FEA) results. Second, seven batches of ECT specimens are tested, each batch containing specimens with a preimplanted midplane edge delamination and midplane plies with orientations of +theta/-theta, with theta ranging from 0 degrees to 90 degrees in 15-degree increments. Tests on these specimens show that intralaminar cracking occurs in specimens from all batches except those for which theta = 15 degrees or 30 degrees. Tests on specimens of these two batches are shown to result in mode-III delamination growth at the intended ply interface. The findings from this study are encouraging steps towards the use of the ECT test as a standardized method for measuring G (sub IIIc), although further modification of the data reduction method is required to make it suitable for use as part of a standardized test method.

  11. Model-Free control performance improvement using virtual reference feedback tuning and reinforcement Q-learning

    NASA Astrophysics Data System (ADS)

    Radac, Mircea-Bogdan; Precup, Radu-Emil; Roman, Raul-Cristian

    2017-04-01

    This paper proposes the combination of two model-free controller tuning techniques, namely linear virtual reference feedback tuning (VRFT) and nonlinear state-feedback Q-learning, referred to as a new mixed VRFT-Q learning approach. VRFT is first used to find a stabilising feedback controller from input-output experimental data on the process in a model reference tracking setting. Reinforcement Q-learning is next applied in the same setting using input-state experimental data collected under perturbed VRFT to ensure good exploration. The Q-learning controller, learned with a batch fitted Q iteration algorithm, uses two neural networks, one for the Q-function estimator and one for the controller. The VRFT-Q learning approach is validated on position control of a two-degrees-of-motion, open-loop stable, multi-input multi-output (MIMO) aerodynamic system (AS). Extensive simulations for the two independent control channels of the MIMO AS show that the Q-learning controllers clearly improve performance over the VRFT controllers.
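
    A hedged sketch of the VRFT step described here: from one input-output batch, compute a virtual reference through an assumed first-order reference model y(t+1) = a*y(t) + (1 - a)*r(t), then fit a PI controller to reproduce the recorded input by least squares. The reference model and controller structure are illustrative choices, not the paper's exact setup:

    ```python
    # Virtual reference feedback tuning in one least-squares solve: invert the
    # reference model on the measured output to get the virtual reference, form
    # the virtual tracking error, and regress the recorded input on the
    # controller's regressor.
    import numpy as np

    def vrft_pi(u, y, a=0.8):
        """u, y: 1-D arrays of recorded plant input and output (same length)."""
        # Virtual reference: invert the reference model on the measured output.
        r_virt = (y[1:] - a * y[:-1]) / (1.0 - a)
        e = r_virt - y[:-1]                       # virtual tracking error
        phi = np.column_stack([e, np.cumsum(e)])  # PI regressor [e, sum(e)]
        theta, *_ = np.linalg.lstsq(phi, u[:-1], rcond=None)
        kp, ki = theta
        return kp, ki  # controller u(t) = kp*e(t) + ki*sum(e) matches the data
    ```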

  12. Competitive learning with pairwise constraints.

    PubMed

    Covões, Thiago F; Hruschka, Eduardo R; Ghosh, Joydeep

    2013-01-01

    Constrained clustering has been an active research topic for the last decade. Most studies focus on batch-mode algorithms. This brief introduces two algorithms for on-line constrained learning, named on-line linear constrained vector quantization error (O-LCVQE) and constrained rival penalized competitive learning (C-RPCL). The former is a variant of the LCVQE algorithm for on-line settings, whereas the latter is an adaptation of the (on-line) RPCL algorithm to deal with constrained clustering. The accuracy results--in terms of the normalized mutual information (NMI)--from experiments with nine datasets show that the partitions induced by O-LCVQE are competitive with those found by the (batch-mode) LCVQE. Compared with this formidable baseline algorithm, it is surprising that C-RPCL can provide better partitions (in terms of the NMI) for most of the datasets. Also, experiments on a large dataset show that on-line algorithms for constrained clustering can significantly reduce the computational time.

  13. Data-driven model reference control of MIMO vertical tank systems with model-free VRFT and Q-Learning.

    PubMed

    Radac, Mircea-Bogdan; Precup, Radu-Emil; Roman, Raul-Cristian

    2018-02-01

    This paper proposes a combined Virtual Reference Feedback Tuning-Q-learning model-free control approach, which tunes nonlinear static state feedback controllers to achieve output model reference tracking in an optimal control framework. The novel iterative Batch Fitted Q-learning strategy uses two neural networks to represent the value function (critic) and the controller (actor), and it is referred to as a mixed Virtual Reference Feedback Tuning-Batch Fitted Q-learning approach. Learning convergence of Q-learning schemes generally depends, among other settings, on the efficient exploration of the state-action space. Handcrafting test signals for efficient exploration is difficult even for input-output stable unknown processes. Virtual Reference Feedback Tuning can ensure that an initial stabilizing controller is learned from a small amount of input-output data, and this controller can then be used to collect substantially more input-state data in a controlled mode, in a constrained environment, by compensating the process dynamics. These data are used to learn significantly superior nonlinear state-feedback neural network controllers for model reference tracking, using the proposed Batch Fitted Q-learning iterative tuning strategy, motivating the original combination of the two techniques. The mixed Virtual Reference Feedback Tuning-Batch Fitted Q-learning approach is experimentally validated for water level control of a multi-input multi-output nonlinear constrained coupled two-tank system. Discussions on the observed control behavior are offered. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.

  14. Binary Multidimensional Scaling for Hashing.

    PubMed

    Huang, Yameng; Lin, Zhouchen

    2017-10-04

    Hashing is a useful technique for fast nearest neighbor search due to its low storage cost and fast query speed. Unsupervised hashing aims at learning binary hash codes for the original features so that the pairwise distances can be best preserved. While several works have targeted this task, the results are not satisfactory, mainly due to oversimplified models. In this paper, we propose a unified and concise unsupervised hashing framework, called Binary Multidimensional Scaling (BMDS), which is able to learn the hash code for distance preservation in both batch and online modes. In the batch mode, unlike most existing hashing methods, we do not need to simplify the model by predefining the form of the hash map. Instead, we learn the binary codes directly based on the pairwise distances among the normalized original features by Alternating Minimization. This enables a stronger expressive power of the hash map. In the online mode, we consider the holistic distance relationship between the current query example and those we have already learned, rather than only focusing on the current data chunk. It is useful when the data come in a streaming fashion. Empirical results show that while being efficient for training, our algorithm outperforms state-of-the-art methods by a large margin in terms of distance preservation, which is practical for real-world applications.
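
    A toy sketch (not the paper's algorithm) of learning binary codes directly for distance preservation, in the spirit of the batch mode described above: no predefined hash map, just coordinate-descent bit flipping against a squared-error objective between scaled Hamming distances and the input distances:

    ```python
    # Flip one bit at a time, keeping the value that best matches the target
    # pairwise distances. Deliberately simple and O(n^2) per flip; a
    # demonstration of direct binary-code learning, not an efficient method.
    import numpy as np

    def binary_mds(X, n_bits=16, n_sweeps=5, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # target distances
        D = D / D.max()
        B = rng.choice([-1.0, 1.0], size=(n, n_bits))
        for _ in range(n_sweeps):
            for i in range(n):
                for b in range(n_bits):
                    for sign in (+1.0, -1.0):      # try both values of this bit
                        B[i, b] = sign
                        # Scaled Hamming distance of point i to all points.
                        ham = 0.5 * (n_bits - B @ B[i]) / n_bits
                        err = np.sum((ham - D[i]) ** 2)
                        if sign == +1.0:
                            best_err, best_sign = err, sign
                        elif err < best_err:
                            best_sign = sign
                    B[i, b] = best_sign
        return (B > 0).astype(np.uint8)  # 0/1 codes
    ```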

  15. Using MATLAB Software on the Peregrine System | High-Performance Computing

    Science.gov Websites

    Learn how to use MATLAB software on the Peregrine system, including running MATLAB in batch mode and understanding MATLAB software versions and licenses.

  16. Optimal control in microgrid using multi-agent reinforcement learning.

    PubMed

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid operating in grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable, "Average Electricity Price Trend," which is used to express the most likely transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the rate of change of the key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method helps to handle the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method for optimal control of a microgrid in grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  17. Runoff forecasting using a Takagi-Sugeno neuro-fuzzy model with online learning

    NASA Astrophysics Data System (ADS)

    Talei, Amin; Chua, Lloyd Hock Chye; Quek, Chai; Jansson, Per-Erik

    2013-04-01

    A study using a local learning Neuro-Fuzzy System (NFS) was undertaken for a rainfall-runoff modeling application. The local learning model was first tested on three different catchments: an outdoor experimental catchment measuring 25 m2 (Catchment 1), a small urban catchment 5.6 km2 in size (Catchment 2), and a large rural watershed with an area of 241.3 km2 (Catchment 3). The results obtained from the local learning model were comparable to or better than results obtained from physically based models, i.e. the Kinematic Wave Model (KWM), the Storm Water Management Model (SWMM), and the Hydrologiska Byråns Vattenbalansavdelning (HBV) model. The local learning algorithm also required a shorter training time compared to a global learning NFS model. The local learning model was next tested in real-time mode, where the model was continuously adapted when presented with current information in real time. The real-time implementation of the local learning model gave better results, without the need for retraining, when compared to a batch NFS model, where it was found that the batch model had to be retrained periodically in order to achieve similar results.

  18. Attention Cueing and Activity Equally Reduce False Alarm Rate in Visual-Auditory Associative Learning through Improving Memory.

    PubMed

    Nikouei Mahani, Mohammad-Ali; Haghgoo, Hojjat Allah; Azizi, Solmaz; Nili Ahmadabadi, Majid

    2016-01-01

    In our daily life, we continually exploit already learned multisensory associations and form new ones when facing novel situations. Improving our associative learning results in higher cognitive capabilities. We experimentally and computationally studied the learning performance of healthy subjects in a visual-auditory sensory associative learning task across active learning, attention cueing learning, and passive learning modes. According to our results, the learning mode had no significant effect on learning associations of congruent pairs. In addition, subjects' performance in learning congruent samples was not correlated with their vigilance score. Nevertheless, vigilance score was significantly correlated with the learning performance on the non-congruent pairs. Moreover, in the last block of the passive learning mode, subjects made significantly more mistakes, taking non-congruent pairs as associated, and consciously reported lower confidence. These results indicate that attention and activity equally enhanced visual-auditory associative learning for non-congruent pairs, while the false alarm rate in the passive learning mode did not decrease after the second block. We investigated the cause of the higher false alarm rate in the passive learning mode by using a computational model, composed of a reinforcement learning module and a memory-decay module. The results suggest that the higher rate of memory decay is the source of making more mistakes and reporting lower confidence in non-congruent pairs in the passive learning mode.
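
    A minimal sketch of the kind of two-module model described in this record: association strengths rise via a reinforcement-style delta rule on each trial and shrink via a per-trial memory decay, so a larger decay rate inflates false alarms on non-congruent pairs. Parameter names and values are illustrative:

    ```python
    # Reinforcement-learning module (delta rule toward 1 for congruent pairs,
    # toward 0 otherwise) plus a memory-decay module applied every trial.
    def simulate(pairs, congruent, alpha=0.3, decay=0.1, thresh=0.5):
        """pairs: list of (visual, auditory) ids; congruent: parallel bools."""
        w = {}
        false_alarms = 0
        for pair, is_congruent in zip(pairs, congruent):
            for k in w:                      # memory-decay module
                w[k] *= (1.0 - decay)
            v = w.get(pair, 0.0)
            if v > thresh and not is_congruent:
                false_alarms += 1            # judged associated, but it is not
            target = 1.0 if is_congruent else 0.0
            w[pair] = v + alpha * (target - v)   # reinforcement-style update
        return false_alarms, w
    ```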

  19. Dissipation of hydrological tracers and the herbicide S-metolachlor in batch and continuous-flow wetlands.

    PubMed

    Maillard, Elodie; Lange, Jens; Schreiber, Steffi; Dollinger, Jeanne; Herbstritt, Barbara; Millet, Maurice; Imfeld, Gwenaël

    2016-02-01

    Pesticide dissipation in wetland systems with regard to hydrological conditions and operational modes is poorly known. Here, we investigated in artificial wetlands the impact of batch versus continuous-flow modes on the dissipation of the chiral herbicide S-metolachlor (S-MET) and hydrological tracers (bromide, uranine and sulforhodamine B). The wetlands received water contaminated with the commercial formulation Mercantor Gold® (960 g/L of S-MET, 87% S-enantiomer). The tracer mass budget revealed that plant uptake, sorption, photo- and presumably biodegradation were prominent under batch mode (i.e. characterized by alternating oxic-anoxic conditions), in agreement with large dissipation of S-MET (90%) under batch mode. Degradation was the main dissipation pathway of S-MET in the wetlands. The degradate metolachlor oxanilic acid (MOXA) mainly formed under batch mode, whereas metolachlor ethanesulfonic acid (MESA) prevailed under continuous-flow mode, suggesting distinct degradation pathways in each wetland. The R-enantiomer was preferentially degraded under batch mode, which indicated enantioselective biodegradation. The release of MESA and MOXA by the wetlands as well as the potential persistence of S-MET compared to R-MET under both oxic and anoxic conditions may be relevant for groundwater and ecotoxicological risk assessment. This study shows the effect of batch versus continuous modes on pollutant dissipation in wetlands, and that alternate biogeochemical conditions under batch mode enhance S-MET biodegradation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Online Bagging and Boosting

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2005-01-01

    Bagging and boosting are two of the most well-known ensemble learning methods due to their theoretical performance guarantees and strong experimental results. However, these algorithms have been used mainly in batch mode, i.e., they require the entire training set to be available at once and, in some cases, require random access to the data. In this paper, we present online versions of bagging and boosting that require only one pass through the training data. We build on previously presented work by presenting some theoretical results. We also compare the online and batch algorithms experimentally in terms of accuracy and running time.
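
    A sketch of the online bagging idea from this paper: in batch bagging each example appears in a bootstrap sample K ~ Binomial(N, 1/N) times, which tends to Poisson(1) as N grows, so each arriving example can be given to each base learner k ~ Poisson(1) times in a single pass. The partial_fit interface follows scikit-learn conventions as an assumption:

    ```python
    # Online bagging: Poisson-weighted single-pass updates plus majority vote.
    import numpy as np

    def online_bagging_update(models, x, y, classes, rng):
        """Feed one (x, y) example to every base model, Poisson(1)-weighted."""
        for model in models:
            k = rng.poisson(1.0)             # times this model sees (x, y)
            for _ in range(k):
                model.partial_fit(x.reshape(1, -1), [y], classes=classes)

    def online_bagging_predict(models, x):
        votes = [model.predict(x.reshape(1, -1))[0] for model in models]
        values, counts = np.unique(votes, return_counts=True)
        return values[np.argmax(counts)]     # majority vote
    ```

    With, say, a list of sklearn SGDClassifier instances as models, the update can be called once per arriving example, never revisiting past data.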

  1. Blended Learning: Across the Disciplines, across the Academy. New Pedagogies and Practices for Teaching in Higher Education

    ERIC Educational Resources Information Center

    Glazer, Francine S., Ed.

    2011-01-01

    This is a practical introduction to blended learning, presenting examples of implementation across a broad spectrum of disciplines. For faculty unfamiliar with this mode of teaching, it illustrates how to address the core challenge of blended learning--to link the activities in each medium so that they reinforce each other to create a single,…

  2. 40 CFR 63.1402 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... properties may vary with time. For a unit operation operated in a batch mode (i.e., batch unit operation... means a unit operation operated in a batch mode. Block means the time period that comprises a single batch cycle. Combustion device burner means a device designed to mix and ignite fuel and air to provide...

  3. 40 CFR 63.1402 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... properties may vary with time. For a unit operation operated in a batch mode (i.e., batch unit operation... means a unit operation operated in a batch mode. Block means the time period that comprises a single batch cycle. Combustion device burner means a device designed to mix and ignite fuel and air to provide...

  4. 40 CFR 63.1402 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... properties may vary with time. For a unit operation operated in a batch mode (i.e., batch unit operation... means a unit operation operated in a batch mode. Block means the time period that comprises a single batch cycle. Combustion device burner means a device designed to mix and ignite fuel and air to provide...

  5. 40 CFR 63.1402 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... properties may vary with time. For a unit operation operated in a batch mode (i.e., batch unit operation... means a unit operation operated in a batch mode. Block means the time period that comprises a single batch cycle. Combustion device burner means a device designed to mix and ignite fuel and air to provide...

  6. 40 CFR 63.1402 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... properties may vary with time. For a unit operation operated in a batch mode (i.e., batch unit operation... means a unit operation operated in a batch mode. Block means the time period that comprises a single batch cycle. Combustion device burner means a device designed to mix and ignite fuel and air to provide...

  7. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    PubMed

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
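
    A hedged sketch of Hebbian reinforcement learning with binary evaluative feedback, following this record's description: a connectionist mapping from neural state to action whose Hebbian update is gated by a +1/-1 reward signal, strengthening co-activations that preceded rewarded actions. Network shape and learning rate are illustrative:

    ```python
    # One interaction step: sample an action from a softmax over W @ x, obtain
    # binary feedback, and apply a reward-gated Hebbian update in place.
    import numpy as np

    def hrl_step(W, x, feedback_fn, rng, eta=0.05):
        """W: (n_actions, n_inputs) weights, updated in place;
        x: neural state vector; feedback_fn(action) -> +1 (good) or -1 (bad)."""
        scores = W @ x
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        a = rng.choice(len(probs), p=probs)       # stochastic action selection
        r = feedback_fn(a)                        # binary evaluative feedback
        post = np.zeros(len(probs)); post[a] = 1.0
        W += eta * r * np.outer(post - probs, x)  # reward-gated Hebbian update
        return a, r
    ```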

  8. Effects of Fiber/Matrix Interface and its Composition on Mechanical Properties of Hi-Nicalon/Celsian Composites

    NASA Technical Reports Server (NTRS)

    Bansal, Narottam P.; Eldridge, Jeffrey I.

    1999-01-01

    To evaluate the effects of fiber coatings on composite mechanical properties, unidirectional celsian matrix composites reinforced with uncoated Hi-Nicalon fibers and with fibers precoated with a dual BN/SiC layer in two separate batches (batch 1 and batch 2) were tested in three-point flexure. The uncoated-fiber-reinforced composites showed catastrophic failure, with a strength of 210+/-35 MPa and a flat fracture surface. In contrast, composites reinforced with coated fibers exhibited graceful failure with extensive fiber pullout and showed significantly higher ultimate strengths, 904 and 759 MPa for the batch 1 and 2 coatings, respectively. Fiber push-in tests and microscopic examination indicated no chemical reaction at the uncoated or coated fiber-matrix interfaces that might be responsible for fiber strength degradation. Instead, the low strength of the composite with uncoated fibers was due to degradation of the fiber strength from mechanical damage during composite processing. Despite identical processing, the first matrix cracking stresses (Sigma(sub mc)) of the composites reinforced with fibers coated in batch 1 and batch 2 were quite different, 436 and 122 MPa, respectively. The large difference in Sigma(sub mc) of the coated-fiber composites was attributed to differences in fiber sliding stresses (Tau(sub friction)), 121.2+/-48.7 and 10.4+/-3.1 MPa, respectively, for the two composites, as determined by the fiber push-in method. Such a large difference in Tau(sub friction) for the two composites was found to be due to the difference in the compositions of the interface coatings. Scanning Auger microprobe analysis revealed the presence of carbon layers between the fiber and BN, and also between the BN and SiC coatings, in the composite showing lower Tau(sub friction). This resulted in lower Sigma(sub mc), in agreement with the ACK theory. The ultimate strengths of the two composites depended mainly on the fiber volume fraction and were not significantly affected by Tau(sub friction) values, as expected. The poor reproducibility of the fiber coating composition between the two batches was judged to be the primary source of the large differences in performance of the two composites.

  9. Induced bioelectrochemical metabolism for bioremediation of petroleum refinery wastewater: Optimization of applied potential and flow of wastewater.

    PubMed

    Mohanakrishna, Gunda; Al-Raoush, Riyadh I; Abu-Reesh, Ibrahim M

    2018-07-01

    A hybrid bioelectrochemical system (BES), configured with anode and cathode electrodes embedded in soil, was tested for the bioelectrochemical degradation of petroleum refinery wastewater (PRW). Four applied potentials were studied under batch-mode operation, among which 2 V resulted in the highest COD degradation (69.2%) and power density (725 mW/m2) over 7 days of operation. Further studies with continuous-mode operation at the optimized potential (2 V) showed that a hydraulic retention time (HRT) of 19 h achieved the highest COD removal (37%) and highest power density (561 mW/m2). BES treatment efficiencies for other pollutants of PRW were also determined, with respect to oil and grease (batch mode, 91%; continuous mode, 34%), total dissolved salts (batch mode, 53%; continuous mode, 24%) and sulfates (batch mode, 59%; continuous mode, 42%). The soil microenvironment in association with the BES forms complex processes, providing suitable conditions for efficient treatment of PRW. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Searching CA Condensates, On-Line and Batch.

    ERIC Educational Resources Information Center

    Kaminecki, Ronald M.; And Others

    Batch-mode processing is compared with on-line processing, in terms of cost-effectiveness, for computer-aided searching of chemical abstracts. Time, need, coverage, and adaptability are found to be the criteria by which a searcher selects a method, and sometimes both methods are used. There is a tradeoff between batch mode's slower…

  11. Enhanced bioethanol production by fed-batch simultaneous saccharification and co-fermentation at high solid loading of Fenton reaction and sodium hydroxide sequentially pretreated sugarcane bagasse.

    PubMed

    Zhang, Teng; Zhu, Ming-Jun

    2017-04-01

    The fed-batch simultaneous saccharification and co-fermentation (SSCF) of sugarcane bagasse (SCB), sequentially pretreated by Fenton reaction and NaOH, was investigated at high solid loadings of 10-30% (w/v). Enzyme feeding mode, substrate feeding mode and a combination of both were compared with the batch mode under the respective solid loadings. Ethanol concentrations above 80 g/L were obtained in batch and enzyme feeding modes at a solid loading of 30% (w/v). The enzyme feeding mode was found to increase ethanol productivity and reduce enzyme loading, to 1.23 g/L/h and 9 FPU/g substrate, respectively. The present study provides an economically feasible process for high-concentration bioethanol production. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Low-Cost Production of Composite Bushings for Jet Engine Applications

    NASA Technical Reports Server (NTRS)

    Gray, Robert A.

    1998-01-01

    The objectives of this research program were to reduce the manufacturing costs of variable stator vane bushings by 1) eliminating the expensive carbon fiber braiding operation, 2) replacing the batch-mode impregnation, B-stage, and cutting operations with a continuous process, and 3) reducing the molding cycle and machining operations with injection molding to achieve near-net shapes. Braided bushings were successfully fabricated with both the AMB-17XLD and AMB-TPD resin systems. The composite bushings achieved a high glass transition temperature after post-cure (above 300 C) and weight loss comparable to the PMR-15 bushings. AMB-17XLD bushings made with "batch-mode" molding compound (at 0.5 in. fiber length) achieved a flange break strength above 300 lb-force, which was superior to that of the continuous braided-fiber-reinforced bushing. The non-MDA resin technology developed in this contract appears attractive for bushing applications that do not exceed a 300 C use temperature. Two thermoplastic polyimide resins were synthesized in order to generate injection molding compound powders. Excellent processing results were obtained at injection temperatures in excess of 300 C. Micro-tensile specimens were produced from each resin type, and the Tg measurements (by TMA) for these samples were equivalent to that of AURUM(R). Thermal Gravimetric Analysis (TGA) conducted at 10 C/min showed that the non-MDA AMB-type polyimide thermoplastics had weight loss comparable to PMR-15 up to 500 C.

  13. Effects of Fiber Coating Composition on Mechanical Behavior of Silicon Carbide Fiber-Reinforced Celsian Composites

    NASA Technical Reports Server (NTRS)

    Bansal, Narottam P.; Eldridge, Jeffrey I.

    1998-01-01

    Celsian matrix composites reinforced with Hi-Nicalon fibers, precoated with a dual layer of BN/SiC by chemical vapor deposition in two separate batches, were fabricated. Mechanical properties of the composites were measured in three-point flexure. Despite supposedly identical processing, the composite panels fabricated with fibers coated in the two batches exhibited substantially different mechanical behavior. The first matrix cracking stresses (sigma(sub mc)) of the composites reinforced with fibers coated in batch 1 and batch 2 were 436 and 122 MPa, respectively. This large difference in sigma(sub mc) was attributed to differences in fiber sliding stresses (tau(sub friction)), 121.2+/-48.7 and 10.4+/-3.1 MPa, respectively, for the two composites as determined by the fiber push-in method. Such a large difference in values of tau(sub friction) for the two composites was found to be due to the difference in the compositions of the interface coatings. Scanning Auger microprobe analysis revealed the presence of carbon layers between the fiber and BN, and also between the BN and SiC coatings in the composite showing lower tau(sub friction). This resulted in lower sigma(sub mc) in agreement with the ACK theory. The ultimate strengths of the two composites, 904 and 759 MPa, depended mainly on the fiber volume fraction and were not significantly affected by tau(sub friction) values, as expected. The poor reproducibility of the fiber coating composition between the two batches was judged to be the primary source of the large differences in performance of the two composites.

  14. Contemporary Approaches to Conditioning and Learning.

    ERIC Educational Resources Information Center

    McGuigan, F. J., Ed.; Lumsden, D. Barry, Ed.

    Chapters contained in this volume, each with a list of references appended, are: "Scientific Psychology in Transition" by Gregory A. Kimble; "Higher Mental Processes as the Bases for the Laws of Conditioning" by Eli Saltz; "Reification and Reality in Conditioning Paradigms: Implications of Results When Modes of Reinforcement are Changed" by David…

  15. An On-Chip Learning Neuromorphic Autoencoder With Current-Mode Transposable Memory Read and Virtual Lookup Table.

    PubMed

    Cho, Hwasuk; Son, Hyunwoo; Seong, Kihwan; Kim, Byungsub; Park, Hong-June; Sim, Jae-Yoon

    2018-02-01

    This paper presents an IC implementation of an on-chip learning neuromorphic autoencoder unit in the form of a rate-based spiking neural network. With a current-mode signaling scheme embedded in a 500 × 500 6b SRAM-based memory, the proposed architecture achieves simultaneous processing of multiplications and accumulations. In addition, a transposable memory read for both forward and backward propagation and a virtual lookup table are also proposed to perform unsupervised learning of a restricted Boltzmann machine. The IC is fabricated using a 28-nm CMOS process and is verified in a three-layer network of encoder-decoder pairs for training and recovery of images with two-dimensional pixels. With a dataset of 50 digits, the IC shows a normalized root mean square error of 0.078. Measured energy efficiencies are 4.46 pJ per synaptic operation for inference and 19.26 pJ per synaptic weight update for learning, respectively. The learning performance is also estimated by simulations in which the proposed hardware architecture is extended to batch training on the 60,000-sample MNIST dataset.

  16. Lipid Content and Cryotolerance of Bakers' Yeast in Frozen Doughs †

    PubMed Central

    Gélinas, Pierre; Fiset, Gisèle; Willemot, Claude; Goulet, Jacques

    1991-01-01

    The relationship between lipid content and tolerance to freezing at −50°C was studied in Saccharomyces cerevisiae grown under batch or fed-batch mode and various aeration and temperature conditions. A higher free-sterol-to-phospholipid ratio as well as higher free sterol and phospholipid contents correlated with the superior cryoresistance in dough or in water of the fed-batch-grown compared with the batch-grown cells. For both growth modes, the presence of excess dissolved oxygen in the culture medium greatly improved yeast cryoresistance and trehalose content (P. Gélinas, G. Fiset, A. LeDuy, and J. Goulet, Appl. Environ. Microbiol. 26:2453-2459, 1989) without significantly changing the lipid profile. Under the batch or fed-batch modes, no correlation was found between the cryotolerance of bakers' yeast and the total cellular lipid content, the total sterol content, the phospholipid unsaturation index, the phosphate or crude protein content, or the yeast cell morphology (volume and roundness). PMID:16348412

  17. Batch versus column modes for the adsorption of radioactive metal onto rice husk waste: conditions optimization through response surface methodology.

    PubMed

    Kausar, Abida; Bhatti, Haq Nawaz; Iqbal, Munawar; Ashraf, Aisha

    2017-09-01

    Batch and column adsorption modes were compared for the adsorption of U(VI) ions using rice husk waste biomass (RHWB). Response surface methodology was employed for the optimization of process variables, i.e., pH (A), adsorbent dose (B) and initial ion concentration (C), in batch mode. B, C and C^2 affected the U(VI) adsorption significantly in batch mode. The developed quadratic model was found to be valid on the basis of the regression coefficient as well as analysis of variance. The predicted and actual values were found to correlate well, with negligible residuals, and B, C and C^2 were significant terms. The column study was performed considering bed height, flow rate and initial metal ion concentration, and adsorption efficiency was evaluated through breakthrough curves and the bed depth service time and Thomas models. Adsorption was found to be dependent on bed height and initial U(VI) ion concentration, and increasing the flow rate decreased the adsorption capacity. The Thomas model fitted the U(VI) adsorption onto RHWB well. Results revealed that RHWB has the potential to remove U(VI) ions, and batch adsorption was found to be more efficient than the column mode.

  18. Enzyme-assisted supercritical carbon dioxide extraction of black pepper oleoresin for enhanced yield of piperine-rich extract.

    PubMed

    Dutta, Sayantani; Bhattacharjee, Paramita

    2015-07-01

    Black pepper (Piper nigrum L.), the King of Spices, is the most popular spice globally, and its active ingredient, piperine, is reported to possess therapeutic potency. In this work, enzyme-assisted supercritical carbon dioxide (SC-CO2) extraction of black pepper oleoresin was investigated using α-amylase (from Bacillus licheniformis) for enhanced yield of a piperine-rich extract possessing a good combination of phytochemical properties. Optimization of the extraction parameters (without enzyme), mainly temperature and pressure, was conducted in both batch and continuous modes, and the optimized conditions that provided the maximum yield of piperine were obtained in the batch mode, with a sample size of 20 g of black pepper powder (particle diameter 0.42 ± 0.02 mm) at 60 °C and 300 bar with a CO2 flow of 2 L/min. Studies on the activity of α-amylase were conducted under these optimized conditions in both batch and continuous modes, with varying amounts of lyophilized enzyme (2 mg, 5 mg and 10 mg) and times of exposure of the enzyme to SC-CO2 (2.25 h and 4.25 h). The specific activity of the enzyme increased by 2.13 times when treated in the continuous mode, versus a 1.25-times increase in the batch mode. The structural changes of the treated enzymes were studied by 1H NMR analyses. In the case of α-amylase-assisted extractions of black pepper, both batch and continuous modes significantly increased the yields and phytochemical properties of the piperine-rich extracts, with a higher increase in batch mode than in continuous mode. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  19. Policy improvement by a model-free Dyna architecture.

    PubMed

    Hwang, Kao-Shing; Lo, Chia-Yue

    2013-05-01

    The objective of this paper is to accelerate the process of policy improvement in reinforcement learning. The proposed Dyna-style system combines two learning schemes, one of which utilizes a temporal difference method for direct learning; the other uses relative values for indirect learning in planning between two successive direct learning cycles. Instead of establishing a complicated world model, the approach introduces a simple predictor of average rewards to the actor-critic architecture in the simulation (planning) mode. The relative value of a state, defined as the accumulated differences between immediate reward and average reward, is used to steer the improvement process in the right direction. The proposed learning scheme is applied to control a pendulum system tracking a desired trajectory, to demonstrate its adaptability and robustness. Through reinforcement signals from the environment, the system takes the appropriate actions to drive an unknown dynamic system to track desired outputs in a few learning cycles. Comparisons are made between the proposed model-free method, a connectionist adaptive heuristic critic, and an advanced method of Dyna-Q learning in experiments on labyrinth exploration. The proposed method outperforms its counterparts in terms of elapsed time and convergence rate.
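
    A small sketch of the "relative value" bookkeeping described above: the relative value of a state accumulates differences between immediate and average reward, while a running average-reward predictor stands in for a full world model during planning. Names and step sizes are illustrative:

    ```python
    # Accumulate (reward - average reward) into the state's relative value and
    # track the average reward incrementally.
    def update_relative_value(rel_v, avg_r, state, reward, beta=0.01):
        """rel_v: dict mapping state -> relative value; avg_r: running average."""
        rel_v[state] = rel_v.get(state, 0.0) + (reward - avg_r)
        avg_r += beta * (reward - avg_r)     # incremental average-reward estimate
        return avg_r
    ```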

  20. Budget Online Learning Algorithm for Least Squares SVM.

    PubMed

    Jian, Ling; Shen, Shuqian; Li, Jundong; Liang, Xijun; Li, Lei

    2017-09-01

    Batch-mode least squares support vector machine (LSSVM) learning is often associated with an unbounded number of support vectors (SVs), making it unsuitable for applications involving large-scale streaming data. Limited-scale LSSVM, which allows efficient updating, seems to be a good solution to this issue. In this paper, to train the limited-scale LSSVM dynamically, we present a budget online LSSVM (BOLSSVM) algorithm. Methodologically, by setting a fixed budget for SVs, we are able to update the LSSVM model according to the updated SV set dynamically, without retraining from scratch. In particular, when a new small chunk of SVs substitutes for the old ones, the proposed algorithm employs a low-rank correction technique and the Sherman-Morrison-Woodbury formula to compute the inverse of the saddle-point matrix derived from the LSSVM's Karush-Kuhn-Tucker (KKT) system, which, in turn, updates the LSSVM model efficiently. In this way, the proposed BOLSSVM algorithm is especially useful for online prediction tasks. Another merit of the proposed BOLSSVM is that it can be used for k-fold cross validation. Specifically, compared with batch-mode learning methods, the computational complexity of the proposed BOLSSVM method is significantly reduced from O(n^4) to O(n^3) for leave-one-out cross validation with n training samples. The experimental results of classification and regression on benchmark data sets and real-world applications show the validity and effectiveness of the proposed BOLSSVM algorithm.
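
    The low-rank update at the heart of the method described above: when a small chunk of SVs is swapped, the inverse of the KKT saddle-point matrix can be refreshed with the Sherman-Morrison-Woodbury identity (A + U C V)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1 instead of re-inverting from scratch. This is a generic demonstration of the identity, not the paper's exact bookkeeping:

    ```python
    # Sherman-Morrison-Woodbury update: only a small k x k system is inverted.
    import numpy as np

    def smw_update(A_inv, U, C, V):
        """Return (A + U @ C @ V)^-1 given A_inv = A^-1; C is small (k x k)."""
        inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)   # k x k solve
        return A_inv - A_inv @ U @ inner @ V @ A_inv

    # Check against direct inversion on random data.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(6, 6)) + 6 * np.eye(6)
    U, V = rng.normal(size=(6, 2)), rng.normal(size=(2, 6))
    C = np.eye(2)
    assert np.allclose(smw_update(np.linalg.inv(A), U, C, V),
                       np.linalg.inv(A + U @ C @ V))
    ```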

  1. Strategies for the startup of methanogenic inverse fluidized-bed reactors using colonized particles.

    PubMed

    Alvarado-Lassman, A; Sandoval-Ramos, A; Flores-Altamirano, M G; Vallejo-Cantú, N A; Méndez-Contreras, J M

    2010-05-01

    One of the inconveniences in the startup of methanogenic inverse fluidized-bed reactors (IFBRs) is the long period required for biofilm formation and stabilization of the system. Previous researchers have preferred to start up in batch mode to shorten stabilization times. Much less work has been done on continuous-mode startup for the IFBR configuration. In this study, we prepared two IFBRs with similar characteristics to compare startup times for batch- and continuous-operation modes. The reactors were inoculated with a small quantity of colonized particles and run for a period of 3 months to establish the optimal startup strategy, using synthetic media as a substrate (glucose as the carbon source). After the startup stage, the continuous- and batch-mode reactors removed more than 80% of the chemical oxygen demand (COD) in 51 and 60 days of operation, respectively; however, at the end of the experiments, the continuous-mode reactor had more biomass attached to the support media than the batch-mode reactor. Both reactors developed fully covered support media, but only the continuous-mode reactor had methane yields close to the theoretical value that is typical of stable reactors. A combined startup strategy was therefore proposed, with industrial wastewater as the substrate, using a sequence of batch cycles followed by continuous operation, which allows stable operation at an organic loading rate of 20 g COD/(L·d) in 15 days. Using a fraction of colonized support as an inoculum presents advantages with respect to previously reported strategies.

  2. Formulation of advanced consumables management models: Environmental control and electrical power system performance models requirements

    NASA Technical Reports Server (NTRS)

    Daly, J. K.; Torian, J. G.

    1979-01-01

    Software design specifications for developing environmental control and life support system (ECLSS) and electrical power system (EPS) programs into interactive computer programs are presented. Specifications for the ECLSS program are at the detailed design level, covering modification of an existing batch-mode program, the FORTRAN environmental analysis routines (FEAR). The characteristics of the FEAR program are included for use in modifying batch-mode programs to form interactive programs. The EPS program specifications are at the preliminary design level. Emphasis is on top-down structuring in the development of an interactive program.

  3. Efficient production of l-lactic acid from hydrolysate of Jerusalem artichoke with immobilized cells of Lactococcus lactis in fibrous bed bioreactors.

    PubMed

    Shi, Zhouming; Wei, Peilian; Zhu, Xiangcheng; Cai, Jin; Huang, Lei; Xu, Zhinan

    2012-10-10

    Hydrolysate of Jerusalem artichoke was applied to the production of l-lactic acid by immobilized Lactococcus lactis cells in a fibrous bed bioreactor system. Preliminary experiments indicated that the high-quality hydrolysate, derived from a 40 min acid treatment at 95 °C and pH 1.8, was sufficient to support cell growth and synthesis of l-lactic acid. With the addition of 5 g/l yeast extract, the fermentative performance of the free cell system was evidently improved. After this baseline hydrolysate fermentation was established, batch-mode and fed-batch-mode fermentations were carried out in the free cell system and the fibrous bed bioreactor system, respectively. In all cases the immobilized cells showed a superior ability to produce l-lactic acid. The comparison of batch and fed-batch modes also indicated that the growth-limiting feeding strategy could reduce the lag phase of the fermentation process and enhance the production of l-lactic acid. The maximum concentration of l-lactic acid achieved was 142 g/l, in the fed-batch mode. Subsequent repeated-batch fermentation of the fibrous bed bioreactor system further exhibited the persistence and stability of this system for high production of l-lactic acid over the long term. Our work suggests the great potential of the fibrous bed bioreactor system and hydrolysate of J. artichoke in the economical production of l-lactic acid at industrial scale. Copyright © 2012 Elsevier Inc. All rights reserved.

  4. Bioprocessing Data for the Production of Marine Enzymes

    PubMed Central

    Sarkar, Sreyashi; Pramanik, Arnab; Mitra, Anindita; Mukherjee, Joydeep

    2010-01-01

    This review is a synopsis of different bioprocess engineering approaches adopted for the production of marine enzymes. Three major modes of operation (batch, fed-batch and continuous) have been used for the production of enzymes (such as protease, chitinase, agarase and peroxidase), mainly from marine bacteria and fungi, at laboratory bioreactor and pilot plant scales. Submerged, immobilized and solid-state processes in batch mode were widely employed. The fed-batch process was also applied in several bioprocesses. Continuous processes with suspended cells as well as with immobilized cells have been used. Investigations in shake flasks were conducted with the prospect of large-scale processing in reactors. PMID:20479981

  5. An Integer Batch Scheduling Model for a Single Machine with Simultaneous Learning and Deterioration Effects to Minimize Total Actual Flow Time

    NASA Astrophysics Data System (ADS)

    Yusriski, R.; Sukoyo; Samadhi, T. M. A. A.; Halim, A. H.

    2016-02-01

    In the manufacturing industry, several identical parts can be processed in batches, and setup time is needed between two consecutive batches. Since the processing times of batches are not always fixed during a scheduling period, due to learning and deterioration effects, this research deals with batch scheduling problems with simultaneous learning and deterioration effects. The objective is to minimize total actual flow time, defined as the time interval between the arrival of all parts at the shop and their common due date. The decision variables are the number of batches, the integer batch sizes, and the sequence of the resulting batches. This research proposes a heuristic algorithm based on Lagrangian relaxation. The effectiveness of the proposed algorithm is determined by comparing its solutions to the respective optimal solutions obtained from an enumeration method. Numerical experiments show that the average difference between the solutions is 0.05%.
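
    An illustrative evaluation of one candidate solution for a model of this kind: given batch sizes in sequence, compute total actual flow time when per-unit processing time shrinks with batch position (learning) and grows with start time (deterioration). The specific effect functions and constants are assumptions made for a concrete example, not the paper's exact model:

    ```python
    # Sum, over batches, of (batch size) * (batch completion time), with
    # position-dependent learning and time-dependent deterioration applied to
    # the per-unit processing time.
    def total_actual_flow_time(batch_sizes, p=1.0, setup=2.0,
                               learn=-0.1, deteriorate=0.005):
        t, flow = 0.0, 0.0
        for pos, q in enumerate(batch_sizes, start=1):
            t += setup
            rate = p * pos ** learn * (1.0 + deteriorate * t)  # per-unit time
            t += q * rate                    # batch completes at time t
            flow += q * t                    # each part waits until its batch ends
        return flow

    print(total_actual_flow_time([10, 20, 30]))
    print(total_actual_flow_time([30, 20, 10]))  # compare two sequences
    ```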

  6. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

    PubMed

    Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

    2016-01-01

    Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
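
    The closed-loop schedule described here is, in essence, a staircase controller on task difficulty. Below is a minimal sketch, assuming a reward window on reach error that is adapted toward a target success rate; the window update rule and all parameter values are illustrative, not the authors' exact schedule.

        import random

        random.seed(0)

        def run_schedule(trials=200, window=2.0, target_rate=0.5, gain=0.1,
                         motor_noise=1.5):
            successes = []
            for _ in range(trials):
                error = abs(random.gauss(0.0, motor_noise))  # executed reach error
                successes.append(error < window)             # binary reward
                recent = successes[-20:]
                rate = sum(recent) / len(recent)
                window = max(0.1, window + gain * (target_rate - rate))
            return sum(successes) / trials, window

        rate, final_window = run_schedule()
        print(f"reward rate {rate:.2f}, final window {final_window:.2f}")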

  7. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise

    PubMed Central

    Therrien, Amanda S.; Wolpert, Daniel M.

    2016-01-01

    See Miall and Galea (doi: 10.1093/awv343) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368

  8. Apprenticeship Learning: Learning to Schedule from Human Experts

    DTIC Science & Technology

    2016-06-09

    Approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning. Related work cited in the record includes apprenticeship learning via inverse reinforcement learning, hierarchical reinforcement learning, scheduling of tasks with temporal constraints, and active advice seeking for inverse reinforcement learning.

  9. Stochastic Averaging for Constrained Optimization With Application to Online Resource Allocation

    NASA Astrophysics Data System (ADS)

    Chen, Tianyi; Mokhtari, Aryan; Wang, Xin; Ribeiro, Alejandro; Giannakis, Georgios B.

    2017-06-01

    Existing approaches to resource allocation for today's stochastic networks are challenged to meet fast convergence and tolerable delay requirements. The present paper leverages advances in online learning to facilitate stochastic resource allocation tasks. By recognizing the central role of Lagrange multipliers, the underlying constrained optimization problem is formulated as a machine learning task involving both training and operational modes, with the goal of learning the sought multipliers in a fast and efficient manner. To this end, an order-optimal offline learning approach is first developed for batch training, and it is then generalized to the online setting with a procedure termed learn-and-adapt. The novel resource allocation protocol combines the benefits of stochastic approximation and statistical learning to obtain low-complexity online updates with learning errors close to the statistical accuracy limits, while still preserving adaptation performance, which in the stochastic network optimization context guarantees queue stability. Analysis and simulated tests demonstrate that the proposed data-driven approach improves the delay and convergence performance of existing resource allocation schemes.
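
    The training/operational split rests on classical stochastic dual-gradient updates of the Lagrange multipliers. Below is a minimal sketch of that baseline (not the paper's order-optimal learn-and-adapt procedure); the utility function, constraint, and step size are illustrative assumptions.

        import math
        import random

        random.seed(0)

        def allocate(price, channel):
            # Primal step: pick x maximizing log(1 + channel*x) - price*x on a grid.
            grid = [i / 10 for i in range(51)]
            return max(grid, key=lambda x: math.log(1 + channel * x) - price * x)

        lam, step, budget = 0.0, 0.05, 1.0    # multiplier, step size, E[x] <= budget
        for _ in range(5000):
            channel = random.uniform(0.5, 2.0)          # random network state
            x = allocate(lam, channel)                  # online primal decision
            lam = max(0.0, lam + step * (x - budget))   # dual (multiplier) update
        print(f"learned multiplier ~ {lam:.3f}")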

  10. Analysis of bulk arrival queueing system with batch size dependent service and working vacation

    NASA Astrophysics Data System (ADS)

    Niranjan, S. P.; Indhira, K.; Chandrasekaran, V. M.

    2018-04-01

    This paper concentrates on a single-server bulk-arrival queueing system with batch-size-dependent service and working vacation. The server operates in two service modes depending on the queue length: single service when the queue length is at least `a', and fixed batch service, with batch size `k', when the queue length is at least `k' (k > a). After a service completion, if the queue length is less than `a', the server leaves for a working vacation, during which customers are served at a lower rate than the regular service rate; service during working vacation likewise comprises two modes. For the proposed model, the probability generating function of the queue length at an arbitrary time is obtained using the supplementary variable technique. Performance measures are also presented, with suitable numerical illustrations.
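
    One plausible reading of the threshold discipline can be checked with a toy simulation. In the sketch below, fixed batches of size k are served above the threshold k, customers are served one at a time otherwise, and a slower working-vacation rate applies below a; all rates, thresholds, and the batch-size distribution are illustrative assumptions.

        import random

        random.seed(1)
        a, k = 3, 8                       # service thresholds (k > a)
        lam, mu, mu_vac = 0.8, 1.2, 0.4   # arrival, regular, and vacation rates
        dt, horizon = 0.01, 2_000.0
        queue, area, t = 0, 0.0, 0.0
        while t < horizon:
            if random.random() < lam * dt:           # bulk arrival of 1-4 customers
                queue += random.randint(1, 4)
            rate = mu if queue >= a else mu_vac      # working vacation below `a`
            if queue > 0 and random.random() < rate * dt:
                queue -= k if queue >= k else 1      # batch of k, else one at a time
            area += queue * dt
            t += dt
        print(f"time-average queue length ~ {area / horizon:.2f}")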

  11. Rational and Mechanistic Perspectives on Reinforcement Learning

    ERIC Educational Resources Information Center

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  12. Effects of carbon brush anode size and loading on microbial fuel cell performance in batch and continuous mode

    NASA Astrophysics Data System (ADS)

    Lanas, Vanessa; Ahn, Yongtae; Logan, Bruce E.

    2014-02-01

    Larger scale microbial fuel cells (MFCs) require compact architectures to efficiently treat wastewater. We examined how anode-brush diameter, number of anodes, and electrode spacing affected the performance of MFCs operated in fed-batch and continuous flow modes. All anodes were initially tested with the brush core set at the same distance from the cathode. In fed-batch mode, the configuration with three larger brushes (25 mm diameter) produced 80% more power (1240 mW m-2) than reactors with eight smaller brushes (8 mm) (690 mW m-2). The higher power production by the larger brushes was due to more negative and stable anode potentials than those of the smaller brushes. The same general result was obtained in continuous flow operation, although power densities were reduced. However, by moving the center of the smaller brushes closer to the cathode (from 16.5 to 8 mm), power substantially increased from 690 to 1030 mW m-2 in fed-batch mode. In continuous flow mode, power increased from 280 to 1020 mW m-2, resulting in more power production from the smaller brushes than from the larger brushes (540 mW m-2). These results show that multi-electrode MFCs can be optimized by selecting smaller anodes placed as close as possible to the cathode.

  13. Dynamic behavior of Yarrowia lipolytica in response to pH perturbations: dependence of the stress response on the culture mode.

    PubMed

    Timoumi, Asma; Cléret, Mégane; Bideaux, Carine; Guillouet, Stéphane E; Allouche, Yohan; Molina-Jouve, Carole; Fillaudeau, Luc; Gorret, Nathalie

    2017-01-01

    Yarrowia lipolytica, a non-conventional yeast with promising biotechnological potential, is able to undergo metabolic and morphological changes in response to environmental conditions. The effect of pH perturbations of different types (pulses, Heaviside steps) on the dynamic behavior of the Y. lipolytica W29 strain was characterized under two culture modes: batch and continuous. In batch cultures, different pH values (4.5, 5.6 (optimal condition), and 7) were investigated in order to identify the pH inducing a stress response (metabolic and/or morphological) in Y. lipolytica. The macroscopic behavior (kinetic parameters, yields, viability) of the yeast was only slightly affected by pH. However, contrary to the culture at pH 5.6, filamentous growth was induced in batch experiments at pH 4.5 and 7. The filamentous subpopulation reached 84 and 93 % (v/v) under acidic and neutral conditions, respectively. Given the significant impact of neutral pH on morphology, pH perturbations from 5.6 to 7 were subsequently assayed in batch and continuous bioreactors. For both process modes, the growth dynamics remained fundamentally unaltered during exposure to stress. Nevertheless, the morphological behavior of the yeast depended on the culture mode. Specifically, in batch bioreactors, where cells proliferated at their maximum growth rate, mycelia were mainly formed, whereas in continuous cultures at controlled growth rates (from 0.03 to 0.20 h-1), even close to the maximum growth rate of the strain (0.24 h-1), yeast-like forms predominated. This points to differences in the kinetic behavior of the filamentous and yeast subpopulations, cell age distribution, and pH adaptive mechanisms between the two culture modes.

  14. Negative reinforcement learning is affected in substance dependence.

    PubMed

    Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody

    2012-06-01

    Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  15. An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence.

    PubMed

    Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna

    2016-10-05

    Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
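
    Models of this kind typically learn stimulus values through a reward prediction error. Below is a minimal delta-rule sketch (illustrative parameters, not the paper's fitted model):

        import random

        random.seed(0)
        alpha = 0.3                      # learning rate
        values = {"A": 0.5, "B": 0.5}    # learned reward expectations
        prob = {"A": 0.8, "B": 0.2}      # true (hidden) reward probabilities

        for _ in range(2000):
            stim = random.choice(["A", "B"])
            reward = 1.0 if random.random() < prob[stim] else 0.0
            delta = reward - values[stim]       # reward prediction error
            values[stim] += alpha * delta       # incremental value update

        print({s: round(v, 2) for s, v in values.items()})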

  16. Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies

    PubMed Central

    Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

    2016-01-01

    Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401

  17. Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies.

    PubMed

    Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

    2016-07-14

    Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects.

  18. The reinforcing value and liking of resistance training and aerobic exercise as predictors of adult's physical activity.

    PubMed

    Flack, Kyle D; Johnson, LuAnn; Roemmich, James N

    2017-10-01

    Reinforcing value (motivating value) is a stronger predictor than hedonic value (liking) for engaging in drug use, gambling, and eating. The associations of reinforcing value and liking with the physical activity of adults have not yet been studied and may depend on the modes of exercise (e.g., aerobic/cardiovascular exercise, resistance training) under consideration. The purpose of this study was to test associations of the reinforcing value and liking of aerobic exercise training (AT) and resistance exercise training (RT) with usual participation in aerobic and resistance exercise in adults. Men (n=38) and women (n=50) were measured for their liking and relative reinforcing value (RRV) of AT and RT, for their usual vigorous physical activity (VPA) participation, and for usual resistance exercise behavior (Yale physical activity questionnaire). The RRV of AT and the liking of AT were correlated (r=0.22, p<0.04), as were the RRV of RT and the liking of RT (r=0.42, p<0.01). The reinforcing value of, but not the liking of, a mode of exercise predicted how much an individual engaged in that mode of exercise. The RRV of AT was positively associated with usual VPA (p<0.01), and the RRV of RT was positively associated with RT behavior (p<0.01). The hedonic value of AT and of RT was not associated (p>0.30) with VPA or RT behavior. The reinforcing value of a mode of exercise is a stronger predictor than the liking of that mode for the usual amount of participation in that exercise. Published by Elsevier Inc.

  19. Batch production of microchannel plate photo-multipliers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frisch, Henry J.; Wetstein, Matthew; Elagin, Andrey

    In-situ methods for the batch fabrication of flat-panel micro-channel plate (MCP) photomultiplier tube (PMT) detectors (MCP-PMTs), without transporting either the window or the detector assembly inside a vacuum vessel are provided. The method allows for the synthesis of a reflection-mode photocathode on the entrance to the pores of a first MCP or the synthesis of a transmission-mode photocathode on the vacuum side of a photodetector entrance window.

  20. Investigation of hydrocarbon oil transformation by gliding arc discharge: comparison of batch and recirculated configurations

    NASA Astrophysics Data System (ADS)

    Whitehead, J. Christopher; Prantsidou, Maria

    2016-04-01

    The degradation of liquid dodecane was studied in a gliding arc discharge (GAD) of humid argon or nitrogen, using either a batch or a recirculating configuration. Gaseous and liquid-phase products were analysed by infrared spectroscopy and chromatography, and optical emission spectroscopy was used to identify the excited species in the discharge. The best degradation performance comes from humid N2; a GAD of humid argon produces fewer gas-phase products but more liquid-phase end-products. A wide range of liquid-phase products is formed: heavier saturated and unsaturated hydrocarbons, both aliphatic and aromatic, and oxidation products, mainly alcohols but also aldehydes, ketones and esters. The recirculating treatment mode is more effective than the batch mode, increasing reactivity and changing the product selectivities. Overall, the study shows promising results for organic liquid waste treatment, especially in the recirculating mode.

  1. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    PubMed

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  2. A need for a standardization in anaerobic digestion experiments? Let's get some insight from meta-analysis and multivariate analysis.

    PubMed

    Lavergne, Céline; Jeison, David; Ortega, Valentina; Chamy, Rolando; Donoso-Bravo, Andrés

    2018-09-15

    Considerable variability in the experimental results of anaerobic digestion lab tests has been reported. This study presents a meta-analysis coupled with multivariate analysis to assess the impact of this experimental variability on batch and continuous mesophilic and thermophilic anaerobic digestion of waste activated sludge. An analysis of variance showed no significant difference between mesophilic and thermophilic conditions in either continuous or batch operation. Concerning the operation mode, methane yields were significantly higher in batch experiments than in continuous reactors. According to the PCA, in both cases the methane yield is positively correlated with temperature increases. Interestingly, in the batch experiments, the higher the volatile solids content of the substrate, the lower the methane production, which correlates with experimental flaws in setting up those tests. In continuous mode, unlike the batch tests, the methane yield is strongly (positively) correlated with the organic content of the substrate. Experimental standardization, above all under batch conditions, is urgently needed; alternatively, results should be reported from continuous experiments. Modeling can also be a source of disturbance in batch tests. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. Rational and mechanistic perspectives on reinforcement learning.

    PubMed

    Chater, Nick

    2009-12-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.

  4. Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses

    PubMed Central

    Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa

    2017-01-01

    The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581

  5. Racial bias shapes social reinforcement learning.

    PubMed

    Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas

    2014-03-01

    Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.

  6. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    PubMed

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
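
    The naive solutions such overviews describe usually start from linear scalarization: a preference weight vector collapses the reward vector so that ordinary single-objective methods apply. Below is a minimal sketch on a two-objective bandit; the problem and parameters are illustrative.

        import random

        random.seed(0)
        weights = (0.7, 0.3)                      # preference over two objectives
        q = {a: [0.0, 0.0] for a in ("left", "right")}
        true_mean = {"left": (1.0, 0.2), "right": (0.4, 0.9)}
        alpha, eps = 0.1, 0.1

        def scalarize(vec):
            return sum(w * v for w, v in zip(weights, vec))

        for _ in range(2000):
            if random.random() < eps:
                act = random.choice(list(q))      # exploration
            else:
                act = max(q, key=lambda a: scalarize(q[a]))
            reward = [random.gauss(m, 0.1) for m in true_mean[act]]
            for i in range(2):                    # per-objective value update
                q[act][i] += alpha * (reward[i] - q[act][i])

        print({a: [round(v, 2) for v in vec] for a, vec in q.items()})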

  7. Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective

    PubMed Central

    Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.

    2009-01-01

    Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
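
    The core idea of aggregating actions into reusable subroutines can be made concrete with the options framework and SMDP-style value updates. Below is a minimal sketch on a toy corridor task; the task, options, and parameters are illustrative, not drawn from the paper.

        import random

        random.seed(0)
        GOAL, gamma, alpha, eps = 10, 0.95, 0.1, 0.1
        options = {"step": ["R"], "dash": ["R"] * 3}   # reusable action sequences
        Q = {(s, o): 0.0 for s in range(GOAL + 1) for o in options}

        for _ in range(500):
            s = 0
            while s < GOAL:
                if random.random() < eps:
                    o = random.choice(list(options))
                else:
                    o = max(options, key=lambda name: Q[(s, name)])
                s2, ret, disc = s, 0.0, 1.0
                for _act in options[o]:                # execute the subroutine
                    s2 = min(s2 + 1, GOAL)
                    ret += disc * (1.0 if s2 == GOAL else -0.01)
                    disc *= gamma
                    if s2 == GOAL:
                        break
                best = 0.0 if s2 == GOAL else max(Q[(s2, o2)] for o2 in options)
                Q[(s, o)] += alpha * (ret + disc * best - Q[(s, o)])
                s = s2

        print("Q(0, dash) =", round(Q[(0, "dash")], 2),
              " Q(0, step) =", round(Q[(0, "step")], 2))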

  8. Batch Proving and Proof Scripting in PVS

    NASA Technical Reports Server (NTRS)

    Munoz, Cesar A.

    2007-01-01

    The batch execution modes of PVS are powerful, but highly technical, features of the system that are mostly accessible to expert users. This paper presents a PVS tool, called ProofLite, that extends the theorem prover interface with a batch proving utility and a proof scripting notation. ProofLite enables a semi-literate proving style where specification and proof scripts reside in the same file. The goal of ProofLite is to provide batch proving and proof scripting capabilities to regular, non-expert, users of PVS.

  9. A Regularizer Approach for RBF Networks Under the Concurrent Weight Failure Situation.

    PubMed

    Leung, Chi-Sing; Wan, Wai Yan; Feng, Ruibin

    2017-06-01

    Many existing results on fault-tolerant algorithms focus on the single-fault-source situation, where a trained network is affected by one kind of weight failure. In fact, a trained network may be affected by multiple kinds of weight failure. This paper first studies how open weight faults and multiplicative weight noise degrade the performance of radial basis function (RBF) networks. Afterward, we define the objective function for training fault-tolerant RBF networks. Based on the objective function, we then develop two learning algorithms, one in batch mode and one in online mode. In addition, the convergence conditions of our online algorithm are investigated. Finally, we develop a formula to estimate the test set error of faulty networks trained with our approach. This formula helps us to optimize some tuning parameters, such as the RBF width.
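
    For the batch-mode case, a ridge-like closed form is the textbook way to fold multiplicative weight noise into the training objective: zero-mean noise of variance sigma2 on each weight adds a penalty proportional to the diagonal of the Gram matrix. The sketch below illustrates that generic construction, not the paper's exact objective or algorithms; all parameter values are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        x = np.linspace(-3, 3, 60)[:, None]
        y = np.sin(x).ravel() + 0.1 * rng.standard_normal(60)

        centers = np.linspace(-3, 3, 12)[:, None]
        width = 0.8
        Phi = np.exp(-((x - centers.T) ** 2) / (2 * width ** 2))  # design matrix

        sigma2 = 0.05                     # assumed weight-noise variance
        G = Phi.T @ Phi
        # Expected error under multiplicative noise adds sigma2 * diag(G):
        w = np.linalg.solve(G + sigma2 * np.diag(np.diag(G)), Phi.T @ y)
        print("training MSE:", np.mean((Phi @ w - y) ** 2))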

  10. Navigating complex decision spaces: Problems and paradigms in sequential choice

    PubMed Central

    Walsh, Matthew M.; Anderson, John R.

    2015-01-01

    To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
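
    The model-free/model-based contrast can be stated in a few lines of code: one agent caches action values from sampled outcomes, while the other learns transition statistics and plans by lookahead. A toy sketch follows; the task and parameters are illustrative.

        import random

        random.seed(0)

        # Two-step toy task: action "a" usually reaches state 1 (reward 1),
        # action "b" usually reaches state 2 (reward 0).
        def step(action):
            common = random.random() < 0.7
            if action == "a":
                state = 1 if common else 2
            else:
                state = 2 if common else 1
            return state, (1.0 if state == 1 else 0.0)

        q = {"a": 0.0, "b": 0.0}                       # model-free cached values
        counts = {(act, s): 1 for act in "ab" for s in (1, 2)}  # smoothed model
        alpha = 0.1
        for _ in range(2000):
            act = random.choice(["a", "b"])
            state, reward = step(act)
            q[act] += alpha * (reward - q[act])        # model-free update
            counts[(act, state)] += 1                  # model learning

        def planned_value(act):                        # model-based lookahead
            total = counts[(act, 1)] + counts[(act, 2)]
            return counts[(act, 1)] / total * 1.0      # reward 1 only in state 1

        print("model-free:", {k: round(v, 2) for k, v in q.items()})
        print("model-based:", {a: round(planned_value(a), 2) for a in "ab"})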

  11. Implementation study of an analog spiking neural network for assisting cardiac delay prediction in a cardiac resynchronization therapy device.

    PubMed

    Sun, Qing; Schwartz, François; Michel, Jacques; Herve, Yannick; Dalmolin, Renzo

    2011-06-01

    In this paper, we aim at developing an analog spiking neural network (SNN) for reinforcing the performance of conventional cardiac resynchronization therapy (CRT) devices (also called biventricular pacemakers). Targeting an alternative analog solution in 0.13-μm CMOS technology, this paper proposes an approach to improve cardiac delay predictions in every cardiac period in order to assist the CRT device in providing real-time optimal heartbeats. The primary analog SNN architecture is proposed and its implementation is studied to fulfill the requirement of very low energy consumption. By using Hebbian learning and reinforcement learning algorithms, the intended adaptive CRT device works in different functional modes. Simulations of both learning algorithms were carried out and demonstrated the global functionality. To improve the realism of the system, we introduce various heart behavior models (with constant/variable heart rates) that allow pathologic simulations with/without noise on the signals of the input sensors. Simulations of the global system (pacemaker models coupled with heart models) were investigated and used to validate the analog spiking neural network implementation.

  12. Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

    ERIC Educational Resources Information Center

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-01-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…

  13. Media and Information Literacy (MIL) in journalistic learning: strategies for accurately engaging with information and reporting news

    NASA Astrophysics Data System (ADS)

    Inayatillah, F.

    2018-01-01

    In the era of digital technology, abundant information is available from various sources. This ease of access needs to be accompanied by the ability to engage with information wisely; thus, information and media literacy is required. Preliminary observations found that students of Universitas Negeri Surabaya majoring in Indonesian Literature who take the journalism course lack media and information literacy (MIL) skills, and therefore need to be equipped with MIL. The method used is descriptive qualitative, comprising data collection, data analysis, and presentation of the analysis. Observation and documentation techniques were used to obtain data on MIL's impact on journalistic learning for students. This study aims to describe the important role of MIL for journalism students and its impact on journalistic learning for students of Indonesian Literature, batch 2014. The results indicate that journalism is an essential discipline for students because it affects how a person perceives news reports. Through the reinforcement provided by the course, students can avoid hoaxes. MIL-based journalistic learning makes students more skillful at absorbing, processing, and presenting information accurately. The subject influences how students engage with information so that they can report news credibly.

  14. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    PubMed

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
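
    The key mechanism (GA fitness supplied by the critic's internal reinforcement rather than by delayed external reward) can be sketched compactly. In the sketch below, the critic stand-in, the controller parameterization, and the GA operators are drastic simplifications for illustration only.

        import random

        random.seed(0)

        def critic(params):
            # Stand-in for the TD-trained critic: it predicts future
            # reinforcement for a candidate controller. Here the prediction is
            # simply higher for parameters nearer an assumed optimum.
            target = [0.3, -0.7, 0.5]
            return -sum((p - t) ** 2 for p, t in zip(params, target))

        pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
        for gen in range(50):
            pop.sort(key=critic, reverse=True)        # fitness = internal signal
            parents = pop[:10]
            children = []
            for _ in range(10):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, 3)          # one-point crossover
                child = a[:cut] + b[cut:]
                i = random.randrange(3)               # Gaussian mutation
                child[i] += random.gauss(0, 0.1)
                children.append(child)
            pop = parents + children

        print("best controller params:", [round(p, 2) for p in pop[0]])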

  15. Optimization of high solids fed-batch saccharification of sugarcane bagasse based on system viscosity changes.

    PubMed

    Liu, Yunyun; Xu, Jingliang; Zhang, Yu; Yuan, Zhenhong; Xie, Jun

    2015-10-10

    Viscosity trends in alkali-pretreated sugarcane bagasse (SCB) slurries undergoing high-solids fed-batch enzymatic hydrolysis were measured for solids loadings ranging from 15% to 36%. Solids liquefaction times were related to system viscosity changes. The viscosity decreased quickly at low solids loadings, and increased with increasing solids content. Fed-batch hydrolysis was initiated at 15% solids loading, and an additional 8%, 7% and 6% were successively added after the system viscosity decreased to stable values, to achieve a final solids content of 36%. Two enzyme-addition modes with 8.5 FPU/g solids were investigated. The batch enzyme-addition mode, with all enzyme added at the beginning of the reaction, produced the highest yields: approximately 231.7 g/L total sugars and 134.9 g/L glucose after 96 h, with a final glucan conversion of nearly 60%. This finding indicates that under the right conditions, the fed-batch strategy may be a plausible way to produce high sugar concentrations at high solids loadings. Copyright © 2015 Elsevier B.V. All rights reserved.
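
    The feeding strategy amounts to a simple trigger rule: add the next solids increment once the viscosity trace flattens. Below is a toy sketch of that logic; only the 15% + 8% + 7% + 6% schedule comes from the abstract, while the viscosity model and thresholds are made-up stand-ins.

        increments = [0.08, 0.07, 0.06]     # solids additions from the abstract
        solids, visc = 0.15, 100.0 * 0.15   # toy: viscosity proportional to solids
        history = []
        for hour in range(96):
            visc *= 0.90                    # liquefaction thins the slurry
            history.append(visc)
            stable = len(history) >= 3 and abs(history[-1] - history[-3]) < 0.5
            if stable and increments:
                added = increments.pop(0)   # next fed-batch addition
                solids += added
                visc += 100.0 * added       # fresh solids thicken the slurry
                history.clear()
        print(f"final solids loading: {solids:.0%}")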

  16. 11.2 YIP Human In the Loop Statistical Relational Learners

    DTIC Science & Technology

    2017-10-23

    This project extends human-in-the-loop advice taking to learning formalisms including inverse reinforcement learning and statistical relational learning. Prior work on advice for inverse reinforcement learning (IRL) defined advice over actions, analogous to the formulation introduced for label preferences, with active advice seeking used to select queries in sequential decision-making and other learning tasks.

  17. Classroom-based and distance learning education and training courses in end-of-life care for health and social care staff: a systematic review.

    PubMed

    Pulsford, David; Jackson, Georgina; O'Brien, Terri; Yates, Sue; Duxbury, Joy

    2013-03-01

    Staff from a range of health and social care professions report deficits in their knowledge and skills when providing end-of-life and palliative care, and education and training have been advocated at a range of levels. This review examines the literature related to classroom-based and distance learning education and training initiatives for health and social care staff in end-of-life and palliative care, in terms of their target audience, extent, modes of delivery, content, and teaching and learning strategies, and identifies the most effective educational strategies for enhancing care. A systematic review was conducted of the literature evaluating classroom-based and distance learning education and training courses for health and social care staff in end-of-life and palliative care. The online databases CINAHL, MEDLINE, EMBASE and PSYCHINFO were searched between January 2000 and July 2010. Studies were selected that discussed specific education and training initiatives and included pre- and post-test evaluation of participants' learning. Thirty studies met the eligibility criteria. The majority reported successful outcomes, though there were some exceptions. Level of prior experience and availability of practice reinforcement influenced learning. Participative and interactive learning strategies were predominantly used, along with discussion of case scenarios. Multi-professional learning was infrequently reported, and service user and carer input to curriculum development and delivery was reported in only one study. Classroom-based education and training is useful for enhancing professionals' skills and perceived preparedness for delivering end-of-life care but should be reinforced by actual practice experience.

  18. Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.

    PubMed

    Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

    2017-07-10

    Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.

  19. Enhanced Experience Replay for Deep Reinforcement Learning

    DTIC Science & Technology

    2015-11-01

    ARL-TR-7538, US Army Research Laboratory, November 2015, by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate.
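
    For context, the standard technique the report builds on can be sketched in a few lines: transitions are stored in a bounded buffer, and minibatches are drawn uniformly at random to break temporal correlations in the training data. This illustrates generic experience replay, not the report's enhancement.

        import random
        from collections import deque

        class ReplayBuffer:
            def __init__(self, capacity=10_000):
                self.buffer = deque(maxlen=capacity)   # oldest entries drop out

            def push(self, state, action, reward, next_state, done):
                self.buffer.append((state, action, reward, next_state, done))

            def sample(self, batch_size):
                # Uniform sampling decorrelates consecutive transitions.
                return random.sample(self.buffer, batch_size)

        buf = ReplayBuffer()
        for t in range(1000):                  # stand-in interaction loop
            buf.push(t, t % 4, random.random(), t + 1, False)
        batch = buf.sample(32)                 # minibatch for one learning step
        print(len(batch), batch[0])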

  20. Prespeech motor learning in a neural network using reinforcement.

    PubMed

    Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough

    2013-02-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.
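
    The model's learning rule is essentially a reinforcement-gated self-organizing map update: weights move toward the muscle activations that produced the vocalization only when the reinforcement criterion is met. Below is a minimal sketch; the map size, the phonation-criterion stand-in, and the parameters are illustrative.

        import random

        random.seed(0)
        n_nodes, n_muscles, lr = 25, 3, 0.2
        weights = [[random.uniform(0, 1) for _ in range(n_muscles)]
                   for _ in range(n_nodes)]

        def phonates(muscles):
            # Stand-in for the vocal synthesizer: "phonation" requires
            # sufficient average muscle activation.
            return sum(muscles) / len(muscles) > 0.6

        for _ in range(2000):
            node = random.randrange(n_nodes)            # spontaneous activation
            muscles = [min(1, max(0, w + random.gauss(0, 0.1)))
                       for w in weights[node]]          # noisy motor output
            if phonates(muscles):                       # reinforcement criterion
                for j in range(n_muscles):              # gated weight update
                    weights[node][j] += lr * (muscles[j] - weights[node][j])

        frac = sum(phonates(w) for w in weights) / n_nodes
        print(f"fraction of nodes now producing phonation: {frac:.2f}")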

  1. Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

    ERIC Educational Resources Information Center

    Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

    2007-01-01

    Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…

  2. A comparison of process performance during the anaerobic mono- and co-digestion of slaughterhouse waste through different operational modes.

    PubMed

    Pagés-Díaz, Jhosané; Pereda-Reyes, Ileana; Sanz, Jose Luis; Lundin, Magnus; Taherzadeh, Mohammad J; Horváth, Ilona Sárvári

    2018-02-01

    Consecutive feeding was applied to investigate the response of the microbial biomass to a second addition of substrates in terms of biodegradation, using batch tests as a promising alternative for predicting the behavior of the process. Anaerobic digestion (AD) of slaughterhouse waste (SB) and its co-digestion with manure (M), various crops (VC), and municipal solid waste were evaluated. The results were then correlated with previous findings obtained by the authors for similar mixtures in batch and semi-continuous operation modes. AD of the SB alone failed, showing total inhibition after a second feeding. Co-digestion of SB+M showed a significant improvement in all of the response variables investigated after the second feeding, while co-digestion of SB+VC resulted in a decline in all of these response variables. Similar patterns were previously detected during both the batch and the semi-continuous modes. Copyright © 2017. Published by Elsevier B.V.

  3. Fatigue crack growth in fiber reinforced plastics

    NASA Technical Reports Server (NTRS)

    Mandell, J. F.

    1979-01-01

    Fatigue crack growth in fiber composites occurs by such complex modes as to frustrate efforts at developing comprehensive theories and models. Under certain loading conditions and with certain types of reinforcement, simpler modes of fatigue crack growth are observed. These modes are more amenable to modeling efforts, and the fatigue crack growth rate can be predicted in some cases. Thus, a formula for prediction of ligamented mode fatigue crack growth rate is available.

  4. Behavioral and neural properties of social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ

    2011-01-01

    Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed that ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of the expected value of the cues (peers). Together, the findings across three levels of analysis (social preferences, response latencies, and modeled neural responses) are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one's peers in altering subsequent behavior. PMID:21917787

  5. Annual ADP planning document

    NASA Technical Reports Server (NTRS)

    Mogilevsky, M.

    1973-01-01

    The Category A computer systems at KSC (A1 and A2) which perform scientific and business/administrative operations are described. This data division is responsible for scientific requirements supporting Saturn, Atlas/Centaur, Titan/Centaur, Titan III, and Delta vehicles, and includes realtime functions, the Apollo-Soyuz Test Project (ASTP), and the Space Shuttle. The work is performed chiefly on the GEL-635 (A1) system located in the Central Instrumentation Facility (CIF). The A1 system can perform computations and process data in three modes: (1) real-time critical mode; (2) real-time batch mode; and (3) batch mode. The Division's IBM-360/50 (A2) system, also at the CIF, performs business/administrative data processing such as personnel, procurement, reliability, financial management and payroll, real-time inventory management, GSE accounting, preventive maintenance, and integrated launch vehicle modification status.

  6. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    PubMed

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated the speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective for maintaining a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  7. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    ERIC Educational Resources Information Center

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  8. The cerebellum: a neural system for the study of reinforcement learning.

    PubMed

    Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F

    2011-01-01

    In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed), any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion focuses on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.

  9. Continuous Flow Chemistry: Reaction of Diphenyldiazomethane with p-Nitrobenzoic Acid.

    PubMed

    Aw, Alex; Fritz, Marshall; Napoline, Jonathan W; Pollet, Pamela; Liotta, Charles L

    2017-11-15

    Continuous flow technology has been identified as instrumental for its environmental and economic advantages, leveraging superior mixing, better heat transfer, and cost savings through a "scaling out" rather than the traditional "scaling up" strategy. Herein, we report the reaction of diphenyldiazomethane with p-nitrobenzoic acid in both batch and flow modes. To transfer a reaction effectively from batch to flow mode, it is essential to first characterize it in batch. Accordingly, the reaction of diphenyldiazomethane was first studied in batch as a function of temperature, reaction time, and concentration to obtain kinetic information and process parameters. The glass flow reactor set-up is described and combines two types of reaction modules with "mixing" and "linear" microstructures. Finally, the reaction of diphenyldiazomethane with p-nitrobenzoic acid was successfully conducted in the flow reactor, with up to 95% conversion of the diphenyldiazomethane in 11 min. This proof-of-concept reaction aims to give scientists insight into flow technology's competitiveness, sustainability, and versatility for their research.

  10. Lesions of dorsal striatum eliminate lose-switch responding but not mixed-response strategies in rats.

    PubMed

    Skelin, Ivan; Hakstol, Rhys; VanOyen, Jenn; Mudiayi, Dominic; Molina, Leonardo A; Holec, Victoria; Hong, Nancy S; Euston, David R; McDonald, Robert J; Gruber, Aaron J

    2014-05-01

    We used focal brain lesions in rats to examine how dorsomedial (DMS) and dorsolateral (DLS) regions of the striatum differently contribute to response adaptation driven by the delivery or omission of rewards. Rats performed a binary choice task under two modes: one in which responses were rewarded on half of the trials regardless of choice; and another 'competitive' one in which only unpredictable choices were rewarded. In both modes, control animals were more likely to use a predictable lose-switch strategy than animals with lesions of either DMS or DLS. Animals with lesions of DMS presumably relied more on DLS for behavioural control, and generated repetitive responses in the first mode. These animals then shifted to a random response strategy in the competitive mode, thereby performing better than controls or animals with DLS lesions. Analysis using computational models of reinforcement learning indicated that animals with striatal lesions, particularly of the DLS, had blunted reward sensitivity and less stochasticity in the choice mechanism. These results provide further evidence that the rodent DLS is involved in rapid response adaptation that is more sophisticated than that embodied by the classic notion of habit formation driven by gradual stimulus-response learning. © 2014 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
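
    The computational models referred to here are described only qualitatively in the record; the sketch below shows the generic form such fits usually take: a Q-learner whose reward sensitivity is a multiplicative parameter and whose choice stochasticity is governed by a softmax inverse temperature. Parameter names and values are assumptions, not the fitted model.

        # Hedged sketch of a two-parameter reinforcement learning model of the
        # kind fit to rodent choice data: rho scales subjective reward value
        # (blunted sensitivity = small rho), beta sets choice determinism
        # (less stochastic choices = large beta).
        import math, random

        def softmax_choice(q, beta=3.0):
            weights = [math.exp(beta * qi) for qi in q]
            z = sum(weights)
            r, acc = random.random(), 0.0
            for action, w in enumerate(weights):
                acc += w / z
                if r <= acc:
                    return action
            return len(q) - 1

        def q_update(q, action, reward, alpha=0.2, rho=1.0):
            # delta-rule update toward the subjectively scaled reward
            q[action] += alpha * (rho * reward - q[action])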

  11. Switching the mode of sucrose utilization by Saccharomyces cerevisiae

    PubMed Central

    Badotti, Fernanda; Dário, Marcelo G; Alves, Sergio L; Cordioli, Maria Luiza A; Miletti, Luiz C; de Araujo, Pedro S; Stambuk, Boris U

    2008-01-01

    Background Overflow metabolism is an undesirable characteristic of aerobic cultures of Saccharomyces cerevisiae during biomass-directed processes. It results from elevated sugar consumption rates that cause a high substrate conversion to ethanol and other by-products, severely affecting cell physiology, bioprocess performance, and biomass yields. Fed-batch culture, where sucrose consumption rates are controlled by the external addition of sugar aiming at its low concentrations in the fermentor, is the classical bioprocessing alternative to prevent sugar fermentation by yeasts. However, fed-batch fermentations present drawbacks that could be overcome by simpler batch cultures at relatively high (e.g. 20 g/L) initial sugar concentrations. In this study, a S. cerevisiae strain lacking invertase activity was engineered to transport sucrose into the cells through a low-affinity and low-capacity sucrose-H+ symport activity, and the growth kinetics and biomass yields on sucrose were analyzed using simple batch cultures. Results We have deleted from the genome of a S. cerevisiae strain lacking invertase the high-affinity sucrose-H+ symporter encoded by the AGT1 gene. This strain could still grow efficiently on sucrose due to a low-affinity and low-capacity sucrose-H+ symport activity mediated by the MALx1 maltose permeases, and its further intracellular hydrolysis by cytoplasmic maltases. Although sucrose consumption by this engineered yeast strain was slower than with the parental yeast strain, the cells grew efficiently on sucrose due to an increased respiration of the carbon source. Consequently, this engineered yeast strain produced less ethanol and 1.5 to 2 times more biomass when cultivated in simple batch mode using 20 g/L sucrose as the carbon source. Conclusion Higher cell densities during batch cultures on 20 g/L sucrose were achieved by using a S. cerevisiae strain engineered in the sucrose uptake system. This result was accomplished by effectively reducing sucrose uptake by the yeast cells, avoiding overflow metabolism, with the concomitant reduction in ethanol production. The use of this modified yeast strain in simpler batch culture mode can be a viable option to the more complicated traditional sucrose-limited fed-batch cultures for biomass-directed processes of S. cerevisiae. PMID:18304329

  12. Switching the mode of sucrose utilization by Saccharomyces cerevisiae.

    PubMed

    Badotti, Fernanda; Dário, Marcelo G; Alves, Sergio L; Cordioli, Maria Luiza A; Miletti, Luiz C; de Araujo, Pedro S; Stambuk, Boris U

    2008-02-27

    Overflow metabolism is an undesirable characteristic of aerobic cultures of Saccharomyces cerevisiae during biomass-directed processes. It results from elevated sugar consumption rates that cause a high substrate conversion to ethanol and other by-products, severely affecting cell physiology, bioprocess performance, and biomass yields. Fed-batch culture, where sucrose consumption rates are controlled by the external addition of sugar aiming at its low concentrations in the fermentor, is the classical bioprocessing alternative to prevent sugar fermentation by yeasts. However, fed-batch fermentations present drawbacks that could be overcome by simpler batch cultures at relatively high (e.g. 20 g/L) initial sugar concentrations. In this study, a S. cerevisiae strain lacking invertase activity was engineered to transport sucrose into the cells through a low-affinity and low-capacity sucrose-H+ symport activity, and the growth kinetics and biomass yields on sucrose were analyzed using simple batch cultures. We have deleted from the genome of a S. cerevisiae strain lacking invertase the high-affinity sucrose-H+ symporter encoded by the AGT1 gene. This strain could still grow efficiently on sucrose due to a low-affinity and low-capacity sucrose-H+ symport activity mediated by the MALx1 maltose permeases, and its further intracellular hydrolysis by cytoplasmic maltases. Although sucrose consumption by this engineered yeast strain was slower than with the parental yeast strain, the cells grew efficiently on sucrose due to an increased respiration of the carbon source. Consequently, this engineered yeast strain produced less ethanol and 1.5 to 2 times more biomass when cultivated in simple batch mode using 20 g/L sucrose as the carbon source. Higher cell densities during batch cultures on 20 g/L sucrose were achieved by using a S. cerevisiae strain engineered in the sucrose uptake system. This result was accomplished by effectively reducing sucrose uptake by the yeast cells, avoiding overflow metabolism, with the concomitant reduction in ethanol production. The use of this modified yeast strain in simpler batch culture mode can be a viable option to the more complicated traditional sucrose-limited fed-batch cultures for biomass-directed processes of S. cerevisiae.

  13. The role of GABAB receptors in human reinforcement learning.

    PubMed

    Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

    2014-10-01

    Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50 mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and, furthermore, to test this effect in patients with addiction spectrum disorders. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.

  14. Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease

    PubMed Central

    Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

    2017-01-01

    Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning showed decreased memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905

  15. 40 CFR 63.1323 - Batch process vents-methods and procedures for group determination.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... accepted chemical engineering principles, measurable process parameters, or physical or chemical laws or... recovering monomer, reaction products, by-products, or solvent from a stripper operated in batch mode, and the primary condenser recovering monomer, reaction products, by-products, or solvent from a...

  16. Strategies for improving production performance of probiotic Pediococcus acidilactici viable cell by overcoming lactic acid inhibition.

    PubMed

    Othman, Majdiah; Ariff, Arbakariya B; Wasoh, Helmi; Kapri, Mohd Rizal; Halim, Murni

    2017-11-27

    Lactic acid bacteria are industrially important microorganisms, recognized for their fermentative ability, their probiotic benefits, and lactic acid production for various applications. Fermentation conditions such as the initial glucose concentration in the culture, the concentration of lactic acid accumulated in the culture, the pH control strategy, the aeration mode, and the agitation speed influenced the cultivation performance of batch fermentation of Pediococcus acidilactici. The maximum viable cell concentration obtained in constant fed-batch fermentation at a feeding rate of 0.015 L/h was 6.1 times higher, with a 1.6-fold reduction in lactic acid accumulation, compared to batch fermentation. The anion exchange resin IRA 67 was found to have the highest selectivity towards lactic acid among the components studied. Fed-batch fermentation of P. acidilactici coupled with a lactic acid removal system using IRA 67 resin showed 55.5- and 9.1-fold improvements in maximum viable cell concentration compared to fermentation without resin for batch and fed-batch modes, respectively. The improvement of P. acidilactici growth in constant fed-batch fermentation indicated that the use of minimal and simple process control equipment is an effective approach for reducing by-product inhibition. Further improvement in the cultivation performance of P. acidilactici in fed-batch fermentation with in situ addition of anion-exchange resin significantly enhanced the growth of P. acidilactici by reducing the inhibitory effect of lactic acid, thus increasing probiotic production.

  17. Fear of losing money? Aversive conditioning with secondary reinforcers.

    PubMed

    Delgado, M R; Labouliere, C D; Phelps, E A

    2006-12-01

    Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.

  18. Reinforcement learning and Tourette syndrome.

    PubMed

    Palminteri, Stefano; Pessiglione, Mathias

    2013-01-01

    In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report is preceded by an introduction aimed at providing the reader with the state of the art of knowledge concerning the neural bases of reinforcement learning at the time of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evidently maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.

  19. On the integration of reinforcement learning and approximate reasoning for control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
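
    As a concrete anchor for the temporal difference methods cited here, a minimal tabular Q-learning backup is sketched below; this is generic textbook material, not code from the ARIC paper, and all names are illustrative.

        # Hedged, generic sketch of one Q-learning backup. The TD error
        # computed here is the same quantity a critic in an actor-critic
        # architecture would learn from.
        def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
            td_target = r + gamma * max(Q[s_next])   # bootstrapped return estimate
            td_error = td_target - Q[s][a]
            Q[s][a] += alpha * td_error
            return td_error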

  20. Intelligence moderates reinforcement learning: a mini-review of the neural evidence

    PubMed Central

    2014-01-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818

  1. Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

    PubMed

    Chen, Chong

    2015-06-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.

  2. Prespeech motor learning in a neural network using reinforcement

    PubMed Central

    Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough

    2012-01-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. PMID:23275137
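
    A heavily reduced sketch of the learning principle described here is shown below: a map of motor units drives a synthesizer, and when a babble meets the reinforcement criterion, the active unit's weights move toward the muscle pattern just produced, making similar activations more likely to recur. The synthesizer is replaced by a stub, the self-organizing map's neighborhood updates are omitted, and all names and constants are assumptions.

        # Hedged sketch: reinforcement-gated motor weight learning.
        import random

        N_UNITS, N_MUSCLES = 25, 3
        weights = [[random.random() for _ in range(N_MUSCLES)]
                   for _ in range(N_UNITS)]

        def babble_and_learn(is_reinforced, eta=0.1, noise=0.2):
            unit = random.randrange(N_UNITS)            # spontaneous activation
            muscles = [w + random.uniform(-noise, noise)
                       for w in weights[unit]]          # noisy motor output
            if is_reinforced(muscles):                  # e.g. a "phonated" criterion
                weights[unit] = [w + eta * (m - w)      # pull weights toward output
                                 for w, m in zip(weights[unit], muscles)]
            return muscles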

  3. Synthesis and devolatilization of M-97 NVB silicone gum compounded into silica reinforced silicone base

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schneider, J.W.

    1986-06-01

    Silica-reinforced silicone bases having 0.31 weight percent vinyl content were prepared using a blend of low and high vinyl content devolatilized M-97 NVB silicone gum. The M-97 NVB is a custom dimethyl-, diphenyl-, methylvinylsiloxane gum. The silicone gum was devolatilized to evaluate the anticipated improvement in handling characteristics. Previously procured batches of M-97 NVB had not been devolatilized, and difficult handling problems were encountered. The synthesis, devolatilization, and compounding processes for the M-97 NVB silicone gum are discussed.

  4. Reinforcement learning in complementarity game and population dynamics

    NASA Astrophysics Data System (ADS)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
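
    The record names the learning schemes but not their equations; below is a hedged sketch of a Roth-Erev style learner in which, as one plausible reading of the modification, choice probabilities are proportional to payoff propensities raised to a power exponent (1.5 in the best-performing variant). The forgetting rate and the exact placement of the exponent are assumptions.

        # Hedged sketch of a Roth-Erev style reinforcement learner.
        import random

        def choose(propensities, exponent=1.5):
            # exponent = 1 recovers the standard proportional rule
            weights = [max(p, 1e-9) ** exponent for p in propensities]
            z = sum(weights)
            r, acc = random.random(), 0.0
            for action, w in enumerate(weights):
                acc += w / z
                if r <= acc:
                    return action
            return len(weights) - 1

        def reinforce(propensities, action, payoff, forgetting=0.05):
            # decay all propensities, then credit the chosen action
            propensities = [(1 - forgetting) * p for p in propensities]
            propensities[action] += payoff
            return propensities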

  5. 40 CFR 63.488 - Methods and procedures for batch front-end process vent group determination.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... engineering principles, measurable process parameters, or physical or chemical laws or properties. Examples of... primary condenser recovering monomer, reaction products, by-products, or solvent from a stripper operated in batch mode, and the primary condenser recovering monomer, reaction products, by-products, or...

  6. MECHANISMS GOVERNING TRANSIENTS FROM THE BATCH INCINERATION OF LIQUID WASTES IN ROTARY KILNS

    EPA Science Inventory

    When "containerized" liquid wastes, bound on sorbents. are introduced into a rotary kiln in a batch mode, transient phenomena in-volving heat transfer into, and waste mass transfer out of, the sorbent can oromote the raoid release of waste vaoor into the kiln environment. This ra...

  7. Autonomous Performance Monitoring System: Monitoring and Self-Tuning (MAST)

    NASA Technical Reports Server (NTRS)

    Peterson, Chariya; Ziyad, Nigel A.

    2000-01-01

    Maintaining the long-term performance of software onboard a spacecraft can be a major factor in the cost of operations. In particular, the task of controlling and maintaining a future mission of distributed spacecraft will undoubtedly pose a great challenge, since the complexity of multiple spacecraft flying in formation grows rapidly as the number of spacecraft in the formation increases. Eventually, new approaches will be required to develop viable control systems that can handle the complexity of the data and that are flexible, reliable, and efficient. In this paper we propose a methodology that aims to maintain the accuracy of flight software while reducing the computational complexity of software tuning tasks. The proposed Monitoring and Self-Tuning (MAST) method consists of two parts: a flight software monitoring algorithm and a tuning algorithm. The dependency on the software being monitored is mostly contained in the monitoring process, while the tuning process is a generic algorithm independent of detailed knowledge of the software. This architecture enables MAST to be applied to different onboard software controlling various dynamics of the spacecraft, such as attitude self-calibration and formation control. An advantage of MAST over conventional techniques such as filtering or batch least squares is that the tuning algorithm uses a machine learning approach to handle uncertainty in the problem domain, reducing the overall computational complexity. The underlying concept of this technique is a reinforcement learning scheme based on cumulative probabilities generated by the historical performance of the system. The success of MAST depends heavily on the reinforcement scheme used in the tuning algorithm, which guarantees that tuning solutions exist.

  8. An integer batch scheduling model considering learning, forgetting, and deterioration effects for a single machine to minimize total inventory holding cost

    NASA Astrophysics Data System (ADS)

    Yusriski, R.; Sukoyo; Samadhi, T. M. A. A.; Halim, A. H.

    2018-03-01

    This research deals with a single machine batch scheduling model considering the influence of learning, forgetting, and machine deterioration effects. The objective of the model is to minimize total inventory holding cost, and the decision variables are the number of batches (N), the batch sizes (Q[i], i = 1, 2, ..., N), and the sequence in which the resulting batches are processed. The parts to be processed are received at the right time and in the right quantities, and all completed parts must be delivered at a common due date. We propose a heuristic procedure based on the Lagrange method to solve the problem. The effectiveness of the procedure is evaluated by comparing the resulting solution to the optimal solution obtained from an enumeration procedure using the integer composition technique; the average effectiveness is 94%.
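
    The enumeration benchmark mentioned here rests on integer compositions: every way of writing the total lot size as an ordered sum of N positive batch sizes. A minimal generator is sketched below; the cost function is a hypothetical stand-in, since the record does not give the authors' holding-cost expression.

        # Hedged sketch: enumerate integer compositions of `total` into
        # `parts` positive batch sizes, the search space of the exact
        # benchmark described in the record.
        def compositions(total, parts):
            if parts == 1:
                yield (total,)
                return
            for first in range(1, total - parts + 2):
                for rest in compositions(total - first, parts - 1):
                    yield (first,) + rest

        # Illustrative exact search over N = 3 batches of 10 parts; this
        # weighting (later batches held longer) is a placeholder cost model.
        def holding_cost(q):
            return sum((i + 1) * b for i, b in enumerate(q))

        best = min(compositions(10, 3), key=holding_cost)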

  9. SIMPLE: An Introduction.

    ERIC Educational Resources Information Center

    Endres, Frank L.

    Symbolic Interactive Matrix Processing Language (SIMPLE) is a conversational matrix-oriented source language suited to a batch or a time-sharing environment. The two modes of operation of SIMPLE are conversational mode and programming mode. This program uses a TAURUS time-sharing system and cathode ray terminals or teletypes. SIMPLE performs all…

  10. Dextran Utilization During Its Synthesis by Weissella cibaria RBA12 Can Be Overcome by Fed-Batch Fermentation in a Bioreactor.

    PubMed

    Baruah, Rwivoo; Deka, Barsha; Kashyap, Niharika; Goyal, Arun

    2018-01-01

    Weissella cibaria RBA12 produced a maximum of 9 mg/ml dextran (with 90% efficiency) using shake flask culture under the optimized concentrations of medium components, viz. 2% (w/v) each of sucrose, yeast extract, and K2HPO4, after incubation at the optimized conditions of 20 °C and 180 rpm for 24 h. The optimized medium and conditions were used for scale-up of dextran production from Weissella cibaria RBA12 in 2.5-l working volume under batch fermentation in a bioreactor, which yielded a maximum of 9.3 mg/ml dextran (with 93% efficiency) at 14 h. After 14 h, the dextran produced was utilized by the bacterium until 18 h, in its stationary phase, under sucrose-depleted conditions. Dextran utilization was further studied by fed-batch fermentation using a sucrose feed. Dextran production under fed-batch fermentation in the bioreactor gave 35.8 mg/ml after 32 h. In fed-batch mode, there was no decrease in dextran concentration as observed in the batch mode. This showed that utilization of dextran by Weissella cibaria RBA12 is initiated when there is sucrose depletion, and therefore the presence of sucrose can possibly overcome the dextran hydrolysis. This is the first report of utilization of dextran, post sucrose depletion, by a Weissella sp. studied in a bioreactor.

  11. The prefrontal cortex and hybrid learning during iterative competitive games.

    PubMed

    Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol

    2011-12-01

    Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
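
    The hybrid scheme described here lends itself to a compact sketch: the chosen action is updated from the experienced outcome (model-free), while unchosen actions are updated from the hypothetical outcomes that the game structure reveals (model-based). The separate learning rates and all names are illustrative assumptions, not the fitted model.

        # Hedged sketch of hybrid model-free / model-based value updating.
        def hybrid_update(values, chosen, actual_reward, hypothetical_rewards,
                          alpha_actual=0.3, alpha_hypo=0.15):
            values = list(values)
            values[chosen] += alpha_actual * (actual_reward - values[chosen])
            # hypothetical_rewards: outcomes that would have followed the
            # unchosen actions, inferred from the structure of the game
            for action, r in hypothetical_rewards.items():
                if action != chosen:
                    values[action] += alpha_hypo * (r - values[action])
            return values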

  12. Improving performance of MFC by design alteration and adding cathodic electrolytes.

    PubMed

    Jadhav, G S; Ghangrekar, M M

    2008-12-01

    The performance of two microbial fuel cells (MFCs) was investigated under batch and continuous modes of operation using different cathodic electrolytes. The wastewater was supplied from the bottom port of the anode chamber in both MFCs; the effluent left the anode chamber from the top port in MFC-1, whereas in MFC-2 the effluent exit was provided close to the membrane. Stainless steel (SS) mesh anodes were used in both MFCs, with surface areas of 167 and 100 cm^2 in MFC-1 and MFC-2, respectively. Under batch and continuous modes of operation, these MFCs gave chemical oxygen demand removal efficiencies of more than 85% and about 68%, respectively. Under batch mode of operation, maximum power densities of 39.95 and 56.87 mW/m^2 and maximum current densities of 180.83 and 295 mA/m^2 were obtained in MFC-1 and MFC-2, respectively. Under continuous mode of operation, a reduction in power and current density was observed. Even with less anode surface area, MFC-2 produced more current (1.77 mA) than MFC-1 (1.40 mA). The cathodic electrolytes tested can be listed in decreasing order of power density as aerated KMnO4 solution > KMnO4 solution without aeration > aerated tap water > aerated tap water with NaCl.

  13. Reinforcement Learning and Dopamine in Schizophrenia: Dimensions of Symptoms or Specific Features of a Disease Group?

    PubMed Central

    Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian

    2013-01-01

    Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603

  14. Generalization of value in reinforcement learning by humans.

    PubMed

    Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna

    2012-04-01

    Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
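
    The augmented model contrasted here with basic reinforcement learning can be sketched compactly: feedback for a chosen option also updates the value of its correlated partner. The record states that reward probabilities of paired options were correlated across trials; the weight of the generalization term below, and its sign, are illustrative assumptions rather than the fitted parameters.

        # Hedged sketch of reinforcement learning with value generalization
        # across correlated options.
        def generalizing_update(values, pairs, chosen, reward,
                                alpha=0.2, kappa=0.5):
            """pairs maps each option index to its correlated partner."""
            values = list(values)
            delta = reward - values[chosen]
            values[chosen] += alpha * delta
            partner = pairs[chosen]
            values[partner] += kappa * alpha * delta  # generalize a fraction
            return values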

  15. Instrumental learning and relearning in individuals with psychopathy and in patients with lesions involving the amygdala or orbitofrontal cortex.

    PubMed

    Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R

    2006-05-01

    Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.

  16. Model-based reinforcement learning with dimension reduction.

    PubMed

    Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi

    2016-12-01

    The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Reinforcement of Science Learning through Local Culture: A Delphi Study

    ERIC Educational Resources Information Center

    Nuangchalerm, Prasart

    2008-01-01

    This study aims to explore the ways to reinforce science learning through local culture by using Delphi technique. Twenty four participants in various fields of study were selected. The result of study provides a framework for reinforcement of science learning through local culture on the theme life and environment. (Contains 1 table.)

  18. Goal-seeking neural net for recall and recognition

    NASA Astrophysics Data System (ADS)

    Omidvar, Omid M.

    1990-07-01

    Neural networks have been used to mimic cognitive processes which take place in animal brains. The learning capability inherent in neural networks makes them suitable candidates for adaptive tasks such as recall and recognition. The synaptic reinforcements create a proper condition for adaptation, which results in memorization, formation of perception, and higher-order information processing activities. In this research a model of a goal-seeking neural network is studied and the operation of the network with regard to recall and recognition is analyzed. In these analyses, recall is defined as retrieval of stored information where little or no matching is involved. Recognition, on the other hand, is recall with matching; it therefore involves memorizing a piece of information with complete presentation. This research takes the generalized view of reinforcement, in which all signals are potential reinforcers. The neuronal response is considered to be the source of the reinforcement. This local approach to adaptation leads to the goal-seeking nature of the neurons as network components. In the proposed model all the synaptic strengths are reinforced in parallel, while the reinforcement among the layers is done in a distributed fashion and in pipeline mode from the last layer inward. A model of a complex neuron with varying threshold is developed to account for the inhibitory and excitatory behavior of real neurons. A goal-seeking model of a neural network is presented. This network is utilized to perform recall and recognition tasks. The performance of the model with regard to the assigned tasks is presented.

  19. Enhanced bioelectricity harvesting in microbial fuel cells treating food waste leachate produced from biohydrogen fermentation.

    PubMed

    Choi, Jeongdong; Ahn, Youngho

    2015-05-01

    Microbial fuel cells (MFCs) treating the food waste leachate produced from biohydrogen fermentation were examined to enhance power generation and energy recovery. In batch mode, the maximum voltage production was 0.56 V and the power density reached 1540 mW/m^2. The maximum Coulombic efficiency (CEmax) and energy efficiency (EE) in batch mode were calculated to be 88.8% and 18.8%, respectively. When the organic loading rate (OLR) in sequencing batch mode was varied from 0.75 to 6.2 g COD/L-d (under CEmax), the maximum power density reached 769.2 mW/m^2 at an OLR of 3.1 g COD/L-d, whereas higher energy recovery (CE=52.6%, 0.346 Wh/g CODrem) was achieved at 1.51 g COD/L-d. The results demonstrate that readily biodegradable substrates in biohydrogen fermentation can be effectively used for enhanced bioelectricity harvesting in MFCs, and that an MFC coupled with biohydrogen fermentation is of great benefit for higher electricity generation and energy efficiency. Copyright © 2015 Elsevier Ltd. All rights reserved.
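
    The Coulombic efficiencies quoted here are conventionally computed as the ratio of charge actually harvested to the charge available in the removed COD. A standard textbook form for a batch cycle (not taken from the record, and assuming oxygen as the reference acceptor, M = 32 g/mol and b = 4 electrons per mole of oxygen) is

        CE = \frac{M \int_0^{t_b} I \, dt}{F \, b \, V_{an} \, \Delta\mathrm{COD}}

    where I is the current, t_b the batch duration, F Faraday's constant, V_an the anolyte volume, and ΔCOD the removed COD concentration.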

  20. Batch and fed-batch production of butyric acid by Clostridium butyricum ZJUCB

    PubMed Central

    He, Guo-qing; Kong, Qing; Chen, Qi-he; Ruan, Hui

    2005-01-01

    The production of butyric acid by Clostridium butyricum ZJUCB at various pH values was investigated. In order to study the effect of pH on cell growth, butyric acid biosynthesis, and reducing sugar consumption, different cultivation pH values ranging from 6.0 to 7.5 were evaluated in a 5-L bioreactor. In controlled-pH batch fermentation, the optimum pH for cell growth and butyric acid production was 6.5, with a cell yield of 3.65 g/L and a butyric acid yield of 12.25 g/L. Based on these results, this study then compared batch and fed-batch fermentation for butyric acid production at pH 6.5. A maximum butyric acid concentration of 16.74 g/L was obtained in fed-batch fermentation, compared to 12.25 g/L in batch fermentation. It was concluded that cultivation in fed-batch mode could significantly enhance butyric acid production by C. butyricum ZJUCB (P<0.01). PMID:16252341

  1. A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.

    PubMed

    Murakoshi, Kazushi; Mizuno, Junya

    2004-11-01

    In order to rapidly follow unexpected environmental changes, we propose a parameter control method in reinforcement learning that changes each learning parameter in an appropriate direction. We determine each appropriate direction on the basis of relationships between behaviors and neuromodulators, taking emergency as the key concept. Computer experiments show that agents using the proposed method could rapidly respond to unexpected environmental changes, independently of both the reinforcement learning algorithm (Q-learning or the actor-critic (AC) architecture) and the learning problem (discontinuous or continuous state-action problems).

  2. A study on using fireclay as a biomass carrier in an activated sludge system.

    PubMed

    Tilaki, Ramazan Ali Dianati

    2011-01-01

    By adding a biomass carrier to an activated sludge system, the biomass concentration will increase and the organic removal efficiency will subsequently be enhanced. In this study, the possibility of using excess sludge from ceramic and tile manufacturing plants as a biomass carrier was investigated. The aim of this study was to determine the effect of using fireclay as a biomass carrier on biomass concentration, organic removal, and nitrification efficiency in an activated sludge system. Experiments were conducted using a bench scale activated sludge system operating in batch and continuous modes. Artificial simulated wastewater was made using recirculated water from a ceramic manufacturing plant. In the continuous mode, hydraulic detention time in the aeration reactor was 8 and 22 h. In the batch mode, aeration time was 8 and 16 h. Fireclay doses of 500, 1,400 and 2,250 mg/l were added to the reactors in each experiment separately. The reactor with added fireclay was called a Hybrid Biological Reactor (HBR); a reactor without added fireclay was used as a control. Performance parameters such as COD, MLVSS and nitrate were measured in the control and HBR reactors according to standard methods. The average concentration of biomass in the HBR reactor was greater than in the control reactor. The total biomass concentration in the HBR reactor (2.25 g/l fireclay) was 3,000 mg/l in the continuous mode and 2,400 mg/l in the batch mode. The attached biomass concentration in the HBR reactor (2.25 g/l fireclay) was 1,500 mg/l in the continuous mode and 980 mg/l in the batch mode. COD removal efficiency in the HBR and control reactors was 95 and 55%, respectively. In the HBR reactor, nitrification was enhanced, and the concentration of nitrate was increased by 80%. Increasing the fireclay dose increased both total and attached biomass. By adding fireclay as a biomass carrier, the efficiency of an activated sludge system treating wastewater from ceramic manufacturing plants was increased.

  3. Easily constructed spectroelectrochemical cell for batch and flow injection analyses.

    PubMed

    Flowers, Paul A; Maynor, Margaret A; Owens, Donald E

    2002-02-01

    The design and performance of an easily constructed spectroelectrochemical cell suitable for batch and flow injection measurements are described. The cell is fabricated from a commercially available 5-mm quartz cuvette and employs 60 ppi reticulated vitreous carbon as the working electrode, resulting in a reasonable compromise between optical sensitivity and thin-layer electrochemical behavior. The spectroelectrochemical traits of the cell in both batch and flow modes were evaluated using aqueous ferricyanide and compare favorably to those reported previously for similar cells.

  4. Partial Planning Reinforcement Learning

    DTIC Science & Technology

    2012-08-31

    Subject terms: Reinforcement Learning, Bayesian Optimization, Active Learning, Action Model Learning, Decision Theoretic Assistance. Prasad Tadepalli, Alan Fern; Oregon State University.

  5. Batch fermentation options for high titer bioethanol production from a SPORL pretreated Douglas-Fir forest residue without detoxification

    Treesearch

    Mingyan Yang; Hairui Ji; Junyong Zhu

    2016-01-01

    This study evaluated batch fermentation modes, namely, separate hydrolysis and fermentation (SHF), quasi-simultaneous saccharification and fermentation (Q-SSF), and simultaneous saccharification and fermentation (SSF), and fermentation conditions, i.e., enzyme and yeast loadings, nutrient supplementation and sterilization, on high titer bioethanol production from SPORL...

  6. Microalgae-mediated bioremediation and valorization of cattle wastewater previously digested in a hybrid anaerobic reactor using a photobioreactor: Comparison between batch and continuous operation.

    PubMed

    de Mendonça, Henrique Vieira; Ometto, Jean Pierre Henry Balbaud; Otenio, Marcelo Henrique; Marques, Isabel Paula Ramos; Dos Reis, Alberto José Delgado

    2018-08-15

    Scenedesmus obliquus (ACOI 204/07) microalgae were cultivated in cattle wastewater in vertical alveolar flat panel photobioreactors, operated in batch and continuous mode, after previous digestion in a hybrid anaerobic reactor. In batch operation, removal efficiencies of 65 to 70% of COD, 98 to 99% of NH4+ and 69 to 77.5% of PO4^3- after 12 days were recorded. The corresponding figures for continuous flow were 57 to 61% of COD, 94 to 96% of NH4+ and 65 to 70% of PO4^3-, with a mean hydraulic retention time of 12 days. Higher rates of CO2 fixation (327-547 mg L^-1 d^-1) and higher biomass volumetric productivity (213-358 mg L^-1 d^-1) were obtained in batch mode. This microalgae-mediated process can be considered promising for bioremediation and valorization of effluents produced by cattle breeding, yielding a protein-rich microalgal biomass that could eventually be used as cattle feed. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Brewery and liquid manure wastewaters as potential feedstocks for microbial fuel cells: a performance study.

    PubMed

    Angosto, J M; Fernández-López, J A; Godínez, C

    2015-01-01

    This work compares the electrical and chemical performance of microbial fuel cells (MFCs) fed with several types of brewery and manure industrial wastewaters. Experiments were conducted in a single-cell MFC with the cathode exposed to air, operated in batch and fed-batch modes. In fed-batch mode, after 4 days of operation, a standard MFC was refilled with crude wastewater to regenerate the biofilm and recreate initial feeding conditions. Brewery wastewater (CV1) mixed with pig-farm liquid manure (PU sample) gave higher voltage (199.8 mV) and power density (340 mW/m^3) outputs than non-mixed brewery wastewater. The coulombic efficiency was also much larger for the mixture (11%) than for the others (2-3%). However, in terms of chemical oxygen demand removal, performance proved poorer for the mixed sample (53%) than for the pure brewery sample (93%). Fed-batch operation proved a good alternative for quasi-continuous operation, with electrical and chemical yields equivalent to normal batchwise operation.

  8. Towards the Integration of Dark- and Photo-Fermentative Waste Treatment. 4. Repeated Batch Sequential Dark- and Photofermentation using Starch as Substrate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Laurinavichene, T. V.; Belokopytov, B. F.; Laurinavichius, K. S.

    In this study we demonstrated the technical feasibility of a prolonged, sequential two-stage integrated process under a repeated batch mode of starch fermentation. In this durable scheme, the photobioreactor with purple bacteria in the second stage was fed directly with dark culture from the first stage without centrifugation, filtration, or sterilization (not demonstrated previously). After preliminary optimization, both the dark- and the photo-stages were performed under repeated batch modes with different process parameters. Continuous H2 production in this system was observed at a H2 yield of up to 1.4 and 3.9 mol mol^-1 hexose during the dark- and photo-stage, respectively (for a total of 5.3 mol mol^-1 hexose), and at rates of 0.9 and 0.5 L L^-1 d^-1, respectively. Prolonged repeated batch H2 production was maintained for up to 90 days in each stage and was rather stable under non-aseptic conditions. Potential improvements to these results are discussed.

  9. Bioconversion of Agricultural Waste to Ethanol by SSF Using Recombinant Cellulase from Clostridium thermocellum

    PubMed Central

    Mutreja, Ruchi; Das, Debasish; Goyal, Dinesh; Goyal, Arun

    2011-01-01

    The effects of different pretreatment methods, temperature, and enzyme concentration on ethanol production from 8 lignocellulosic agrowastes by simultaneous saccharification and fermentation (SSF) using recombinant cellulase and Saccharomyces cerevisiae were studied. Recombinant cellulase was isolated from E. coli BL21 cells transformed with the CtLic26A-Cel5-CBM11 full-length gene from Clostridium thermocellum and produced in both batch and fed-batch processes. The maximum cell OD and specific activity in batch mode were 1.6 and 1.91 U/mg, respectively, whereas in fed-batch mode they were 3.8 and 3.5 U/mg, respectively, displaying a 2-fold increase. Eight substrates, Syzygium cumini (jamun), Azadirachta indica (neem), Saraca indica (asoka), Bambusa dendrocalmus (bamboo), Populus nigra (poplar), Achnatherum hymenoides (wild grass), Eucalyptus marginata (eucalyptus), and Mangifera indica (mango), were subjected to SSF. Of the three pretreatments, acid, alkali, and steam explosion, acid-pretreated Syzygium cumini (jamun) at 30°C gave the maximum ethanol yield of 1.42 g/L. PMID:21811671

  10. Bioconversion of Agricultural Waste to Ethanol by SSF Using Recombinant Cellulase from Clostridium thermocellum.

    PubMed

    Mutreja, Ruchi; Das, Debasish; Goyal, Dinesh; Goyal, Arun

    2011-01-01

    The effects of different pretreatment methods, temperature, and enzyme concentration on ethanol production from 8 lignocellulosic agrowastes by simultaneous saccharification and fermentation (SSF) using recombinant cellulase and Saccharomyces cerevisiae were studied. Recombinant cellulase was isolated from E. coli BL21 cells transformed with the CtLic26A-Cel5-CBM11 full-length gene from Clostridium thermocellum and produced in both batch and fed-batch processes. The maximum cell OD and specific activity in batch mode were 1.6 and 1.91 U/mg, respectively, whereas in fed-batch mode they were 3.8 and 3.5 U/mg, respectively, displaying a 2-fold increase. Eight substrates, Syzygium cumini (jamun), Azadirachta indica (neem), Saraca indica (asoka), Bambusa dendrocalmus (bamboo), Populus nigra (poplar), Achnatherum hymenoides (wild grass), Eucalyptus marginata (eucalyptus), and Mangifera indica (mango), were subjected to SSF. Of the three pretreatments, acid, alkali, and steam explosion, acid-pretreated Syzygium cumini (jamun) at 30°C gave the maximum ethanol yield of 1.42 g/L.

  11. Elastic properties of rigid fiber-reinforced composites

    NASA Astrophysics Data System (ADS)

    Chen, J.; Thorpe, M. F.; Davis, L. C.

    1995-05-01

    We study the elastic properties of rigid fiber-reinforced composites with perfect bonding between fibers and matrix, and also with sliding boundary conditions. In the dilute region, there exists an exact analytical solution. Around the rigidity threshold we find the elastic moduli and Poisson's ratio by decomposing the deformation into a compression mode and a rotation mode. For perfect bonding, both modes are important, whereas only the compression mode is operative for sliding boundary conditions. We employ the digital-image-based method and a finite element analysis to perform computer simulations which confirm our analytical predictions.

  12. Damage assessment in PRC and RC beams by dynamic tests

    NASA Astrophysics Data System (ADS)

    Capozucca, R.

    2011-07-01

    The present paper reports on damaged prestressed reinforced concrete (PRC) and reinforced concrete (RC) beams investigated experimentally through dynamic testing in order to assess the degree of damage due to reinforcement corrosion or load-induced cracking. In the experimental program, PRC beams were subjected to artificial reinforcement corrosion and static loading, while RC beams were damaged by increasing applied loads to produce bending cracks. Dynamic investigation was carried out on both undamaged and damaged PRC and RC beams, measuring natural frequencies and evaluating vibration mode shapes. Dynamic testing allowed the recording of frequency response variations at different vibration modes. The experimental results are compared with theoretical results and discussed.

  13. Modeling of thermal mode of drying special purposes ceramic products in batch action chamber dryers

    NASA Astrophysics Data System (ADS)

    Lukianov, E. S.; Lozovaya, S. Yu; Lozovoy, N. M.

    2018-03-01

    The article is devoted to the modeling of batch-action chamber dryers in the processing line for producing shaped ceramic products. At the drying stage, many of these products warp and crack owing to irregular shrinkage deformations caused by capillary forces. The primary cause is an improperly organized drying mode, stemming from imperfections in chamber dryer design, specifically the method of heat-transfer agent supply and the resulting inability to create a uniform temperature field throughout the chamber volume.

  14. Reinforcement learning in supply chains.

    PubMed

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.
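
    The abstract does not name the three algorithms it compares; as one classic example of the simple reinforcement rules used in such agent-based supply chain studies, a Roth-Erev propensity update is sketched below. All parameter values and the toy payoffs are assumptions for illustration.

```python
import numpy as np

def roth_erev_step(propensities, action, reward, recency=0.1):
    """One Roth-Erev reinforcement update: the chosen action's propensity is
    reinforced by its payoff, with forgetting applied to all actions.
    Choice probabilities are the propensities normalized to sum to one."""
    propensities = (1 - recency) * np.asarray(propensities, dtype=float)
    propensities[action] += reward
    return propensities

# An agent choosing among 3 order quantities, reinforced by realized profit.
rng = np.random.default_rng(0)
props = np.ones(3)
for _ in range(100):
    p = props / props.sum()
    a = rng.choice(3, p=p)
    profit = [1.0, 2.0, 0.5][a] + rng.normal(0, 0.1)   # toy payoffs
    props = roth_erev_step(props, a, max(profit, 0.0))
```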

  15. Reinforcement learning in scheduling

    NASA Technical Reports Server (NTRS)

    Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad

    1994-01-01

    The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.

  16. Neural Basis of Reinforcement Learning and Decision Making

    PubMed Central

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  17. Effect of reinforcement learning on coordination of multiagent systems

    NASA Astrophysics Data System (ADS)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on the agents' as well as the system's performance.

  18. Effect of feeding strategies on pharmaceutical removal by subsurface flow constructed wetlands.

    PubMed

    Zhang, Dong Qing; Gersberg, Richard M; Hua, Tao; Zhu, Junfei; Nguyen, Anh Tuan; Law, Wing-Keung; Ng, Wun Jern; Tan, Soon Keat

    2012-01-01

    This study presents findings on an assessment of the effect of continuous and batch feeding strategies on the removal of selected pharmaceuticals from synthetic wastewater. Six mesocosm-scale constructed wetlands, including three horizontal subsurface flow constructed wetlands and three sand filters, were set up at the campus of Nanyang Technological University, Singapore. The findings showed that ibuprofen and diclofenac removal in the wetlands was significantly (P < 0.05) enhanced in the batch versus continuous mode. In contrast, naproxen and carbamazepine showed no significant differences (P > 0.05) in elimination under either feeding strategy. Our results also clearly showed that the presence of plants exerts a stimulatory effect on the removal of ibuprofen, diclofenac, and naproxen in both batch and continuous mode. A quantitative estimate comparing the stimulatory effect of batch operation with that of the higher plant alone showed that batch operation may account for 40 to 87% of the contribution conferred by the aquatic plant. The findings of this study imply that where maximal removal of pharmaceutical compounds is desired, periodic draining and filling might be the preferred operational strategy for full-scale, subsurface flow constructed wetlands. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  19. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

    PubMed

    Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D

    2013-05-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.

  20. The Curse of Planning: Dissecting multiple reinforcement learning systems by taxing the central executive

    PubMed Central

    Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.

    2013-01-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545

  1. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    PubMed

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with deep Q-network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
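
    As a rough illustration of the selection criterion described above, the following sketch combines a self-paced (TD-error-based) priority with a coverage penalty into replay sampling weights. The bell-shaped priority, the penalty form, and all parameter names are assumptions, not the authors' formulas.

```python
import numpy as np

def dcrl_sampling_weights(td_errors, replay_counts, difficulty=1.0, coverage_coef=0.1):
    """Illustrative DCRL-style selection weights for replay transitions.

    Combines a self-paced priority (transitions whose |TD error| matches the
    current curriculum difficulty are preferred) with a coverage penalty that
    discounts transitions that have already been replayed often."""
    td = np.abs(np.asarray(td_errors, dtype=float))
    # Self-paced priority: a bell around the current difficulty level, so
    # neither trivial nor overly hard transitions dominate early training.
    self_paced = np.exp(-((td - difficulty) ** 2))
    # Coverage penalty: down-weight frequently replayed transitions for diversity.
    coverage = 1.0 / (1.0 + coverage_coef * np.asarray(replay_counts, dtype=float))
    weights = self_paced * coverage
    return weights / weights.sum()

# Example: sample a minibatch of 3 indices from a replay memory of 5 transitions.
rng = np.random.default_rng(0)
w = dcrl_sampling_weights(td_errors=[0.1, 0.9, 2.0, 1.1, 0.5],
                          replay_counts=[10, 2, 0, 5, 1])
batch = rng.choice(len(w), size=3, replace=False, p=w)
```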

  2. Sliding mode control of dissolved oxygen in an integrated nitrogen removal process in a sequencing batch reactor (SBR).

    PubMed

    Muñoz, C; Young, H; Antileo, C; Bornhardt, C

    2009-01-01

    This paper presents a sliding mode controller (SMC) for dissolved oxygen (DO) in an integrated nitrogen removal process carried out in a suspended biomass sequencing batch reactor (SBR). The SMC performance was compared against an auto-tuning PI controller with parameters adjusted at the beginning of the batch cycle. A method for cancelling the slow DO sensor dynamics was implemented by using a first-order model of the sensor. Tests in a lab-scale reactor showed that the SMC offers better disturbance rejection capability than the auto-tuning PI controller, while providing reasonable performance over a wide range of operation. Thus, SMC becomes an effective, robust nonlinear tool for DO control in this process; it is also computationally simple, allowing implementation in devices such as industrial programmable logic controllers (PLCs).
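
    For readers unfamiliar with the approach, a minimal sketch of one sliding mode control step for DO setpoint tracking follows. The sliding surface, boundary-layer saturation, gains, and first-order sensor compensation are illustrative assumptions, not the controller reported in the paper.

```python
def smc_airflow(do_meas, do_prev, do_setpoint, dt,
                lam=0.5, k=2.0, phi=0.1, tau_sensor=30.0):
    """One step of an illustrative sliding mode controller for dissolved oxygen.

    Sliding surface s = e_dot + lam * e on the tracking error e; the control
    (airflow correction) is k * sat(s / phi), where the boundary layer phi
    reduces chattering. The slow DO probe is compensated by inverting an
    assumed first-order sensor model: do_true ~ do_meas + tau_sensor * d(do_meas)/dt.
    All gains and the sensor time constant are assumed values."""
    ddo = (do_meas - do_prev) / dt          # measured DO derivative
    do_est = do_meas + tau_sensor * ddo     # cancel first-order sensor lag
    e = do_setpoint - do_est                # tracking error
    e_dot = -ddo                            # setpoint assumed constant
    s = e_dot + lam * e                     # sliding surface
    sat = max(-1.0, min(1.0, s / phi))      # saturated sign of s
    return k * sat                          # airflow correction command
```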

  3. B-tree search reinforcement learning for model based intelligent agent

    NASA Astrophysics Data System (ADS)

    Bhuvaneswari, S.; Vignashwaran, R.

    2013-03-01

    Agents trained by learning techniques provide a powerful approximation of active solutions compared with naive approaches. In this study, B-tree search combined with reinforcement learning moderates the data search for information retrieval, achieving accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques achieve a satisfactory baseline and act as finite agents, based on the predetermined model, against competitors.

  4. Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.

    1992-01-01

    Current reinforcement learning algorithms require long training periods, which generally limits their applicability to small problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: one for performance evaluation and another for action selection. This architecture is applied to the control of dynamic systems, and it is demonstrated that it is possible to start with approximate prior knowledge and learn to refine it through experiments using reinforcement learning.

  5. Reinforcement learning in multidimensional environments relies on attention mechanisms.

    PubMed

    Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C

    2015-05-27

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors.
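
    The kind of "representation learning" described above can be illustrated with a feature-based RL update in which attention weights gate both valuation and learning. The sketch below is written in that spirit; its functional form and parameters are assumed rather than taken from the fitted models in the study.

```python
def feature_rl_trial(values, chosen_features, reward, attention, alpha=0.3):
    """One trial of an illustrative feature-based RL model with attention.

    The value of a multidimensional stimulus is an attention-weighted sum of
    its feature values; the prediction error updates each chosen feature in
    proportion to the attention paid to its dimension, so irrelevant
    dimensions are effectively pruned from learning."""
    v_stim = sum(attention[d] * values[d][f] for d, f in enumerate(chosen_features))
    delta = reward - v_stim                          # prediction error
    for d, f in enumerate(chosen_features):
        values[d][f] += alpha * attention[d] * delta
    return delta

# Three dimensions (e.g., color, shape, texture), three features each; only
# dimension 0 predicts reward, and attention is concentrated on it.
values = [dict.fromkeys(range(3), 0.0) for _ in range(3)]
attention = [0.8, 0.1, 0.1]
delta = feature_rl_trial(values, chosen_features=(0, 1, 2), reward=1.0,
                         attention=attention)
```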

  6. Changes in corticostriatal connectivity during reinforcement learning in humans.

    PubMed

    Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S

    2015-02-01

    Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.

  7. A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning.

    PubMed

    Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard

    2018-04-01

    The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia are important to externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.

  8. Knockout crickets for the study of learning and memory: Dopamine receptor Dop1 mediates aversive but not appetitive reinforcement in crickets.

    PubMed

    Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto

    2015-11-02

    Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in the neurotransmitters mediating appetitive reinforcement between crickets and fruit flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.

  9. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    PubMed

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features very different parameters: people often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in the ventral striatum and ventromedial pFC, structures associated with reinforcement learning, as well as regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
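
    A minimal delta-rule sketch of the kind of trial-by-trial learning signal estimated in such studies follows; the learning rate, the 0/1 feedback coding, and the function name are assumptions, not the authors' fitted model.

```python
def update_cue_reliability(value, correct, alpha=0.2):
    """One Rescorla-Wagner-style update of a perceiver's estimate that a cue
    (visual or verbal) predicts a target's emotion.

    value:   current estimated reliability of the cue (0..1)
    correct: 1 if the guess based on this cue was confirmed by feedback, else 0
    alpha:   learning rate (assumed value)
    Returns (new_value, prediction_error); the prediction error is the
    trial-by-trial learning signal regressed against neural activity."""
    prediction_error = correct - value
    return value + alpha * prediction_error, prediction_error

# Example: a cue that reliably predicts one target's emotion converges upward.
v = 0.5
for feedback in [1, 1, 0, 1, 1, 1]:
    v, delta = update_cue_reliability(v, feedback)
```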

  10. Human-level control through deep reinforcement learning.

    PubMed

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  11. Human-level control through deep reinforcement learning

    NASA Astrophysics Data System (ADS)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
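
    Both records above describe the same deep Q-network agent. A minimal sketch of its two stabilizing ingredients, experience replay and a periodically synchronized target network, is given below; a tiny linear Q-function and a random placeholder environment stand in for the deep convolutional network and the Atari emulator, and all sizes and rates are assumed values.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS, GAMMA, LR = 4, 2, 0.99, 0.01
w_online = np.zeros((STATE_DIM, N_ACTIONS))   # linear stand-in for the deep net
w_target = w_online.copy()
replay = deque(maxlen=10_000)

def q_values(w, s):
    return s @ w

def dqn_update(batch):
    """One minibatch step of the DQN rule: regress Q(s, a) toward
    r + gamma * max_a' Q_target(s', a'), with targets computed from a frozen
    copy of the weights (the key stabilization trick in the paper)."""
    global w_online
    for s, a, r, s_next, done in batch:
        target = r if done else r + GAMMA * q_values(w_target, s_next).max()
        td_error = target - q_values(w_online, s)[a]
        w_online[:, a] += LR * td_error * s   # gradient step for a linear Q

# Training skeleton: store transitions, sample uniformly, sync target weights.
for step in range(1000):
    s = rng.normal(size=STATE_DIM)                      # placeholder environment
    a = int(rng.integers(N_ACTIONS))
    r, s_next, done = float(rng.normal()), rng.normal(size=STATE_DIM), False
    replay.append((s, a, r, s_next, done))
    if len(replay) >= 32:
        dqn_update(random.sample(list(replay), 32))
    if step % 100 == 0:
        w_target = w_online.copy()                      # periodic target sync
```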

  12. Fumaric Acid Production from Alkali-Pretreated Corncob by Fed-Batch Simultaneous Saccharification and Fermentation Combined with Separated Hydrolysis and Fermentation at High Solids Loading.

    PubMed

    Li, Xin; Zhou, Jin; Ouyang, Shuiping; Ouyang, Jia; Yong, Qiang

    2017-02-01

    Production of fumaric acid from alkali-pretreated corncob (APC) at high solids loading was investigated using a combination of separated hydrolysis and fermentation (SHF) and fed-batch simultaneous saccharification and fermentation (SSF) by Rhizopus oryzae. Four different fermentation modes were tested to maximize fumaric acid concentration at high solids loading. The highest concentration of 41.32 g/L fumaric acid was obtained from 20% (w/v) APC at 38 °C in the combined SHF and fed-batch SSF process, compared with 19.13 g/L fumaric acid in batch SSF alone. The results indicated that combining SHF with fed-batch SSF significantly improved the production of fumaric acid from lignocellulose by R. oryzae relative to batch SSF at high solids loading.

  13. Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders

    PubMed Central

    Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.

    2017-01-01

    Background: Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods: We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results: Both groups learned the task after training. However, there were group differences in early learning in the first task block, where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves; and outperformed typically developing individuals on the near-chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions: Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243

  14. Titanium dioxide nanotubes addition to self-adhesive resin cement: Effect on physical and biological properties.

    PubMed

    Ramos-Tonello, Carla M; Lisboa-Filho, Paulo N; Arruda, Larisa B; Tokuhara, Cintia K; Oliveira, Rodrigo C; Furuse, Adilson Y; Rubo, José H; Borges, Ana Flávia S

    2017-07-01

    This study investigated the influence of titanium dioxide nanotube (TiO₂-nt) addition to a self-adhesive resin cement on its degree of conversion (DC), water sorption and solubility, and mechanical and biological properties. A commercially available self-adhesive resin cement (RelyX U200™, 3M ESPE) was reinforced with varying amounts of nanotubes (0.3, 0.6, 0.9 wt%) and evaluated under different curing modes (self- and dual-cure). The DC at different times (3, 6, 9, 12, and 15 min), water sorption (Ws) and solubility (Sl), 3-point flexural strength (σf), elastic modulus (E), Knoop microhardness (H), and viability of NIH/3T3 fibroblasts were measured to characterize the resin cement. The reinforced self-adhesive resin cement, regardless of concentration, showed increased DC for both the self- and dual-curing modes at all times studied. The TiO₂-nt concentration and the curing mode did not influence Ws or Sl. Regarding σf, concentrations of both 0.3 and 0.9 wt% in self-curing mode gave results similar to those of dual-cured unreinforced cement. E increased with the addition of 0.9 wt% in self-cure mode, and H increased with 0.6 and 0.9 wt% in both curing modes. Cytotoxicity assays revealed that the reinforced cements were biocompatible. TiO₂-nt-reinforced self-adhesive resin cements are promising materials for use in indirect dental restorations. Taken together, self-adhesive resin cement reinforced with TiO₂-nt exhibited physicochemical and mechanical properties superior to those of unreinforced cements, without compromising cellular viability. Copyright © 2017 The Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  15. Fracture resistance and primary failure mode of endodontically treated teeth restored with a carbon fiber-reinforced resin post system in vitro.

    PubMed

    Raygot, C G; Chai, J; Jameson, D L

    2001-01-01

    This study was undertaken to characterize the fracture resistance and mode of fracture of endodontically treated incisors restored with cast post-and-core, prefabricated stainless steel post, or carbon fiber-reinforced composite post systems. Ten endodontically treated teeth restored with each technique were subjected to a compressive load delivered at a 130-degree angle to the long axis until the first sign of failure was noted. The fracture load and the mode of fracture were recorded. The failure loads registered in the three groups were not significantly different. Between 70% and 80% of teeth from each of the three groups displayed fractures located above the simulated bone level. The use of carbon fiber-reinforced composite posts did not change the fracture resistance or the failure mode of endodontically treated central incisors compared to the use of metallic posts.

  16. Role of dopamine D2 receptors in human reinforcement learning.

    PubMed

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-09-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.

  17. Role of Dopamine D2 Receptors in Human Reinforcement Learning

    PubMed Central

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-01-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613

  18. Reinforcement learning in computer vision

    NASA Astrophysics Data System (ADS)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition, and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that, when solving computer vision tasks, we should take into account the special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as the processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning technology and its use for solving computer vision problems.

  19. Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms

    PubMed Central

    Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.

    2015-01-01

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331

  20. Investigation of vinegar production using a novel shaken repeated batch culture system.

    PubMed

    Schlepütz, Tino; Büchs, Jochen

    2013-01-01

    Nowadays, bioprocesses are developed and optimized on a small scale. The vinegar industry, too, is motivated to reinvestigate the established repeated batch fermentation process. As yet, there has been no small-scale culture system for optimizing fermentation conditions for repeated batch bioprocesses. Thus, the aim of this study is to propose a new shaken culture system for parallel repeated batch vinegar fermentation. A new operation mode, the flushing repeated batch, was developed. Parallel repeated batch vinegar production could be established in shaken overflow vessels in a completely automated operation with only one pump per vessel. This flushing repeated batch was first investigated theoretically and then tested empirically. The ethanol concentration was monitored online during repeated batch fermentation by semiconductor gas sensors. It was shown that the switch from one ethanol substrate quality to different ethanol substrate qualities resulted in prolonged lag phases and longer durations of the first batches. In the subsequent batches, the length of the fermentations decreased considerably. This decrease in the respective lag phases indicates an adaptation of the mixed acetic acid bacteria culture to the specific ethanol substrate quality. Consequently, flushing repeated batch fermentations on a small scale are valuable for screening fermentation conditions and, thereby, improving industrial-scale bioprocesses such as vinegar production in terms of process robustness, stability, and productivity. Copyright © 2013 American Institute of Chemical Engineers.

  1. Transfer of a three step mAb chromatography process from batch to continuous: Optimizing productivity to minimize consumable requirements.

    PubMed

    Gjoka, Xhorxhi; Gantier, Rene; Schofield, Mark

    2017-01-20

    The goal of this study was to adapt a batch mAb purification chromatography platform for continuous operation. The experiments and rationale used to convert from batch to continuous operation are described. Experimental data were used to design chromatography methods for continuous operation that would exceed the threshold for critical quality attributes and minimize the consumables required compared with the batch mode of operation. Four unit operations, comprising Protein A capture, viral inactivation, flow-through anion exchange (AEX), and mixed-mode cation exchange chromatography (MMCEX), were integrated across two Cadence BioSMB PD multi-column chromatography systems in order to process a 25 L volume of harvested cell culture fluid (HCCF) in less than 12 h. Transfer from batch to continuous operation increased the productivity of the Protein A step from 13 to 50 g/L/h and of the MMCEX step from 10 to 60 g/L/h, with no impact on purification performance in terms of contaminant removal (4.5 log reduction of host cell proteins, 50% reduction in soluble product aggregates) and overall chromatography yield of recovery (75%). The increase in productivity, combined with continuous operation, reduced the resin volume required for Protein A and MMCEX chromatography by more than 95% compared to batch. The volume of AEX membrane required for flow-through operation was reduced by 74%. Moreover, the continuous process required 44% less buffer than an equivalent batch process. This significant reduction in consumables enables cost-effective, disposable, single-use manufacturing. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
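
    As a back-of-the-envelope check on how a productivity gain translates into resin volume, the sketch below converts volumetric productivity into the resin volume needed for the stated batch. The 2 g/L mAb titer is an assumed value (the record does not state one), so the absolute volumes are illustrative; the additional reduction to more than 95% in the study comes from continuous multi-column cycling, not productivity alone.

```python
def resin_volume_L(mass_g, productivity_g_per_L_h, hours):
    """Resin volume needed to process a given product mass at a given
    volumetric productivity (g of product per L of resin per hour)."""
    return mass_g / (productivity_g_per_L_h * hours)

# Numbers from the record: 25 L of harvest in under 12 h; Protein A
# productivity rises from 13 (batch) to 50 g/L/h (continuous).
# The 2 g/L titer is an assumed value for illustration only.
mass = 25 * 2.0                                   # 50 g of mAb
v_batch = resin_volume_L(mass, 13, 12)            # ~0.32 L of resin
v_cont = resin_volume_L(mass, 50, 12)             # ~0.083 L of resin
saving = 1 - v_cont / v_batch                     # ~74% from productivity alone
```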

  2. Network congestion control algorithm based on Actor-Critic reinforcement learning model

    NASA Astrophysics Data System (ADS)

    Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

    2018-04-01

    Aiming at the network congestion control problem, a congestion control algorithm based on an Actor-Critic reinforcement learning model is designed. By incorporating a genetic algorithm into the congestion control strategy, network congestion problems can be better detected and prevented. A simulation experiment of the network congestion control algorithm is designed according to Actor-Critic reinforcement learning. The simulation experiments verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize network performance, and the packet-dropping probability is adjusted adaptively so as to improve network performance and avoid congestion. Based on the above findings, it is concluded that the network congestion control algorithm based on the Actor-Critic reinforcement learning model can effectively avoid the occurrence of TCP network congestion.
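
    A minimal actor-critic loop of the kind described, in which a TD-error critic drives updates to a policy over packet-drop probabilities, might look as follows. The linear parameterization, reward shape, gains, and toy queue model are assumptions for illustration, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(1)

class ActorCriticAQM:
    """Illustrative actor-critic loop for active queue management (AQM).

    State: normalized queue occupancy. Action: packet-drop probability,
    drawn from a Gaussian policy around a sigmoid mean. Reward: penalizes
    deviation from a target queue length (congestion) and excessive dropping."""

    def __init__(self, alpha_actor=0.01, alpha_critic=0.05, gamma=0.95):
        self.theta = 0.0    # actor: mean drop prob = sigmoid(theta * state)
        self.v = 0.0        # critic: V(s) = v * state
        self.aa, self.ac, self.gamma = alpha_actor, alpha_critic, gamma

    def act(self, s, sigma=0.05):
        mean = 1.0 / (1.0 + np.exp(-self.theta * s))
        return float(np.clip(rng.normal(mean, sigma), 0.0, 1.0)), mean

    def update(self, s, action, mean, reward, s_next):
        td = reward + self.gamma * self.v * s_next - self.v * s
        self.v += self.ac * td * s                      # critic step
        # Policy-gradient step for the Gaussian actor (score-function form,
        # variance absorbed into the learning rate).
        dlog = (action - mean) * mean * (1 - mean) * s
        self.theta += self.aa * td * dlog

# One interaction step against a toy queue model (target occupancy 0.5).
agent = ActorCriticAQM()
s = 0.8                                   # congested queue
p_drop, mu = agent.act(s)
s_next = max(0.0, s - 0.6 * p_drop)       # dropping relieves the queue
reward = -abs(s_next - 0.5) - 0.1 * p_drop
agent.update(s, p_drop, mu, reward, s_next)
```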

  3. Bioelectrochemical conversion of CO2 to chemicals: CO2 as a next generation feedstock for electricity-driven bioproduction in batch and continuous modes.

    PubMed

    Bajracharya, Suman; Vanbroekhoven, Karolien; Buisman, Cees J N; Strik, David P B T B; Pant, Deepak

    2017-09-21

    The recent concept of microbial electrosynthesis (MES) has evolved as an electricity-driven production technology for chemicals from low-value carbon dioxide (CO₂) using micro-organisms as biocatalysts. MES from CO₂ comprises bioelectrochemical reduction of CO₂ to multi-carbon organic compounds using the reducing equivalents produced at the electrically-polarized cathode. The use of CO₂ as a feedstock for chemicals is gaining much attention, since CO₂ is abundantly available and its use is independent of the food supply chain. MES based on CO₂ reduction produces acetate as a primary product. In order to elucidate the performance of the bioelectrochemical CO₂ reduction process using different operation modes (batch vs. continuous), an investigation was carried out using a MES system with a flow-through biocathode supplied with 20:80 (v/v) or 80:20 (v/v) CO₂:N₂ gas. The highest acetate production rate of 149 mg L⁻¹ d⁻¹ was observed with a 3.1 V applied cell-voltage under batch mode. While running in continuous mode, high acetate production was achieved with a maximum rate of 100 mg L⁻¹ d⁻¹. In the continuous mode, the acetate production was not sustained over long-term operation, likely due to insufficient microbial biocatalyst retention within the biocathode compartment (i.e. suspended micro-organisms were washed out of the system). Restarting batch mode operations resulted in a renewed production of acetate. This showed an apparent domination of suspended biocatalysts over the attached (biofilm forming) biocatalysts. Long term CO₂ reduction at the biocathode resulted in the accumulation of acetate, and more reduced compounds like ethanol and butyrate were also formed. Improvements in the production rate and different biomass retention strategies (e.g. selecting for biofilm forming micro-organisms) should be investigated to enable continuous biochemical production from CO₂ using MES. Certainly, other process optimizations will be required to establish MES as an innovative sustainable technology for manufacturing biochemicals from CO₂ as a next generation feedstock.

  4. From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model

    ERIC Educational Resources Information Center

    Fu, Wai-Tat; Anderson, John R.

    2006-01-01

    The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…

  5. MBASIC batch processor architectural overview

    NASA Technical Reports Server (NTRS)

    Reynolds, S. M.

    1978-01-01

    The MBASIC (TM) batch processor, a language translator designed to operate in the MBASIC (TM) environment, is described. Features include: (1) a CONVERT TO BATCH command, usable from the ready mode; and (2) translation of the user's program in stages through several levels of intermediate language and optimization. The processor is to be designed and implemented in both machine-independent and machine-dependent sections. The architecture is planned so that optimization processes are transparent to the rest of the system and need not be included in the first design-implementation cycle.

  6. GO, an exec for running the programs: CELL, COLLIDER, MAGIC, PATRICIA, PETROS, TRANSPORT, and TURTLE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shoaee, H.

    1982-05-01

    An exec has been written and placed on the PEP group's public disk to facilitate the use of several PEP related computer programs available on VM. The exec's program list currently includes: CELL, COLLIDER, MAGIC, PATRICIA, PETROS, TRANSPORT, and TURTLE. In addition, provisions have been made to allow addition of new programs to this list as they become available. The GO exec is directly callable from inside the Wylbur editor (in fact, currently this is the only way to use the GO exec). It provides the option of running any of the above programs in either interactive or batch mode. In the batch mode, the GO exec sends the data in the Wylbur active file along with the information required to run the job to the batch monitor (BMON, a virtual machine that schedules and controls execution of batch jobs). This enables the user to proceed with other VM activities at his/her terminal while the job executes, thus making it of particular interest to users with jobs requiring much CPU time to execute and/or those wishing to run multiple jobs independently. In the interactive mode, useful for small jobs requiring less CPU time, the job is executed by the user's own Virtual Machine using the data in the active file as input. At the termination of an interactive job, the GO exec facilitates examination of the output by placing it in the Wylbur active file.

  7. GO, an exec for running the programs: CELL, COLLIDER, MAGIC, PATRICIA, PETROS, TRANSPORT and TURTLE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shoaee, H.

    1982-05-01

    An exec has been written and placed on the PEP group's public disk (PUBRL 192) to facilitate the use of several PEP related computer programs available on VM. The exec's program list currently includes: CELL, COLLIDER, MAGIC, PATRICIA, PETROS, TRANSPORT, and TURTLE. In addition, provisions have been made to allow addition of new programs to this list as they become available. The GO exec is directly callable from inside the Wylbur editor (in fact, currently this is the only way to use the GO exec). It provides the option of running any of the above programs in either interactive or batch mode. In the batch mode, the GO exec sends the data in the Wylbur active file along with the information required to run the job to the batch monitor (BMON, a virtual machine that schedules and controls execution of batch jobs). This enables the user to proceed with other VM activities at his/her terminal while the job executes, thus making it of particular interest to users with jobs requiring much CPU time to execute and/or those wishing to run multiple jobs independently. In the interactive mode, useful for small jobs requiring less CPU time, the job is executed by the user's own Virtual Machine using the data in the active file as input. At the termination of an interactive job, the GO exec facilitates examination of the output by placing it in the Wylbur active file.

  8. Performance Enhancement Using Selective Reinforcement for Metallic Single- and Multi-Pin Loaded Holes

    NASA Technical Reports Server (NTRS)

    Farley, Gary L.; Seshadri, Banavara R.

    2005-01-01

    An analysis-based investigation of aluminum single- and multi-hole specimens selectively reinforced with metal matrix composite was performed, and the results were compared with results from geometrically comparable non-reinforced specimens. All reinforced specimens exhibited a significant increase in performance; an increase of up to 170 percent was achieved. Specimen failure modes were consistent with results from reinforced polymeric matrix composite specimens. Localized (circular) reinforcement application proved as effective as broader-area (strip) reinforcement. Selective reinforcement is thus an excellent method of increasing the performance of multi-hole specimens.

  9. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063

  10. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    PubMed

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  11. Stochastic Reinforcement Benefits Skill Acquisition

    ERIC Educational Resources Information Center

    Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.

    2014-01-01

    Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…

  12. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    PubMed

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. Applied to online learning in the game of Tetris, the experimental results show that convergence is evidently faster with the new method, which combines the hierarchical reinforcement learning algorithm with action subrewards, and that the hierarchical approach also alleviates the "curse of dimensionality" to a certain extent. Performance under different parameter settings is compared and analyzed as well.
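
    A toy contextual-bandit rendering of the subreward idea (Python). The task, the two subreward functions and all parameters are invented for illustration, and the sketch simplifies away the temporal hierarchy; it only shows how decomposing the reward lets each sub-learner index a reduced state:

        import random
        from collections import defaultdict

        ACTIONS = ["left", "right"]

        def subreward_1(state, action):   # progress toward a subgoal
            return 1.0 if action == "right" and state[0] < 3 else 0.0

        def subreward_2(state, action):   # penalty tied to a hazard feature
            return -1.0 if action == "right" and state[1] == 1 else 0.0

        # each sub-learner is indexed by one state feature only, so the
        # table it must fill is far smaller than the full joint state space
        Q1 = defaultdict(float)
        Q2 = defaultdict(float)
        alpha, epsilon = 0.1, 0.1

        def choose(state):
            if random.random() < epsilon:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q1[(state[0], a)] + Q2[(state[1], a)])

        for _ in range(2000):
            state = (random.randint(0, 5), random.randint(0, 1))
            a = choose(state)
            Q1[(state[0], a)] += alpha * (subreward_1(state, a) - Q1[(state[0], a)])
            Q2[(state[1], a)] += alpha * (subreward_2(state, a) - Q2[(state[1], a)])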

  13. Multi-agent Reinforcement Learning Model for Effective Action Selection

    NASA Astrophysics Data System (ADS)

    Youk, Sang Jo; Lee, Bong Keun

    Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case in particular, the state and action spaces become enormous compared with the single-agent case, so the most effective available action-selection strategy must be adopted for reinforcement learning to remain effective. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning speed and select effective actions in multi-agent settings. An effective action-selection strategy is verified through evaluation tests based on RoboCup Keepaway, one of the most useful test-beds for multi-agent research. The proposed model can be applied to evaluate the efficiency of various intelligent multi-agent systems and also to the strategy and tactics of robot soccer systems.

  14. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models

    PubMed Central

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
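
    As a minimal sketch of the table-based end of the spectrum the abstract describes, the following treats word-object mapping as a one-step (bandit-style) Q-learning problem. The tiny lexicon and the reward rule are invented stand-ins for the CHILDES-based setup:

        import random
        from collections import defaultdict

        # state = heard word, action = object the agent associates with it;
        # reward 1 for the true pairing approximates the paper's constraint
        TRUE_PAIRING = {"ball": "toy", "milk": "drink", "dog": "animal"}
        OBJECTS = list(set(TRUE_PAIRING.values()))

        Q = defaultdict(float)
        alpha, epsilon = 0.2, 0.1

        for _ in range(3000):
            word = random.choice(list(TRUE_PAIRING))
            if random.random() < epsilon:
                obj = random.choice(OBJECTS)
            else:
                obj = max(OBJECTS, key=lambda o: Q[(word, o)])
            reward = 1.0 if TRUE_PAIRING[word] == obj else 0.0
            Q[(word, obj)] += alpha * (reward - Q[(word, obj)])

        for word in TRUE_PAIRING:
            print(word, "->", max(OBJECTS, key=lambda o: Q[(word, o)]))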

  15. Reinforcement learning improves behaviour from evaluative feedback

    NASA Astrophysics Data System (ADS)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  16. Reinforcement learning improves behaviour from evaluative feedback.

    PubMed

    Littman, Michael L

    2015-05-28

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  17. Quantum-Enhanced Machine Learning

    NASA Astrophysics Data System (ADS)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  18. The drift diffusion model as the choice rule in reinforcement learning.

    PubMed

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
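
    The core coupling the abstract describes can be sketched in a few lines: learned values set the drift rate of a diffusion process, which then produces both the choice and the response time. Everything below (parameter values, the two-option task) is an illustrative assumption, not the authors' hierarchical Bayesian implementation:

        import random

        def rl_ddm_trial(q_a, q_b, drift_scale=2.0, threshold=1.0, dt=0.001):
            """One trial: a diffusion process whose drift is proportional to
            the learned value difference produces both choice and RT."""
            drift = drift_scale * (q_a - q_b)
            x, t = 0.0, 0.0
            while abs(x) < threshold:
                x += drift * dt + random.gauss(0.0, dt ** 0.5)  # unit noise
                t += dt
            return ("A" if x > 0 else "B"), t

        # Q-learning supplies the values that feed the drift rate
        q = {"A": 0.0, "B": 0.0}
        p_reward = {"A": 0.8, "B": 0.2}
        alpha = 0.1
        for _ in range(200):
            choice, rt = rl_ddm_trial(q["A"], q["B"])
            reward = 1.0 if random.random() < p_reward[choice] else 0.0
            q[choice] += alpha * (reward - q[choice])  # delta-rule update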

  19. Can model-free reinforcement learning explain deontological moral judgments?

    PubMed

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. The drift diffusion model as the choice rule in reinforcement learning

    PubMed Central

    Frank, Michael J.

    2017-01-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups. PMID:27966103

  1. Numerical study of aero-excitation of steam-turbine rotor blade self-oscillations

    NASA Astrophysics Data System (ADS)

    Galaev, S. A.; Makhnov, V. Yu.; Ris, V. V.; Smirnov, E. M.

    2018-05-01

    The blade aero-excitation increment is evaluated by numerical solution of the full 3D unsteady Reynolds-averaged Navier-Stokes equations governing wet-steam flow in the last stage of a powerful steam turbine. The equilibrium wet-steam model was adopted. Blade surface oscillations are defined by the eigen-modes of a row of blades bounded by a shroud. A grid-dependency study was performed with a reduced model, a set of blades corresponding to a multiple of the eigen-mode nodal diameter; all other computations were carried out for the entire blade row. Two cases are considered: an original-blade row and a row of modified (reinforced) blades. The influence of the eigen-mode nodal diameter and of blade reinforcement on the aero-excitation increment is analyzed. It was established, in particular, that the maximum aero-excitation increment for the reinforced-blade row is half that of the original-blade row. Overall, the results point to a lower probability of blade self-oscillations in the case of the reinforced blade row.

  2. General functioning predicts reward and punishment learning in schizophrenia.

    PubMed

    Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A

    2011-04-01

    Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.

  3. Batch calculations in CalcHEP

    NASA Astrophysics Data System (ADS)

    Pukhov, A.

    2003-04-01

    CalcHEP is a clone of the CompHEP project, developed by the author outside of the CompHEP group. CompHEP/CalcHEP are packages for the automatic calculation of elementary-particle decay and collision properties in the lowest order of perturbation theory. The main idea built into the packages is to make the passage from the Lagrangian to the final distributions effective, with a high level of automation. Accordingly, the packages were created as menu-driven, user-friendly programs for calculations in interactive mode. On the other hand, long calculations should be done in a non-interactive regime, so from the beginning CompHEP has faced the problem of batch calculations. In CompHEP 33.23 the batch session was realized by means of an interactive menu that let the user formulate the task for the batch run, after which the non-interactive session was launched. This approach is too restricted and inflexible, and leads to duplication in programming. In this article I discuss another approach: how one can force an interactive program to work in non-interactive mode. This approach was realized in CalcHEP 2.1, available at http://theory.sinp.msu.ru/~pukhov/calchep.html.
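
    One generic way to drive a menu-based interactive program in batch is to pipe a scripted sequence of menu responses to its standard input. The sketch below uses Python's standard subprocess module; the executable name calc_menu and the keystrokes are invented, and this is not a description of CalcHEP's actual batch mechanism:

        import subprocess

        # scripted sequence of menu responses, one per line
        script = "\n".join([
            "2",      # e.g. pick "numerical session" from the main menu
            "e+,e-",  # process to compute
            "1000",   # number of iterations
            "q",      # quit
        ]) + "\n"

        result = subprocess.run(
            ["calc_menu"],        # hypothetical interactive executable
            input=script,
            capture_output=True,
            text=True,
        )
        print(result.stdout)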

  4. Batch and fixed-bed column studies for biosorption of Zn(II) ions onto pongamia oil cake (Pongamia pinnata) from biodiesel oil extraction.

    PubMed

    Shanmugaprakash, M; Sivakumar, V

    2015-12-01

    The present work analyzes the potential of defatted pongamia oil cake (DPOC) for the biosorption of Zn(II) ions from aqueous solutions in both batch and column modes. Batch experiments were conducted to evaluate the optimal pH and the effects of adsorbent dosage, initial Zn(II) ion concentration and contact time. The biosorption equilibrium and kinetics of Zn(II) ions onto DPOC were studied in detail using several models; among them, the Freundlich isotherm and the second-order kinetic model explained the equilibrium data best. The calculated thermodynamic parameters showed that the biosorption of Zn(II) ions was exothermic and spontaneous in nature. Batch desorption studies showed that maximum Zn(II) recovery occurred using 0.1 M EDTA. The Bed Depth Service Time (BDST) and Thomas models were successfully employed to evaluate the model parameters in column mode. The results indicate that DPOC can be applied as an effective and eco-friendly biosorbent for the removal of Zn(II) ions from polluted wastewater. Copyright © 2015 Elsevier Ltd. All rights reserved.
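
    For orientation, the two models the abstract singles out can be fitted with a standard least-squares routine (assuming scipy is available). The data arrays below are made-up placeholders, not values from the study:

        import numpy as np
        from scipy.optimize import curve_fit

        def freundlich(Ce, Kf, n):
            """Equilibrium uptake qe = Kf * Ce**(1/n)."""
            return Kf * Ce ** (1.0 / n)

        def pseudo_second_order(t, qe, k2):
            """Uptake over time qt = k2*qe^2*t / (1 + k2*qe*t)."""
            return (k2 * qe ** 2 * t) / (1.0 + k2 * qe * t)

        Ce = np.array([5.0, 10.0, 25.0, 50.0, 100.0])   # mg/L at equilibrium
        qe_obs = np.array([3.1, 4.8, 7.9, 11.2, 16.0])  # mg/g adsorbed
        (Kf, n), _ = curve_fit(freundlich, Ce, qe_obs, p0=(1.0, 2.0))
        print(f"Freundlich: Kf={Kf:.2f}, n={n:.2f}")

        t = np.array([5.0, 10.0, 20.0, 40.0, 80.0])     # contact time, min
        qt_obs = np.array([2.0, 3.2, 4.5, 5.3, 5.8])    # mg/g over time
        (qe, k2), _ = curve_fit(pseudo_second_order, t, qt_obs, p0=(6.0, 0.01))
        print(f"Pseudo-second-order: qe={qe:.2f}, k2={k2:.4f}")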

  5. Agent-based traffic management and reinforcement learning in congested intersection network.

    DOT National Transportation Integrated Search

    2012-08-01

    This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...

  6. Enhanced lipid production by Rhodosporidium toruloides using different fed-batch feeding strategies with lignocellulosic hydrolysate as the sole carbon source

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fei, Qiang; O'Brien, Marykate; Nelson, Robert

    Industrial biotechnology that is able to provide environmentally friendly bio-based products has attracted more attention in replacing petroleum-based industries. Currently, most of the carbon sources used for fermentation-based bioprocesses are obtained from agricultural commodities that are used as foodstuff for human beings. Lignocellulose-derived sugars as the non-food, green, and sustainable alternative carbon sources have great potential to avoid this dilemma for producing the renewable, bio-based hydrocarbon fuel precursors, such as microbial lipid. Efficient bioconversion of lignocellulose-based sugars into lipids is one of the critical parameters for industrial application. Therefore, the fed-batch cultivation, which is a common method used in industrial applications, was investigated to achieve a high cell density culture along with high lipid yield and productivity. In this study, several fed-batch strategies were explored to improve lipid production using lignocellulosic hydrolysates derived from corn stover. Compared to the batch culture giving a lipid yield of 0.19 g/g, the dissolved-oxygen-stat feeding mode increased the lipid yield to 0.23 g/g and the lipid productivity to 0.33 g/L/h. The pulse feeding mode further improved lipid productivity to 0.35 g/L/h and the yield to 0.24 g/g. However, the highest lipid yield (0.29 g/g) and productivity (0.4 g/L/h) were achieved using an automated online sugar control feeding mode, which gave a dry cell weight of 54 g/L and lipid content of 59 % (w/w). The major fatty acids of the lipid derived from lignocellulosic hydrolysates were predominately palmitic acid and oleic acid, which are similar to those of conventional oilseed plants. Our results suggest that the fed-batch feeding strategy can strongly influence the lipid production. Lastly, the online sugar control feeding mode was the most appealing strategy for high cell density, lipid yield, and lipid productivity using lignocellulosic hydrolysates as the sole carbon source.
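
    A quick consistency check of the reported best case; the cultivation time is inferred from the reported titer and productivity, not stated in the abstract:

        # values taken from the abstract; the duration is inferred
        dry_cell_weight = 54.0   # g/L
        lipid_content   = 0.59   # fraction of dry cell weight
        productivity    = 0.40   # g/L/h, best reported value

        lipid_titer = dry_cell_weight * lipid_content    # ~31.9 g/L
        implied_hours = lipid_titer / productivity       # ~80 h
        print(f"lipid titer ~{lipid_titer:.1f} g/L over ~{implied_hours:.0f} h")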

  7. Enhanced lipid production by Rhodosporidium toruloides using different fed-batch feeding strategies with lignocellulosic hydrolysate as the sole carbon source

    DOE PAGES

    Fei, Qiang; O'Brien, Marykate; Nelson, Robert; ...

    2016-06-23

    Industrial biotechnology that is able to provide environmentally friendly bio-based products has attracted more attention in replacing petroleum-based industries. Currently, most of the carbon sources used for fermentation-based bioprocesses are obtained from agricultural commodities that are used as foodstuff for human beings. Lignocellulose-derived sugars as the non-food, green, and sustainable alternative carbon sources have great potential to avoid this dilemma for producing the renewable, bio-based hydrocarbon fuel precursors, such as microbial lipid. Efficient bioconversion of lignocellulose-based sugars into lipids is one of the critical parameters for industrial application. Therefore, the fed-batch cultivation, which is a common method used in industrial applications, was investigated to achieve a high cell density culture along with high lipid yield and productivity. In this study, several fed-batch strategies were explored to improve lipid production using lignocellulosic hydrolysates derived from corn stover. Compared to the batch culture giving a lipid yield of 0.19 g/g, the dissolved-oxygen-stat feeding mode increased the lipid yield to 0.23 g/g and the lipid productivity to 0.33 g/L/h. The pulse feeding mode further improved lipid productivity to 0.35 g/L/h and the yield to 0.24 g/g. However, the highest lipid yield (0.29 g/g) and productivity (0.4 g/L/h) were achieved using an automated online sugar control feeding mode, which gave a dry cell weight of 54 g/L and lipid content of 59 % (w/w). The major fatty acids of the lipid derived from lignocellulosic hydrolysates were predominately palmitic acid and oleic acid, which are similar to those of conventional oilseed plants. Our results suggest that the fed-batch feeding strategy can strongly influence the lipid production. Lastly, the online sugar control feeding mode was the most appealing strategy for high cell density, lipid yield, and lipid productivity using lignocellulosic hydrolysates as the sole carbon source.

  8. Operant conditioning of enhanced pain sensitivity by heat-pain titration.

    PubMed

    Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert

    2008-11-15

    Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.

  9. Multimodal Word Meaning Induction From Minimal Exposure to Natural Text.

    PubMed

    Lazaridou, Angeliki; Marelli, Marco; Baroni, Marco

    2017-04-01

    By the time they reach early adulthood, English speakers are familiar with the meaning of thousands of words. In the last decades, computational simulations known as distributional semantic models (DSMs) have demonstrated that it is possible to induce word meaning representations solely from word co-occurrence statistics extracted from a large amount of text. However, while these models learn in batch mode from large corpora, human word learning proceeds incrementally after minimal exposure to new words. In this study, we run a set of experiments investigating whether minimal distributional evidence from very short passages suffices to trigger successful word learning in subjects, testing their linguistic and visual intuitions about the concepts associated with new words. After confirming that subjects are indeed very efficient distributional learners even from small amounts of evidence, we test a DSM on the same multimodal task, finding that it behaves in a remarkable human-like way. We conclude that DSMs provide a convincing computational account of word learning even at the early stages in which a word is first encountered, and the way they build meaning representations can offer new insights into human language acquisition. Copyright © 2017 Cognitive Science Society, Inc.
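
    A minimal batch-mode DSM of the kind the abstract contrasts with human incremental learning: each word is represented by its within-sentence co-occurrence counts, and words are compared by cosine similarity. The three-sentence corpus is a toy illustration, not the models or corpora used in the study:

        import math
        from collections import Counter

        corpus = [
            "the cat chased the mouse",
            "the dog chased the cat",
            "the mouse ate the cheese",
        ]

        # co-occurrence vectors: counts of other words in the same sentence
        vectors = {}
        for sentence in corpus:
            words = sentence.split()
            for w in words:
                ctx = Counter(x for x in words if x != w)
                vectors.setdefault(w, Counter()).update(ctx)

        def cosine(u, v):
            dot = sum(u[k] * v[k] for k in u)
            norm = lambda c: math.sqrt(sum(x * x for x in c.values()))
            return dot / (norm(u) * norm(v))

        print(cosine(vectors["cat"], vectors["mouse"]))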

  10. Seismic Behaviour of Composite Steel Fibre Reinforced Concrete Shear Walls

    NASA Astrophysics Data System (ADS)

    Boita, Ioana-Emanuela; Dan, Daniel; Stoian, Valeriu

    2017-10-01

    This paper presents an experimental study conducted at the "Politehnica" University of Timisoara, Romania. The study provides results from a comprehensive experimental investigation of the behaviour of composite steel fibre reinforced concrete shear walls (CSFRCW) with partially or totally encased profiles. Two composite steel fibre reinforced concrete walls (CSFRCW) and, as a reference specimen, a typical reinforced concrete shear wall (RCW) without structural reinforcement were fabricated and tested to failure under constant vertical load and quasi-static reversed cyclic lateral loads, in displacement control. The specimens were designed as 1:3-scale steel-concrete composite elements, representing a three-storey, one-bay element from the base of a lateral resisting system made of shear walls. The configuration and arrangement of the steel profiles in the cross section were varied between specimens. The main objective of this research was to identify innovative solutions for composite steel-concrete shear walls with enhanced performance, using steel fibre reinforced concrete in place of traditional reinforced concrete. A first conclusion was that replacing traditional reinforcement with steel fibres changes the failure mode of the elements, from a flexural mode for the RCW element to a shear failure mode for the CSFRCWs. The maximum lateral force was almost identical across specimens, but the test results indicated an improvement in cracking response and a decrease in ductility. Adding steel fibres to the concrete mixture can increase the initial cracking force and can turn the sudden opening of a crack into a more stable process.

  11. Determination of model parameters for zinc (II) ion biosorption onto powdered waste sludge (PWS) in a fed-batch system.

    PubMed

    Kargi, Fikret; Cikla, Sinem

    2007-12-01

    Biosorption of zinc (II) ions onto pre-treated powdered waste sludge (PWS) was investigated using a completely mixed tank operating in fed-batch mode instead of an adsorption column. Experiments with variable feed flow rate (0.05-0.5 L h⁻¹), feed Zn(II) ion concentrations (37.5-275 mg L⁻¹) and amount of adsorbent (1-6 g PWS) were performed using fed-batch operation at pH 5 and room temperature (20-25 °C). Breakthrough curves describing variations of aqueous (effluent) zinc ion concentrations with time were determined for different operating conditions. Percent zinc removal from the aqueous phase decreased, but the biosorbed (solid phase) zinc ion concentration increased, with increasing feed flow rate and zinc concentration. A modified Bohart-Adams equation was used to determine the biosorption capacity of PWS (q′s) and the rate constant (K) for zinc ion biosorption. The biosorption capacity of PWS in fed-batch operation (q′s = 57.7 g Zn kg⁻¹ PWS) was found to be comparable with that of powdered activated carbon (PAC) in column operations. However, the adsorption rate constant (K = 9.17 m³ kg⁻¹ h⁻¹) in fed-batch operation was an order of magnitude larger than those obtained in adsorption columns, because the mass transfer limitations encountered in column operations were eliminated. Therefore, a completely mixed tank operated in fed-batch mode proved more advantageous than adsorption columns due to better contact between the phases, yielding faster adsorption rates.
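
    For orientation, the classic Bohart-Adams relation, ln(C0/C - 1) = K·q′s·M/Q - K·C0·t, can be rearranged into a breakthrough curve. The study's modified fed-batch form differs in detail; only K and q′s below are the reported values, while the adsorbent mass, feed flow and feed concentration are illustrative:

        import math

        K  = 9.17     # m^3 kg^-1 h^-1, reported rate constant
        qs = 0.0577   # kg Zn per kg PWS (57.7 g/kg), reported capacity
        M  = 0.003    # kg adsorbent (3 g PWS), illustrative
        Q  = 2.0e-4   # m^3/h feed flow (0.2 L/h), illustrative
        C0 = 0.1      # kg/m^3 feed concentration (100 mg/L), illustrative

        def effluent_fraction(t_hours):
            """C/C0 from ln(C0/C - 1) = K*qs*M/Q - K*C0*t."""
            exponent = K * qs * M / Q - K * C0 * t_hours
            return 1.0 / (1.0 + math.exp(exponent))

        for t in (2, 4, 8, 12):
            print(t, "h:", round(effluent_fraction(t), 3))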

  12. Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals

    PubMed Central

    Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan

    2017-01-01

    Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976
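
    A toy rendering of the paper's two ideas: the nociception signal becomes part of the state rather than a trigger for a fixed reflex, and punishment enters as a negative reward. The one-dimensional reaching task and all parameters are invented for illustration:

        import random
        from collections import defaultdict

        Q = defaultdict(float)
        alpha, epsilon = 0.1, 0.1
        ACTIONS = (-1, +1)

        def step(pos, action):
            new_pos = pos + action
            nociception = new_pos >= 4          # joint near its limit
            reward = 1.0 if new_pos == 3 else 0.0
            if nociception:
                reward -= 1.0                   # punishment as negative reward
            return new_pos, nociception, reward

        for _ in range(2000):
            pos, noci = 0, False
            for _ in range(6):
                state = (pos, noci)             # nociception expands the state
                if random.random() < epsilon:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda x: Q[(state, x)])
                pos, noci, r = step(pos, a)
                Q[(state, a)] += alpha * (r - Q[(state, a)])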

  13. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    PubMed

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  14. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    ERIC Educational Resources Information Center

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  15. Machine Learning Control For Highly Reconfigurable High-Order Systems

    DTIC Science & Technology

    2015-01-02

    develop and flight test a Reinforcement Learning-based approach for autonomous tracking of ground targets using a fixed-wing Unmanned... Reinforcement Learning-based algorithms are developed for learning agents' time-dependent dynamics while also learning to control them. Three algorithms... to a wide range of engineering-based problems. Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,

  16. Pathology crossword competition: an active and easy way of learning pathology in undergraduate medical education.

    PubMed

    Htwe, T T; Sabaridah, I; Rajyaguru, K M; Mazidah, A M

    2012-02-01

    In line with the trend to engage students in active learning, it is imperative to introduce new strategies that make learning more interesting, especially in undergraduate curricula. This study aimed to determine students' performance and perception of pathology crosswords as an active way of learning and to assess their ability to memorise difficult terms in pathology. A crossword competition in pathology was conducted for two batches (years 2009 and 2010) of Phase 2 medical students in Malaysia. Crossword puzzles were prepared using an online application. Two sets of puzzles were prepared, with 20 questions for the assessment of general pathology and 20 for systemic pathology. The purpose was to compare the students' recent and remote memorising abilities, as general pathology was taught a year before proceeding to systemic pathology teaching. There were 12 groups per batch, with 8-10 students in a group. Survey questionnaires were used to assess the students' perception of the competition. Descriptive analysis was performed for comparison of performance. The mean score of correctly answered questions in general pathology was 12.75 in batch 2009 and 11.50 in batch 2010. The mean score for systemic pathology was 14.50 in 2009 and 13.83 in 2010. Students in the 2009 batch performed better, but the difference was not statistically significant (p-value > 0.05). A positive response was observed from the questionnaires. Applying crossword puzzles as a new strategy is a useful and easy way for undergraduate medical students to learn pathology.

  17. [Learning and memory in Drosophila: physiologic and genetic bases].

    PubMed

    Zhuravlev, A V; Nikitina, E A; Savvateeva-Popova, E V

    2015-01-01

    Elucidation of the molecular mechanisms of cognitive functions is one of the major achievements of neurobiology. To a large extent this is due to studies of simple nervous systems, such as the CNS of Drosophila melanogaster, many of whose functional characteristics are quite similar to those of higher vertebrates. Among these are: 1) evolutionary conservation of the genes and molecular systems involved in the regulation of learning acquisition and memory formation; 2) the presence of highly specialized and differentiated sensory, associative and motor centers; 3) the use of similar modes of information coding and analysis; 4) the availability of the major forms of learning, non-associative as well as associative; 5) a diversity of memory types, including short-term memory and protein-synthesis-dependent long-term memory; 6) the presence of aminergic reinforcement systems in the brain; 7) feedback loops by which circadian clocks, current experience and individual characteristics affect the cognitive process itself. In this review the main attention is paid to the two most-studied forms of learning in Drosophila, namely olfactory learning and conditioned courtship suppression (CCS). Separate consideration is given to the impact of kynurenines and of metabolites of the actin-remodeling signaling cascade.

  18. Reinforcement and inference in cross-situational word learning.

    PubMed

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
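
    The two knobs the abstract describes can be sketched as an associative learner in which chi scales reinforcement of co-occurring word-referent pairs and beta scales a mutual-exclusivity inference penalty. The lexicon and episode structure are toy assumptions, not the authors' model:

        import random
        from collections import defaultdict

        strength = defaultdict(float)   # (word, referent) associations
        chi, beta = 1.0, 0.5            # reinforcement vs. inference weight
        WORDS = ["gavagai", "blicket", "dax"]
        TRUE = {"gavagai": "rabbit", "blicket": "cup", "dax": "drum"}

        for _ in range(200):
            shown = random.sample(WORDS, 2)          # one learning episode
            referents = [TRUE[w] for w in shown]
            for w in shown:
                for r in referents:
                    # reinforcement for every co-occurring pair
                    strength[(w, r)] += chi
                    # inference: penalize referents better explained elsewhere
                    rival = max(strength[(v, r)] for v in WORDS if v != w)
                    if rival > strength[(w, r)]:
                        strength[(w, r)] -= beta

        for w in WORDS:
            best = max(TRUE.values(), key=lambda r: strength[(w, r)])
            print(w, "->", best)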

  19. Impact of carbon and nitrogen feeding strategy on high production of biomass and docosahexaenoic acid (DHA) by Schizochytrium sp. LU310.

    PubMed

    Ling, Xueping; Guo, Jing; Liu, Xiaoting; Zhang, Xia; Wang, Nan; Lu, Yinghua; Ng, I-Son

    2015-05-01

    A newly isolated Schizochytrium sp. LU310 from the mangrove forest of Wenzhou, China, was found to be a high producer of docosahexaenoic acid (DHA). In this study, significant improvements in DHA fermentation were achieved in batch mode in baffled flasks (i.e., under higher oxygen supply). By applying a nitrogen-feeding strategy in 1000 mL baffled flasks, the biomass, DHA concentration and DHA productivity were increased by 110.4%, 117.9% and 110.4%, respectively. Moreover, a DHA concentration of 21.06 g/L was obtained by feeding 15 g/L of glucose intermittently, an increase of 41.25% over the batch mode. Finally, an innovative strategy of intermittent carbon feeding with simultaneous nitrogen feeding was carried out. The maximum DHA concentration and DHA productivity in the fed-batch cultivation reached 24.74 g/L and 241.5 mg/L/h, respectively. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Evolution with Reinforcement Learning in Negotiation

    PubMed Central

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using a classic evolutionary algorithm. PMID:25048108
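
    The history/current-information tradeoff can be sketched as an exponential recency-weighted value update. The negotiation strategies, payoff distributions and weight w below are invented for illustration:

        import random

        w = 0.7                      # weight on accumulated history
        value = {"hardball": 0.0, "concede": 0.0}
        payoff = {"hardball": lambda: random.gauss(0.4, 0.3),
                  "concede":  lambda: random.gauss(0.6, 0.1)}

        for _ in range(300):
            if random.random() < 0.1:
                s = random.choice(list(value))
            else:
                s = max(value, key=value.get)
            # blend of historical value and the newest payoff
            value[s] = w * value[s] + (1 - w) * payoff[s]()

        print(value)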

  1. Evolution with reinforcement learning in negotiation.

    PubMed

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using a classic evolutionary algorithm.

  2. Overcoming Learned Helplessness in Community College Students.

    ERIC Educational Resources Information Center

    Roueche, John E.; Mink, Oscar G.

    1982-01-01

    Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…

  3. Dopamine D2 Receptor Signaling in the Nucleus Accumbens Comprises a Metabolic-Cognitive Brain Interface Regulating Metabolic Components of Glucose Reinforcement.

    PubMed

    Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L

    2017-11-01

    Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (ie, taste) reinforcement but does not convey metabolic (ie, nutrient-derived) reinforcement. Glucose however conveys orosensory reinforcement but unlike fructose, it is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (ie, impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.

  4. Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics

    NASA Astrophysics Data System (ADS)

    Lubashevsky, I.; Kanemoto, S.

    2010-07-01

    A continuous-time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness about the action value, the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of past rewards to the agent's current perception of action value is described by an integral operator with a power-law kernel, from which a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly, via the reward of one agent depending on the choices of the other agents; the pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of system instability, undergoing subcritical and supercritical bifurcation, with the latter exhibiting anomalous oscillations whose amplitude and period grow with time. Moreover, the instability onset via this supercritical mode may be regarded as "altruism self-organization". For the three-agent system the instability dynamics is found to be rather irregular and can be composed of alternating fragments of oscillations differing in their properties.
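
    A discrete-time sketch of the scale-free memory idea: the perceived value of an action weights past rewards by a power-law kernel, so old rewards decay slowly but never vanish. The paper works with the continuous-time, fractional-calculus version; the snippet below is only a numerical illustration:

        def perceived_value(rewards, d=0.5):
            """rewards[i] is the reward received at time step i; the weight
            of a reward a lag of k steps in the past is k ** -d."""
            t = len(rewards)
            return sum(r * (t - i) ** (-d) for i, r in enumerate(rewards))

        recent = [0.0] * 20 + [1.0] * 5   # rewards concentrated recently
        early  = [1.0] * 5 + [0.0] * 20   # same total reward, long ago

        print(perceived_value(recent))    # larger: recent rewards weigh more
        print(perceived_value(early))     # smaller, but far from zero:
                                          # power-law memory decays slowly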

  5. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    DOE PAGES

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.

  6. The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.

    ERIC Educational Resources Information Center

    Dockstader, Steven L.

    The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…

  7. Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning

    PubMed Central

    Ramayya, Ashwin G.; Misra, Amrit

    2014-01-01

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643

  8. An Investigation of Ways to Reduce the Failure Rate of Student Pilots during Flying Training in the Royal Australian Air Force.

    DTIC Science & Technology

    1987-09-01

    Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning... that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as... four alternatives (defined previously on page 24 and taken from Luthans) of positive reinforcement, negative reinforcement, extinction and punishment

  9. Mastery Learning through Individualized Instruction: A Reinforcement Strategy

    ERIC Educational Resources Information Center

    Sagy, John; Ravi, R.; Ananthasayanam, R.

    2009-01-01

    The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…

  10. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    PubMed Central

    Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia

    2018-01-01

    Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822

  11. 40 CFR 1065.546 - Validation of minimum dilution ratio for PM batch sampling.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... flows and/or tracer gas concentrations for transient and ramped modal cycles to validate the minimum... mode-average values instead of continuous measurements for discrete mode steady-state duty cycles... molar flow data. This involves determination of at least two of the following three quantities: Raw...

  12. Mechanical and Thermal Properties of Polypropylene Composites Reinforced with Lignocellulose Nanofibers Dried in Melted Ethylene-Butene Copolymer

    PubMed Central

    Iwamoto, Shinichiro; Yamamoto, Shigehiro; Lee, Seung-Hwan; Ito, Hirokazu; Endo, Takashi

    2014-01-01

    Lignocellulose nanofibers were prepared by the wet disk milling of wood flour. First, an ethylene-butene copolymer was pre-compounded with wood flour or lignocellulose nanofibers to prepare master batches. This process involved evaporating the water of the lignocellulose nanofiber suspension during compounding with ethylene-butene copolymer by heating at 105 °C. These master batches were compounded again with polypropylene to obtain the final composites. Since ethylene-butene copolymer is an elastomer, its addition increased the impact strength of polypropylene but decreased the stiffness. In contrast, the wood flour- and lignocellulose nanofiber-reinforced composites showed significantly higher flexural moduli and slightly higher flexural yield stresses than did the ethylene-butene/polypropylene blends. Further, the wood flour composites exhibited brittle fractures during tensile tests and had lower impact strengths than those of the ethylene-butene/polypropylene blends. On the other hand, the addition of the lignocellulose nanofibers did not decrease the impact strength of the ethylene-butene/polypropylene blends. Finally, the addition of wood flour and the lignocellulose nanofibers increased the crystallization temperature and crystallization rate of polypropylene. The increases were more remarkable in the case of the lignocellulose nanofibers than for wood flour. PMID:28788222

  13. Influences of operational parameters on phosphorus removal in batch and continuous electrocoagulation process performance.

    PubMed

    Nguyen, Dinh Duc; Yoon, Yong Soo; Bui, Xuan Thanh; Kim, Sung Su; Chang, Soon Woong; Guo, Wenshan; Ngo, Huu Hao

    2017-11-01

    Performance of an electrocoagulation (EC) process in batch and continuous operating modes was thoroughly investigated and evaluated for enhancing wastewater phosphorus removal under various operating conditions, individually or combined with initial phosphorus concentration, wastewater conductivity, current density, and electrolysis times. The results revealed excellent phosphorus removal (72.7-100%) for both processes within 3-6 min of electrolysis, with relatively low energy requirements, i.e., less than 0.5 kWh/m³ for treated wastewater. However, the removal efficiency of phosphorus in the continuous EC operation mode was better than that in batch mode within the scope of the study. Additionally, the rate and efficiency of phosphorus removal strongly depended on operational parameters, including wastewater conductivity, initial phosphorus concentration, current density, and electrolysis time. Based on experimental data, statistical model verification of the response surface methodology (RSM) (multiple factor optimization) was also established to provide further insights and accurately describe the interactive relationship between the process variables, thus optimizing the EC process performance. The EC process using iron electrodes is promising for improving wastewater phosphorus removal efficiency, and RSM can be a sustainable tool for predicting the performance of the EC process and explaining the influence of the process variables.
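
    RSM typically fits a second-order polynomial response surface to designed experiments. A sketch under invented data; the points below are placeholders, not the study's measurements:

        import numpy as np

        cd = np.array([5.0, 5.0, 10.0, 10.0, 7.5, 7.5])   # current density, mA/cm^2
        et = np.array([2.0, 6.0, 2.0, 6.0, 4.0, 4.0])     # electrolysis time, min
        removal = np.array([73.0, 88.0, 85.0, 97.0, 92.0, 91.0])  # percent

        # design matrix for y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1^2 + b5*x2^2
        X = np.column_stack([np.ones_like(cd), cd, et, cd * et, cd ** 2, et ** 2])
        coeffs, *_ = np.linalg.lstsq(X, removal, rcond=None)
        print(coeffs)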

  14. Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

    PubMed Central

    Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.

    2011-01-01

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
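
    The confirmation-bias mechanism favored by the model fitting can be sketched as a Q-learner whose prediction errors are amplified when they agree with the instruction and diminished when they disagree. Parameter values and the two-option task are illustrative assumptions, not the fitted model:

        import random

        alpha = 0.2
        bias_up, bias_down = 1.5, 0.5   # amplify / diminish prediction errors
        instructed = "B"                # option falsely described as best
        p_reward = {"A": 0.8, "B": 0.2} # true contingencies
        Q = {"A": 0.5, "B": 0.5}

        for _ in range(500):
            if random.random() < 0.1:
                choice = random.choice(["A", "B"])
            else:
                choice = max(Q, key=Q.get)
            reward = 1.0 if random.random() < p_reward[choice] else 0.0
            delta = reward - Q[choice]
            if choice == instructed:
                # outcomes consistent with the instruction are over-weighted,
                # inconsistent outcomes under-weighted
                delta *= bias_up if delta > 0 else bias_down
            Q[choice] += alpha * delta

        print(Q)  # Q["B"] stays inflated relative to its true rate of 0.2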

  15. How partial reinforcement of food cues affects the extinction and reacquisition of appetitive responses. A new model for dieting success?

    PubMed

    van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita

    2014-10-01

    Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Free vibration of composite re-bars in reinforced structures

    NASA Astrophysics Data System (ADS)

    Kadioglu, Fethi

    2005-11-01

    The effect of composite rebar shape on the natural frequencies and mode shapes of reinforced concrete beam-type structures is investigated through finite element analysis in this paper. Steel rebars are being replaced with composite rebars in reinforced concrete structures for many infrastructure applications because of their better resistance to corrosion. A variety of composite rebar shapes can be obtained through the pultrusion process, so the effect of rebar shape on free vibration characteristics is of practical interest. The natural frequencies and mode shapes are presented and compared for the different composite rebar shapes. The effects of various boundary conditions for the different rebar shapes are also investigated.

  17. Regulating recognition decisions through incremental reinforcement learning.

    PubMed

    Han, Sanghoon; Dobbins, Ian G

    2009-06-01

    Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.

  18. Infant Contingency Learning in Different Cultural Contexts

    ERIC Educational Resources Information Center

    Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika

    2012-01-01

    Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…

  19. Adaptive Educational Software by Applying Reinforcement Learning

    ERIC Educational Resources Information Center

    Bennane, Abdellah

    2013-01-01

    The introduction of intelligence into teaching software is the subject of this paper. In the software elaboration process, learning techniques are used to adapt the teaching software to the characteristics of the student. Generally, artificial intelligence techniques such as reinforcement learning and Bayesian networks are used in order to adapt…

  20. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    NASA Astrophysics Data System (ADS)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method combining reinforcement learning with robust control. A notable characteristic of reinforcement learning is that it requires no model formula; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness but requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning with a minimal amount of computation for controlling continuous-valued actions, and traditional robust control, namely H∞ control. The proposed method was compared with the conventional control method (the actor-critic alone) through computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.
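
    A minimal actor-critic sketch for a continuous-valued action is given below, assuming a single-state toy problem rather than the crane system of the paper: the actor is a Gaussian policy N(mu, sigma) and the critic is a scalar baseline. All parameters are illustrative.

    ```python
    # Minimal actor-critic sketch: Gaussian policy, scalar critic baseline.
    import random

    mu, sigma, v = 0.0, 1.0, 0.0       # actor mean, fixed exploration, critic value
    alpha_actor, alpha_critic = 0.05, 0.1
    target = 2.0                        # unknown optimal action

    for step in range(2000):
        a = random.gauss(mu, sigma)
        r = -(a - target) ** 2          # reward peaks at a == target
        td_error = r - v                # one-step TD error (no successor state here)
        v += alpha_critic * td_error    # critic update
        mu += alpha_actor * td_error * (a - mu) / sigma**2   # actor policy-gradient step

    print(round(mu, 2))  # approaches 2.0
    ```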

  1. Crack Identification in CFRP Laminated Beams Using Multi-Resolution Modal Teager–Kaiser Energy under Noisy Environments

    PubMed Central

    Xu, Wei; Cao, Maosen; Ding, Keqin; Radzieński, Maciej; Ostachowicz, Wiesław

    2017-01-01

    Carbon fiber reinforced polymer laminates are increasingly used in the aerospace and civil engineering fields. Identifying cracks in carbon fiber reinforced polymer laminated beam components is of considerable significance for ensuring the integrity and safety of the whole structure. With the development of high-resolution measurement technologies, mode-shape-based crack identification in such laminated beam components has become an active research focus. Despite its sensitivity to cracks, however, this method is susceptible to noise. To address this deficiency, this study proposes a new concept of multi-resolution modal Teager–Kaiser energy, which is the Teager–Kaiser energy of a mode shape represented at multiple resolutions, for identifying cracks in carbon fiber reinforced polymer laminated beams. The efficacy of this concept is analytically demonstrated by identifying cracks in Timoshenko beams with general boundary conditions, and its applicability is validated by diagnosing cracks in a carbon fiber reinforced polymer laminated beam, whose mode shapes are precisely acquired via non-contact measurement using a scanning laser vibrometer. The analytical and experimental results show that multi-resolution modal Teager–Kaiser energy is capable of designating the presence and location of cracks in these beams under noisy environments. This proposed method holds promise for developing crack identification systems for carbon fiber reinforced polymer laminates. PMID:28773016
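
    The core operator is the discrete Teager–Kaiser energy, Psi[x](i) = x(i)² − x(i−1)·x(i+1); a local stiffness discontinuity shows up as a spike in Psi along the mode shape. The sketch below applies it to an invented first bending mode; the paper's multi-resolution representation is not reproduced.

    ```python
    # Discrete Teager-Kaiser energy applied to a (synthetic) beam mode shape.
    import numpy as np

    def teager_kaiser(x):
        """Discrete Teager-Kaiser energy; the two endpoints are dropped."""
        x = np.asarray(x, dtype=float)
        return x[1:-1] ** 2 - x[:-2] * x[2:]

    # Illustrative first bending mode with a small perturbation at mid-span.
    s = np.linspace(0, np.pi, 201)
    mode = np.sin(s)
    mode[100] += 1e-3                      # crack-induced local distortion
    tke = teager_kaiser(mode)
    print(int(np.argmax(np.abs(tke - np.median(tke)))))  # ~99, i.e. at the damage site
    ```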

  2. A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity.

    PubMed

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as the partially observable reinforcement learning (PORL) problem; it generalizes reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.
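
    For reference, the quantity being approximated is the free energy of a binary restricted Boltzmann machine, F(v) = −a·v − Σⱼ log(1 + exp(bⱼ + Wⱼ·v)). The sketch below computes it directly; the weights are random placeholders, not the trained model from the paper.

    ```python
    # Free energy of a binary RBM for a visible vector v.
    import numpy as np

    def rbm_free_energy(v, W, a, b):
        """F(v) = -a.v - sum_j log(1 + exp(b_j + W_j . v))."""
        return -v @ a - np.sum(np.logaddexp(0.0, b + W.T @ v))

    rng = np.random.default_rng(0)
    n_visible, n_hidden = 16, 8
    W = rng.normal(0, 0.1, (n_visible, n_hidden))
    a = rng.normal(0, 0.1, n_visible)
    b = rng.normal(0, 0.1, n_hidden)
    v = rng.integers(0, 2, n_visible).astype(float)
    print(rbm_free_energy(v, W, a, b))
    ```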

  3. Punishment insensitivity and impaired reinforcement learning in preschoolers.

    PubMed

    Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S

    2014-01-01

    Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.

  4. A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

    PubMed Central

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as the partially observable reinforcement learning (PORL) problem; it generalizes reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662

  5. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    PubMed

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  6. Process model comparison and transferability across bioreactor scales and modes of operation for a mammalian cell bioprocess.

    PubMed

    Craven, Stephen; Shirsat, Nishikant; Whelan, Jessica; Glennon, Brian

    2013-01-01

    A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density and glucose, glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development has to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. Copyright © 2012 American Institute of Chemical Engineers (AIChE).
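
    An illustrative Monod kinetic model for batch growth on a single substrate is sketched below, integrated with scipy. The parameter values (mu_max, Ks, Yxs) and initial conditions are invented for demonstration and are not those fitted in the study.

    ```python
    # Batch Monod growth model: dX/dt = mu(S)*X, dS/dt = -mu(S)*X/Yxs.
    import numpy as np
    from scipy.integrate import solve_ivp

    mu_max, Ks, Yxs = 0.04, 0.5, 0.6    # 1/h, g/L, gX/gS (illustrative)

    def monod_batch(t, y):
        X, S = y                         # biomass and substrate concentrations
        mu = mu_max * S / (Ks + S)       # Monod specific growth rate
        return [mu * X, -mu * X / Yxs]   # dX/dt, dS/dt

    sol = solve_ivp(monod_batch, (0, 120), [0.2, 5.0],
                    t_eval=np.linspace(0, 120, 7))
    for t, X, S in zip(sol.t, *sol.y):
        print(f"t={t:5.1f} h  X={X:.2f} g/L  S={S:.2f} g/L")
    ```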

  7. Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia.

    PubMed

    Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E

    2014-03-01

    Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.

  8. Evaluation of Students' Perceptions Towards An Innovative Teaching-Learning Method During Pharmacology Revision Classes: Autobiography of Drugs.

    PubMed

    Joshi, Anuradha; Ganjiwale, Jaishree

    2015-07-01

    Various studies in medical education have shown that active learning strategies should be incorporated into the teaching-learning process to make learning more effective, efficient and meaningful. The aim of this study was to evaluate students' perceptions of an innovative revision method conducted in Pharmacology, in the form of an Autobiography of Drugs; the main objective was to help students revise the core topics in Pharmacology in an interesting way. A questionnaire-based survey on this newer method of pharmacology revision was conducted in two batches of second-year MBBS students of a tertiary care teaching medical college. Various sessions on Autobiography of Drugs were conducted amongst the two batches during their Pharmacology revision classes. Students' perceptions regarding the quality, content and usefulness of the method were documented through a questionnaire using a five-point Likert scale, and responses were summarized by descriptive analysis. Students of both batches appreciated the innovative revision method. The median scores in most domains in both batches were four out of five, indicative of a good response. Feedback from open-ended questions also revealed that the innovative module on "Autobiography of Drugs" was taken as a positive learning experience by students. Autobiography of Drugs has been used to help students recall topics that they have learnt through other teaching methods; autobiography sessions in Pharmacology during revision slots can be one of the interesting ways of helping students revise and recall topics which have already been taught in theory classes.

  9. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    PubMed

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
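
    The propagation effect the authors' model captures can be illustrated with a minimal temporal-difference sketch: with learning, the prediction error at reward delivery shrinks while the value surprise at choice presentation grows. The delta rule and parameters below are illustrative, not the authors' implementation.

    ```python
    # TD sketch: prediction error migrates from reward delivery to choice onset.
    alpha, r = 0.1, 1.0
    V = 0.0                          # learned value of the choice stimulus
    for trial in range(1, 101):
        delta_cue = V - 0.0          # error when the (initially neutral) choice appears
        delta_rew = r - V            # error when the reward is delivered
        V += alpha * delta_rew       # learning transfers the error backward in time
        if trial in (1, 10, 100):
            print(trial, round(delta_cue, 3), round(delta_rew, 3))
    ```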

  10. Microstimulation of the human substantia nigra alters reinforcement learning.

    PubMed

    Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

    2014-05-14

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors.

  11. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    PubMed Central

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565
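
    The contrast between the two model classes can be sketched as follows: a "unitary" learner keeps one value per joint bimanual action, while a "decomposed" learner keeps separate values per effector, each updated from its own feedback. Task structure and parameters below are illustrative only.

    ```python
    # Decomposed vs. unitary value learning for a two-effector choice task.
    import random

    actions = ["up", "down"]
    alpha = 0.2
    q_left = {a: 0.0 for a in actions}    # decomposed: per-effector values
    q_right = {a: 0.0 for a in actions}
    q_joint = {(l, r): 0.0 for l in actions for r in actions}  # unitary values

    for _ in range(500):
        act_l, act_r = random.choice(actions), random.choice(actions)
        rew_l = 1.0 if (act_l == "up" and random.random() < 0.8) else 0.0
        rew_r = 1.0 if (act_r == "down" and random.random() < 0.8) else 0.0
        # Decomposed update: each effector learns from its own feedback.
        q_left[act_l] += alpha * (rew_l - q_left[act_l])
        q_right[act_r] += alpha * (rew_r - q_right[act_r])
        # Unitary update: the joint action learns from the summed feedback.
        q_joint[(act_l, act_r)] += alpha * (rew_l + rew_r - q_joint[(act_l, act_r)])

    print(q_left, q_right)   # decomposed values isolate each hand's contingency
    ```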

  12. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    PubMed

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.

  13. Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.

    PubMed

    Bouton, Mark E; Woods, Amanda M; Todd, Travis P

    2014-01-01

    Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.

  14. Autonomous reinforcement learning with experience replay.

    PubMed

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and a half-cheetah demonstrates the feasibility of the proposed algorithm for solving difficult learning control problems autonomously within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
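
    The replay idea itself can be sketched generically: transitions are stored in a buffer and repeatedly re-sampled for off-line updates. The paper's actor-critic with automatic step-size estimation is not reproduced here; this is a tabular Q-learning sketch on an invented toy chain MDP.

    ```python
    # Minimal experience-replay sketch with tabular Q-learning.
    import random
    from collections import deque

    n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
    Q = [[0.0] * n_actions for _ in range(n_states)]
    buffer = deque(maxlen=1000)

    def step(s, a):
        """Toy chain MDP: action 1 moves right, action 0 resets; reward at the end."""
        s2 = min(s + 1, n_states - 1) if a == 1 else 0
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    s = 0
    for t in range(2000):
        a = random.randrange(n_actions)          # pure exploration for brevity
        r, s2 = step(s, a)
        buffer.append((s, a, r, s2))
        s = 0 if s2 == n_states - 1 else s2
        # Replay: repeatedly adjust values with previously collected samples.
        for s_, a_, r_, s2_ in random.sample(list(buffer), min(8, len(buffer))):
            Q[s_][a_] += alpha * (r_ + gamma * max(Q[s2_]) - Q[s_][a_])

    print([round(max(q), 2) for q in Q])  # values rise toward the rewarded end
    ```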

  15. Tensile and tribological properties of a short-carbon-fiber-reinforced peek composite doped with carbon nanotubes

    NASA Astrophysics Data System (ADS)

    Li, J.; Zhang, L. Q.

    2009-09-01

    The main objective of this paper is to develop a high-wear-resistant short-carbon-fiber-reinforced polyetheretherketone (PEEK) composite by introducing additional multiwall carbon nanotubes (MWCNTs) into it. The compounds were mixed in a Haake batch mixer and fabricated into sheets by compression molding. Samples with different aspect ratios and concentrations of fillers were tested for wear resistance. The worn surfaces of the samples were examined by using a scanning electron microscope (SEM), and the photomicrographs revealed a higher wear resistance of the samples containing the additional carbon nanotubes. Also, a better interfacial adhesion between the short carbon fibers and vinyl ester in the composite was observed.

  16. Electrophysiological correlates of reinforcement learning in young people with Tourette syndrome with and without co-occurring ADHD symptoms.

    PubMed

    Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J

    2016-06-01

    Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (N=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.

  17. Utilising reinforcement learning to develop strategies for driving auditory neural implants.

    PubMed

    Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G

    2016-08-01

    In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed-loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals, we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provide a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-armed bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns which mimic the cochlea's ability to convert acoustic frequencies to neural activity. Learning effective replication via neural stimulation took less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. They can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
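
    A plain epsilon-greedy n-armed bandit, of the general kind the paper modifies, is sketched below. In this setting each "arm" could stand for a candidate stimulation pattern and the reward for a similarity score between the evoked and target neural responses; the Bernoulli rewards here are simulated placeholders.

    ```python
    # Epsilon-greedy n-armed bandit with incremental mean estimates.
    import random

    n_arms, epsilon = 10, 0.1
    true_reward = [random.random() for _ in range(n_arms)]   # unknown to the learner
    estimates, counts = [0.0] * n_arms, [0] * n_arms

    for t in range(5000):
        if random.random() < epsilon:                        # explore
            arm = random.randrange(n_arms)
        else:                                                # exploit
            arm = max(range(n_arms), key=lambda i: estimates[i])
        reward = 1.0 if random.random() < true_reward[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

    best_learned = max(range(n_arms), key=lambda i: estimates[i])
    print(best_learned, true_reward.index(max(true_reward)))  # usually agree
    ```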

  18. Embedded Incremental Feature Selection for Reinforcement Learning

    DTIC Science & Technology

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al. …). In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature…

  19. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    ERIC Educational Resources Information Center

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  20. Novelty and Inductive Generalization in Human Reinforcement Learning

    PubMed Central

    Gershman, Samuel J.; Niv, Yael

    2015-01-01

    In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176

  1. Learning with incomplete information and the mathematical structure behind it.

    PubMed

    Kühn, Reimer; Stamatescu, Ion-Olimpiu

    2007-07-01

    We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase depending on a graded reinforcement signal. The reinforcement signal quantifies the success rate globally over a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to the associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario, for both single-layer networks and a committee machine. The success and speed of learning depend on the ratio λ of the learning rates used for the associative Hebbian learning phase and for the unlearning correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value λ_c, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter λ. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications are also provided: a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli.

  2. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance

    DTIC Science & Technology

    2014-09-29

    Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance. W. Bradley Knox… [The record characterizes] reward positivity, how positive a trainer's reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent's performance on the task the trainer…

  3. Enhanced removal of sulfonamide antibiotics by KOH-activated anthracite coal: Batch and fixed-bed studies.

    PubMed

    Zuo, Linzi; Ai, Jing; Fu, Heyun; Chen, Wei; Zheng, Shourong; Xu, Zhaoyi; Zhu, Dongqiang

    2016-04-01

    The presence of sulfonamide antibiotics in aquatic environments poses potential risks to human health and ecosystems. In the present study, a highly porous activated carbon was prepared by KOH activation of an anthracite coal (Anth-KOH), and its adsorption properties toward two sulfonamides (sulfamethoxazole and sulfapyridine) and three smaller-sized monoaromatics (phenol, 4-nitrophenol and 1,3-dinitrobenzene) were examined in both batch and fixed-bed adsorption experiments to probe the interplay between adsorbate molecular size and adsorbent pore structure. A commercial powder microporous activated carbon (PAC) and a commercial mesoporous carbon (CMK-3) possessing distinct pore properties were included as comparative adsorbents. Among the three adsorbents Anth-KOH exhibited the largest adsorption capacities for all test adsorbates (especially the two sulfonamides) in both batch mode and fixed-bed mode. After being normalized by the adsorbent surface area, the batch adsorption isotherms of sulfonamides on PAC and Anth-KOH were displaced upward relative to the isotherms on CMK-3, likely due to the micropore-filling effect facilitated by the microporosity of the adsorbents. In the fixed-bed mode, the surface area-normalized adsorption capacities of Anth-KOH for sulfonamides were close to that of CMK-3, and higher than that of PAC. The irregular, closed micropores of PAC might impede the diffusion of the relatively large-sized sulfonamide molecules and in turn lead to lowered fixed-bed adsorption capacities. The overall superior adsorption of sulfonamides on Anth-KOH can be attributed to its large specific surface area (2514 m²/g), high pore volume (1.23 cm³/g) and large micropore sizes (centered at 2.0 nm). These findings imply that KOH-activated anthracite coal is a promising adsorbent for the removal of sulfonamide antibiotics from aqueous solution. Copyright © 2016 Elsevier Ltd. All rights reserved.
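
    Batch adsorption isotherms of this kind are commonly summarized with a Langmuir fit, q = qmax·KL·Ce/(1 + KL·Ce). The sketch below fits that form with scipy; the data points are invented placeholders, not measurements from the study, and the study's own isotherm model may differ.

    ```python
    # Illustrative Langmuir isotherm fit to (hypothetical) batch adsorption data.
    import numpy as np
    from scipy.optimize import curve_fit

    def langmuir(Ce, qmax, KL):
        return qmax * KL * Ce / (1.0 + KL * Ce)

    Ce = np.array([1, 5, 10, 20, 50, 100.0])      # equilibrium concentration, mg/L
    qe = np.array([95, 310, 430, 520, 590, 615])  # adsorbed amount, mg/g (invented)

    (qmax, KL), _ = curve_fit(langmuir, Ce, qe, p0=[600, 0.1])
    print(f"qmax = {qmax:.0f} mg/g, KL = {KL:.3f} L/mg")
    ```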

  4. Comparison of micro push-out bond strengths of two fiber posts luted using simplified adhesive approaches.

    PubMed

    Mumcu, Emre; Erdemir, Ugur; Topcu, Fulya Toksoy

    2010-05-01

    By means of a micro push-out test, this study compared the bond strengths of two types of fiber-reinforced posts cemented with luting cements based on two currently available adhesive approaches as well as evaluated their failure modes. Sixty extracted single-rooted human maxillary central incisor and canine teeth were sectioned below the cementoenamel junction, and the roots were endodontically treated. Following standardized post space preparation, the roots were divided into two fiber post groups and then further into three subgroups of 10 specimens each according to the luting cements. A push-out test was performed to measure regional bond strengths, and the fracture modes were evaluated using a stereomicroscope. At the root section, there were no statistically significant differences (p>0.05) in push-out bond strength among the tested luting cements. Nevertheless, the push-out bond strength values of glass fiber-reinforced posts were higher than those of carbon fiber-reinforced posts, irrespective of the adhesive approach used. On failure mode, the predominant failure mode was adhesive failure between dentin and the luting cement.

  5. The reinforcing value and liking of resistance training and aerobic exercise as predictors of adults' physical activity behavior

    USDA-ARS?s Scientific Manuscript database

    Background: Reinforcing value is a stronger predictor than hedonic value (liking) for engaging in drug use, gambling, and eating. The associations of reinforcing value and liking with physical activity of adults have not yet been studied and may depend on the mode of exercise available during exerc...

  6. Fuzzy Q-Learning for Generalization of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1996-01-01

    Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents which are controlled at the top level by Fuzzy Q-Learning and at the local level, each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of input space by using fuzzy rules and bridges the gap between Q-Learning and rule based intelligent systems.
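
    A compact sketch of the fuzzy generalization idea follows: the continuous state is covered by triangular fuzzy sets, Q(s, a) is the firing-strength-weighted sum of per-rule action values, and the TD error credits each rule by its firing strength. This is a generic fuzzy Q-learning sketch under invented parameters, not GARIC-Q itself.

    ```python
    # Fuzzy Q-learning: Q(s, a) = sum_i phi_i(s) * q[i][a] / sum_i phi_i(s).
    import random

    centers = [0.0, 0.25, 0.5, 0.75, 1.0]          # fuzzy set centers on [0, 1]
    n_actions, alpha, gamma = 2, 0.1, 0.9
    q = [[0.0] * n_actions for _ in centers]       # per-rule action values

    def memberships(s):
        """Triangular membership of state s in each fuzzy set (width 0.25)."""
        return [max(0.0, 1.0 - abs(s - c) / 0.25) for c in centers]

    def q_value(s, a):
        phi = memberships(s)
        return sum(p * q[i][a] for i, p in enumerate(phi)) / (sum(phi) or 1.0)

    def update(s, a, r, s2):
        target = r + gamma * max(q_value(s2, b) for b in range(n_actions))
        delta = target - q_value(s, a)
        phi = memberships(s)
        norm = sum(phi) or 1.0
        for i, p in enumerate(phi):
            q[i][a] += alpha * delta * p / norm    # credit rules by firing strength

    # Toy usage: action 1 is rewarded when s > 0.5, action 0 otherwise.
    for _ in range(3000):
        s = random.random()
        a = random.randrange(n_actions)
        r = 1.0 if (a == 1) == (s > 0.5) else 0.0
        update(s, a, r, random.random())
    print(round(q_value(0.9, 1), 2), round(q_value(0.9, 0), 2))  # action 1 wins
    ```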

  7. Framework for robot skill learning using reinforcement learning

    NASA Astrophysics Data System (ADS)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skill. The reinforcement function is the critical component, as it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for an RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: the automatic robot-shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on-line.
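
    One standard way to realize an augmented reward of this general kind is potential-based shaping, where the extra term is F = gamma·phi(s') − phi(s), which keeps the optimal policy unchanged. The sketch below shows the form; the potential function and values are illustrative, and the paper's own augmented form may differ.

    ```python
    # Potential-based reward shaping as one realization of an augmented reward.
    gamma = 0.95

    def phi(state):
        """Heuristic progress estimate from prior knowledge, e.g. negative
        distance of the robot to the goal at position 10 (illustrative)."""
        return -abs(state - 10)

    def augmented_reward(r_env, s, s2):
        return r_env + gamma * phi(s2) - phi(s)   # shaping bonus for progress

    print(augmented_reward(0.0, s=4, s2=5))   # moving toward the goal earns a bonus
    ```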

  8. Effects of Carbon Nanomaterial Reinforcement on Composite Joints Under Cyclic and Impact Loading

    DTIC Science & Technology

    2012-03-01

    [Figure 1: Composite decks on DDG1000 (from [3]). Figure 2: USV built from nanotube-reinforced carbon fiber composites (from [2]).] It has been proven that the infusion of CNTs enhances the strength and fracture toughness of CFRP laminates under static loading (mode I and mode II)… Kostopoulos et al. [5] investigated the influence of multi-walled carbon nanotubes (MWCNTs) on the impact and after-impact behavior of CFRP laminates…

  9. Proactivity and Reinforcement: The Contingency of Social Behavior

    ERIC Educational Resources Information Center

    Williams, J. Sherwood; And Others

    1976-01-01

    This paper analyzes the development of group structure in terms of the stimulus-sampling perspective. Learning is the continual sampling of possibilities, with reinforced possibilities increasing in probability of occurrence. This contingency-learning approach is tested experimentally. (NG)

  10. Making environmental health interesting for medical students-internet assisted facilitated collaborative learning approach.

    PubMed

    Sudharsanam, Manni Balasubramaniam

    2014-01-01

    Topics on environmental health are usually neglected by students, yet it is necessary for them to learn this area from a public health perspective, as environment plays a vital role in the multi-factorial causation of diseases. Hence there is a need for alternative teaching/learning methods to facilitate students in acquiring the required knowledge. The aim was to increase student interest and enhance participation in acquiring knowledge of the public health perspective of environmental health. The teaching/learning objectives were: at the end of the session students should know the importance of air as an environmental factor in disease causation, with special reference to public health hazards, the major sources of air pollution, the major pollutants causing health hazards, and the ways to measure and control pollutants. The whole class of students was divided into two batches, and one session was planned for each batch. Each batch was divided into six small groups, which were given the task of exploring the internet on the different topics mentioned in the learning objectives. All the students were asked to explore, compile information and collectively prepare and present their findings based on their reviews. Students' feedback was collected at the end of each session. Eighty-five percent of them were clear about the learning objectives and interested in internet-assisted learning. Most of them gave a positive opinion about the newer teaching-learning method. Internet-assisted group study served as a valuable alternative, innovative, and interesting tool for teaching and learning environmental health, as revealed by students' feedback.

  11. Lactate production as representative of the fermentation potential of Corynebacterium glutamicum 2262 in a one-step process.

    PubMed

    Khuat, Hoang Bao Truc; Kaboré, Abdoul Karim; Olmos, Eric; Fick, Michel; Boudrant, Joseph; Goergen, Jean-Louis; Delaunay, Stéphane; Guedon, Emmanuel

    2014-01-01

    The fermentative properties of the thermo-sensitive strain Corynebacterium glutamicum 2262 were investigated in processes coupling an aerobic cell growth phase with an anaerobic fermentation phase. In particular, the influence of two modes of fermentation on the production of lactate, the model fermentation product, was studied. In both processes, lactate was produced in significant amounts, 27 g/L in batch culture and up to 55.8 g/L in fed-batch culture, but the specific production rate in the fed-batch culture was four times lower than that in the batch culture. Compared to other investigated fermentation processes, our strategy resulted in the highest yield of lactic acid from biomass. Lactate production by C. glutamicum 2262 thus revealed the capability of the strain to produce various fermentation products from pyruvate.

  12. Mobile robots exploration through cnn-based reinforcement learning.

    PubMed

    Tai, Lei; Liu, Ming

    2016-01-01

    Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model takes the depth image from an RGB-D sensor as its only input, with the feature representation of the depth image extracted through a pre-trained convolutional neural network model. Building on the recent success of the deep Q-network in artificial intelligence, the robot controller achieved exploration and obstacle avoidance in several different simulated environments. This is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.

  13. Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.

    PubMed

    Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M

    2018-05-12

    Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. 29 adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify time courses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and an anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than proposed in current neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
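
    A minimal Rescorla-Wagner learner with a softmax choice rule, of the basic kind fitted to such bandit data, is sketched below. The "anticorrelated" and "risk-sensitive" elaborations the authors select are not reproduced, and the parameters are illustrative.

    ```python
    # Basic Rescorla-Wagner updating with softmax action selection.
    import math, random

    alpha, beta = 0.2, 5.0            # learning rate, inverse temperature
    V = [0.0, 0.0]                    # expected value of each arm
    p_reward = [0.3, 0.7]             # true (unknown) reward probabilities

    for t in range(300):
        w = [math.exp(beta * v) for v in V]            # softmax weights
        a = 0 if random.random() < w[0] / sum(w) else 1
        r = 1.0 if random.random() < p_reward[a] else 0.0
        V[a] += alpha * (r - V[a])    # prediction-error update

    print([round(v, 2) for v in V])   # approaches the underlying reward rates
    ```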

  14. Automated Inattention and Fatigue Detection System in Distance Education for Elementary School Students

    ERIC Educational Resources Information Center

    Hwang, Kuo-An; Yang, Chia-Hao

    2009-01-01

    Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…

  15. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    ERIC Educational Resources Information Center

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…

  16. Altered Risk-Based Decision Making following Adolescent Alcohol Use Results from an Imbalance in Reinforcement Learning in Rats

    PubMed Central

    Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.

    2012-01-01

    Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
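
    The learning imbalance described above can be sketched with separate learning rates for better-than-expected and worse-than-expected outcomes; setting alpha_pos > alpha_neg biases value estimates of risky options upward. The values below are illustrative, not the parameters extracted from the rats.

    ```python
    # Asymmetric prediction-error updating: gains learned faster than losses.
    import random

    def update(v, reward, alpha_pos=0.4, alpha_neg=0.1):
        delta = reward - v                      # prediction error
        return v + (alpha_pos if delta > 0 else alpha_neg) * delta

    v_risky = 0.0
    for _ in range(1000):
        # Risky option: large payoff (4.0) with probability 0.25, else nothing.
        v_risky = update(v_risky, 4.0 if random.random() < 0.25 else 0.0)
    print(round(v_risky, 2))   # settles above the true expected value of 1.0
    ```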

  17. AnSBBR applied to organic matter and sulfate removal: interaction effect between feed strategy and COD/sulfate ratio.

    PubMed

    Friedl, Gregor F; Mockaitis, Gustavo; Rodrigues, José A D; Ratusznei, Suzana M; Zaiat, Marcelo; Foresti, Eugênio

    2009-10-01

    A mechanically stirred anaerobic sequencing batch reactor containing anaerobic biomass immobilized on polyurethane foam cubes, treating low-strength synthetic wastewater (500 mg COD L(-1)), was operated under different operational conditions to assess the removal of organic matter and sulfate. These conditions were related to fill time, defined by the following feed strategies: batch mode of 10 min, fed-batch mode of 3 h and fed-batch mode of 6 h, and COD/[SO(4)(2-)] ratios of 1.34, 0.67, and 0.34 defined by organic matter concentration of 500 mg COD L(-1) and sulfate concentrations of 373, 746, and 1,493 mg SO(4)(2-) L(-1) in the influent. Thus, nine assays were performed to investigate the influence of each of these parameters, as well as the interaction effect, on the performance of the system. The reactor operated with agitation of 400 rpm, total volume of 4.0 L, and treated 2.0 L synthetic wastewater in 8-h cycles at 30 +/- 1 degrees C. During all assays, the reactor showed operational stability in relation to the monitored variables such as COD, sulfate, sulfide, sulfite, volatile acids, bicarbonate alkalinity, and solids, thus demonstrating the potential to apply this technology to the combined removal of organic matter and sulfate. In general, the results showed that the 3-h fed-batch operation with a COD/[SO(4)(2-)] ratio of 0.34 presented the best conditions for organic matter removal (89%). The best efficiency for sulfate removal (71%) was accomplished during the assay with a COD/[SO(4)(2-)] ratio of 1.34 and a fill time of 6 h. It was also observed that as fill time and sulfate concentration in the influent increased, the ratio between removed sulfate load and removed organic load also increased. However, it should be pointed out that the aim of this study was not to optimize the removal of organic matter and sulfate, but rather to analyze the behavior of the reactor during the different feed strategies and applied COD/[SO(4)(2-)] ratios, and mainly to analyze the interaction effect, an aspect that has not yet been explored in the literature for batch reactors.

  18. Evaluation of Students’ Perceptions Towards An Innovative Teaching-Learning Method During Pharmacology Revision Classes: Autobiography of Drugs

    PubMed Central

    Ganjiwale, Jaishree

    2015-01-01

    Introduction: Various studies in medical education have shown that active learning strategies should be incorporated into the teaching-learning process to make learning more effective, efficient and meaningful. Objectives: The aim of this study was to evaluate students' perceptions of an innovative revision method conducted in Pharmacology, in the form of an Autobiography of Drugs. The main objective was to help students revise the core topics in Pharmacology in an interesting way. Settings and Design: Questionnaire-based survey on a new method of pharmacology revision in two batches of second-year MBBS students of a tertiary care teaching medical college. Materials and Methods: Various sessions on Autobiography of Drugs were conducted amongst two batches of second-year MBBS students during their Pharmacology revision classes. Students' perceptions regarding the quality, content and usefulness of this method were documented through a questionnaire using a five-point Likert scale. Statistical Analysis Used: Descriptive analysis. Results: Students of both batches appreciated the innovative revision method. The median scores in most domains in both batches were four out of five, indicative of a good response. Feedback from open-ended questions also revealed that the innovative module on "Autobiography of Drugs" was taken as a positive learning experience by students. Conclusions: Autobiography of Drugs has been used to help students recall topics that they have learnt through other teaching methods. Autobiography sessions in Pharmacology during revision slots can be one interesting way of helping students revise and recall topics which have already been taught in theory classes. PMID:26393138

  19. A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI

    NASA Astrophysics Data System (ADS)

    Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

    Recently, the brain-computer interface (BCI), a direct communication pathway between a human brain and an external device such as a computer or a robot, has attracted much attention. Because a BCI can control machines such as robots using brain activity alone, without recourse to voluntary muscle activity, it may become a useful communication tool for handicapped persons, for instance amyotrophic lateral sclerosis patients. However, to realize a BCI system that can perform precise tasks in various environments, it is necessary to design control rules that adapt to dynamic environments. Reinforcement learning is one approach to designing such control rules. If reinforcement learning could be driven by brain activity itself, it would lead to a BCI with general versatility. In this research, we focused on the P300 event-related potential as an alternative signal for the reward of reinforcement learning. We discriminated between success and failure trials from single-trial P300 EEG responses using a proposed discrimination algorithm based on a support vector machine. The possibility of reinforcement learning was examined from the viewpoint of the number of correctly discriminated trials. The results showed that learning would be feasible for most subjects.
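
    The discrimination step described here, classifying single-trial EEG as success or failure and feeding the result back as a binary reward, can be sketched with a generic linear SVM. The synthetic features below are placeholders for P300-window amplitudes, not the authors' actual pipeline:

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        # Placeholder single-trial features (e.g., P300-window amplitudes per channel).
        n_trials, n_features = 200, 16
        X = rng.normal(size=(n_trials, n_features))
        y = rng.integers(0, 2, size=n_trials)     # 1 = success trial, 0 = failure
        X[y == 1, :4] += 0.8                      # inject a weak P300-like effect

        clf = SVC(kernel="linear", C=1.0)
        print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())

        # A trained classifier's output then stands in for the reward signal:
        clf.fit(X, y)
        reward = int(clf.predict(X[:1])[0])       # 1 -> reward, 0 -> no reward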

  20. Service tough composite structures using the Z-direction reinforcement process

    NASA Technical Reports Server (NTRS)

    Freitas, Glenn; Magee, Constance; Boyce, Joseph; Bott, Richard

    1992-01-01

    Foster-Miller has developed a new process to provide through-thickness reinforcement of composite structures. The process reinforces laminates locally or globally on-tool during standard autoclave processing cycles. Initial test results indicate that the method has the potential to significantly reduce delamination in carbon-epoxy. Laminates reinforced with the z-fiber process have demonstrated significant improvements in Mode I fracture toughness and compression strength after impact. Unlike alternative methods, in-plane properties are not adversely affected.

  1. Comparative learning theory and its application in the training of horses.

    PubMed

    Cooper, J J

    1998-11-01

    Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or a decrease in the intensity of, an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where, once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.

  2. The nature of sexual reinforcement.

    PubMed Central

    Crawford, L L; Holloway, K S; Domjan, M

    1993-01-01

    Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970

  3. Mesolimbic confidence signals guide perceptual learning in the absence of external feedback

    PubMed Central

    Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp

    2016-01-01

    It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
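
    The central computational idea, substituting internal confidence for external reward in a standard prediction-error update, can be sketched in a few lines (a schematic of the model class, not the authors' fitted model; parameter values are illustrative):

        def confidence_rl_update(expectation, confidence, alpha=0.1):
            # `confidence` (0..1) plays the role usually played by external
            # reward, so `delta` is a confidence prediction error.
            delta = confidence - expectation
            return expectation + alpha * delta, delta

        expectation = 0.5
        for confidence in [0.9, 0.8, 0.85, 0.4]:   # trial-by-trial confidence
            expectation, delta = confidence_rl_update(expectation, confidence)
            print(f"expectation={expectation:.3f}  prediction error={delta:+.3f}")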

  4. Monitoring corrosion of rebar embedded in mortar using guided ultrasonic waves

    NASA Astrophysics Data System (ADS)

    Ervin, Benjamin Lee

    This thesis investigates the use of guided mechanical waves for monitoring uniform and localized corrosion in steel reinforcing bars embedded in concrete. The main forms of structural deterioration from uniform corrosion in reinforced concrete are the destruction of the bond between steel and concrete, the loss of steel cross-sectional area, and the loss of concrete cross-sectional area from cracking and spalling. Localized corrosion, or pitting, leads to severe loss of steel cross-sectional area, creating a high risk of bar tensile failure and unintended transfer of loads to the surrounding concrete. Reinforcing bars were used to guide the waves, rather than bulk concrete, allowing for longer inspection distances due to lower material absorption, scattering, and divergence. Guided mechanical waves in low frequency ranges (50-200 kHz) and higher frequency ranges (2-8 MHz) were monitored in reinforced mortar specimens undergoing accelerated uniform corrosion. The frequency ranges chosen contain wave modes with varying amounts of interaction, i.e., displacement profile, at the material interface. Lower frequency modes were shown to be sensitive to the accumulation of corrosion product and the level of bond between the surrounding mortar and rebar. This allows for the onset of corrosion and bond deterioration to be monitored. Higher frequency modes were shown to be sensitive to changes in the bar profile surface, allowing for the loss of cross-sectional area to be monitored. Guided mechanical waves in the higher frequency range were also used to monitor reinforced mortar specimens undergoing accelerated localized corrosion. The high frequency modes were sensitive to the localized attack. Also promising was the unique frequency spectrum response for both uniform and localized corrosion, allowing the two corrosion types to be differentiated in through-transmission evaluation. The isolated effects of the reinforcing ribs, simulated debonding, simulated pitting, surrounding water, and surrounding mortar were also investigated using guided mechanical waves. Results are presented and discussed within the framework of a corrosion process degradation model and service life. A thorough review and discussion of the corrosion process, modeling the propagation of corrosion, nondestructive methods for monitoring corrosion in reinforced concrete, and guided mechanical waves have also been presented.

  5. In situ hydrogen utilization for high fraction acetate production in mixed culture hollow-fiber membrane biofilm reactor.

    PubMed

    Zhang, Fang; Ding, Jing; Shen, Nan; Zhang, Yan; Ding, Zhaowei; Dai, Kun; Zeng, Raymond J

    2013-12-01

    Syngas fermentation is a promising route for resource recovery. Acetate is an important industrial chemical product and also an attractive precursor for liquid biofuels production. This study demonstrated high fraction acetate production from syngas (H₂ and CO₂) in a hollow-fiber membrane biofilm reactor, in which the hydrogen utilizing efficiency reached 100% during the operational period. The maximum concentration of acetate in batch mode was 12.5 g/L, while the acetate concentration in continuous mode with a hydraulic retention time of 9 days was 3.6 ± 0.1 g/L. Since butyrate concentration was rather low and below 0.1 g/L, the acetate fraction was higher than 99% in both batch and continuous modes. Microbial community analysis showed that the biofilm was dominated by Clostridium spp., such as Clostridium ljungdahlii and Clostridium drakei, the percentage of which was 70.5%. This study demonstrates a potential technology for the in situ utilization of syngas and valuable chemical production.

  6. Continuous/Batch Mg/MgH2/H2O-Based Hydrogen Generator

    NASA Technical Reports Server (NTRS)

    Kindler, Andrew; Huang, Yuhong

    2010-01-01

    A proposed apparatus for generating hydrogen by means of chemical reactions of magnesium and magnesium hydride with steam would exploit the same basic principles as those discussed in the immediately preceding article, but would be designed to implement a hybrid continuous/batch mode of operation. The design concept would simplify the problem of optimizing thermal management and would help to minimize the size and weight necessary for generating a given amount of hydrogen.

  7. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

    PubMed

    Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

    2013-03-01

    An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
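
    Computationally, the hypothesis amounts to a TD error computed on the sum of an extrinsic reward and a transient intrinsic reward; a minimal sketch (variable names and values are ours):

        def td_error(v_s, v_s_next, r_extrinsic, r_intrinsic, gamma=0.95):
            # r_extrinsic: permanent biological reward (e.g., food).
            # r_intrinsic: temporary reward generated by an unexpected change
            # in the environment; it fades as the event becomes predictable,
            # so it drives action discovery early and drops out later.
            r = r_extrinsic + r_intrinsic
            return r + gamma * v_s_next - v_s

        print(td_error(v_s=0.2, v_s_next=0.5, r_extrinsic=0.0, r_intrinsic=1.0))
        print(td_error(v_s=0.2, v_s_next=0.5, r_extrinsic=1.0, r_intrinsic=0.0))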

  8. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.
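
    In its batch supervised form, the framework reduces to training a classifier on labeled measurement vectors. A generic sketch on synthetic data (the paper's feature construction and IEEE test-system measurements are not reproduced here):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)

        # Synthetic measurement vectors; attacked ones carry a sparse additive
        # bias, loosely mimicking false data injection on a few meters.
        n, d = 1000, 30
        X = rng.normal(size=(n, d))
        y = rng.integers(0, 2, size=n)            # 1 = attacked, 0 = secure
        X[y == 1, :3] += 1.5                      # sparse attack component

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print("test accuracy:", clf.score(X_te, y_te))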

  9. Ablation behaviors of carbon reinforced polymer composites by laser of different operation modes

    NASA Astrophysics Data System (ADS)

    Wu, Chen-Wu; Wu, Xian-Qian; Huang, Chen-Guang

    2015-10-01

    The laser ablation mechanism of Carbon Fiber Reinforced Polymer (CFRP) composites is of critical importance for the laser machining process. The ablation behaviors are investigated on CFRP laminates subjected to continuous wave, long-duration pulsed wave, and short-duration pulsed wave lasers. Distinctive ablation phenomena have been observed, and the effects of the laser operation modes are discussed. The typical temperature patterns resulting from laser irradiation are computed by finite element analysis, and the different ablation mechanisms are thereby interpreted.

  10. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    PubMed

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  11. Learning and tuning fuzzy logic controllers through reinforcements.

    PubMed

    Berenji, H R; Khedkar, P

    1992-01-01

    A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available. The architecture introduces a new conjunction operator for computing the rule strengths of fuzzy control rules, introduces a new localized mean of maximum (LMOM) method for combining the conclusions of several firing control rules, and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements over previous schemes for cart-pole balancing in the speed of learning and in robustness to changes in the dynamic system's parameters.

  12. Impairments in action-outcome learning in schizophrenia.

    PubMed

    Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W

    2018-03-03

    Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.

  13. Study of a trussed girder composed of a reinforced plastic.

    DOT National Transportation Integrated Search

    1974-01-01

    The structural behavior of a series of laboratory test specimens was investigated to determine the ultimate strength, the deformation characteristics, and the mode of failure of a trussed girder composed of glass fiber reinforced polyester resin. Com...

  14. The Effects of a Token Reinforcement System on the Reading and Arithmetic Skills Learnings of Migrant Primary School Pupils.

    ERIC Educational Resources Information Center

    Heitzman, Andrew J.

    The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…

  15. The Effects of Observation of Learn Units during Reinforcement and Correction Conditions on the Rate of Learning Math Algorithms by Fifth Grade Students

    ERIC Educational Resources Information Center

    Neu, Jessica Adele

    2013-01-01

    I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…

  16. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

    ERIC Educational Resources Information Center

    Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

    2011-01-01

    Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…

  17. The Identification and Establishment of Reinforcement for Collaboration in Elementary Students

    ERIC Educational Resources Information Center

    Darcy, Laura

    2017-01-01

    In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…

  18. Stress enhances model-free reinforcement learning only after negative outcome

    PubMed Central

    Lee, Daeyeol

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943
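
    The relative contribution of the two systems is commonly captured by a weighting parameter in a hybrid model. A schematic of that mixture (the study fits a full two-stage task model, which this sketch does not reproduce; values are illustrative):

        import math

        def hybrid_choice_prob(q_mf, q_mb, w, beta=3.0):
            # q_mf / q_mb: model-free and model-based values for two options.
            # w: weight on the model-based system (0 = pure habit, 1 = pure
            # goal-directed). A stress-induced shift toward model-free control
            # corresponds to an effectively lower w after negative outcomes.
            q = [w * mb + (1 - w) * mf for mf, mb in zip(q_mf, q_mb)]
            return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))

        print(hybrid_choice_prob([0.2, 0.6], [0.7, 0.3], w=0.8))  # MB dominates
        print(hybrid_choice_prob([0.2, 0.6], [0.7, 0.3], w=0.2))  # MF dominates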

  19. Stress enhances model-free reinforcement learning only after negative outcome.

    PubMed

    Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.

  20. Implicit chaining in cotton-top tamarins (Saguinus oedipus) with elements equated for probability of reinforcement

    PubMed Central

    Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate

    2013-01-01

    Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1 cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. The results taken together support two conclusions: first, that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence; and second, that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718

  1. Improving the Science Excursion: An Educational Technologist's View

    ERIC Educational Resources Information Center

    Balson, M.

    1973-01-01

    Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency, the stimulus, the response and the reinforcement can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)

  2. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults

    PubMed Central

    Smith, Tim J.; Senju, Atsushi

    2017-01-01

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue–reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue–reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. PMID:28250186

  3. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    PubMed

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  4. Learning and altering behaviours by reinforcement: neurocognitive differences between children and adults.

    PubMed

    Shephard, E; Jackson, G M; Groom, M J

    2014-01-01

    This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than that of adults, and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.

  5. Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation

    NASA Astrophysics Data System (ADS)

    Satoh, Hideki

    An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with multi-dimensional continuous state space. First, a basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while holding the orthonormality in accordance with changes in the environment to improve the performance of reinforcement learning and to eliminate the adverse effects of redundant noisy states.
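
    A toy LMS sketch of the basis-replacement idea follows: approximate a target function with an orthonormal cosine basis and periodically swap out the least-active element. The activity index here is deliberately crude, and the paper's index and reinforcement learning setting are more elaborate; everything below is our illustration:

        import numpy as np

        rng = np.random.default_rng(0)

        def cosine_basis(freqs, s):
            # Orthonormal cosine features on [0, 1] for scalar state s.
            return np.array([np.sqrt(2.0) * np.cos(np.pi * f * s) if f > 0 else 1.0
                             for f in freqs])

        def target(s):                       # stand-in for a learned value function
            return np.sin(2.0 * np.pi * s) ** 3

        freqs = list(range(8))               # frequencies of current basis elements
        w = np.zeros(len(freqs))

        for _ in range(20):
            for _ in range(200):             # incremental least-squares (LMS) pass
                s = rng.random()
                phi = cosine_basis(freqs, s)
                w += 0.05 * (target(s) - w @ phi) * phi
            # Replace the least-active element (constant term is kept).
            worst = int(np.argmin(np.abs(w[1:])) + 1)
            freqs[worst] = max(freqs) + 1    # candidate replacement element
            w[worst] = 0.0                   # reset the new element's weight

        print("final basis frequencies:", freqs)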

  6. Rapid characterization of chemical markers for discrimination of Moutan Cortex and its processed products by direct injection-based mass spectrometry profiling and metabolomic method.

    PubMed

    Li, Chao-Ran; Li, Meng-Ning; Yang, Hua; Li, Ping; Gao, Wen

    2018-06-01

    Processing of herbal medicines is a characteristic pharmaceutical technique in Traditional Chinese Medicine, which can reduce toxicity and side effects, improve flavor and efficacy, and even change the pharmacological action entirely. It is therefore important to develop a method for finding chemical markers that differentiate herbal medicines at different degrees of processing. The aim of this study was to establish a rapid and reliable method to discriminate Moutan Cortex and its processed products, and to reveal the characteristics of their chemical components through chemical markers. Thirty batches of Moutan Cortex and its processed products, including 11 batches of Raw Moutan Cortex (RMC), 9 batches of Moutan Cortex Tostus (MCT) and 10 batches of Moutan Cortex Carbonisatus (MCC), were directly injected into an electrospray ionization quadrupole time-of-flight mass spectrometer (ESI-QTOF MS) for rapid analysis in positive and negative mode. Without chromatographic separation, each run was completed within 3 min. The raw MS data were automatically extracted by background deduction and a molecular feature (MF) extraction algorithm. In negative mode, a total of 452 MFs were obtained and then pretreated by data filtration and differential analysis. The filtered 85 MFs were then treated by principal component analysis (PCA) to reduce the dimensions. Subsequently, a partial least squares discriminant analysis (PLS-DA) model was constructed for differentiation and chemical marker detection of Moutan Cortex at different degrees of processing. The positive-mode data were treated in the same way as the negative-mode data. RMC, MCT and MCC were successfully classified. Moreover, 14 and 3 chemical markers from negative and positive mode, respectively, were screened by combining their relative peak areas with the variable importance in the projection (VIP) values in the PLS-DA model. The content changes of these chemical markers were used to illustrate the chemical changes of Moutan Cortex after processing. These results show that the proposed method, which combines non-targeted metabolomics analysis with multivariate statistical analysis, is reasonable and effective. It could be applied not only to discriminate herbal medicines and their processed products, but also to reveal the characteristics of chemical components during processing. Copyright © 2018. Published by Elsevier GmbH.
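
    The chemometric pipeline (feature matrix, PCA for dimension reduction, then PLS-DA for classification) can be sketched generically. Synthetic intensities stand in for the molecular features, and VIP scoring is omitted:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(2)

        # Synthetic "molecular feature" intensities for three processing classes
        # (stand-ins for RMC, MCT, MCC); a few features track processing degree.
        n_per, d = 10, 85
        X = rng.normal(size=(3 * n_per, d))
        labels = np.repeat([0, 1, 2], n_per)
        X[:, :5] += labels[:, None] * 1.2

        X_red = PCA(n_components=5).fit_transform(X)    # dimension reduction

        # PLS-DA: regress one-hot class membership on the reduced features.
        Y = np.eye(3)[labels]
        pls = PLSRegression(n_components=2).fit(X_red, Y)
        pred = pls.predict(X_red).argmax(axis=1)
        print("training accuracy:", (pred == labels).mean())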

  7. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
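
    The learning scheme is standard tabular Q-learning over discretized mechanical cues and steering actions. A generic sketch of that loop, with a trivial stochastic environment standing in for the flow solver (states, rewards and parameters below are ours):

        import random

        n_states, n_actions = 4, 3     # discretized flow cues x steering options
        Q = [[0.0] * n_actions for _ in range(n_states)]
        alpha, gamma, eps = 0.1, 0.95, 0.1

        def toy_env(state, action):
            # Placeholder for the simulated flow: returns (next_state, reward),
            # where reward is, e.g., altitude gained this step. Action 2 is best
            # everywhere, standing in for "steer with the favorable flow region".
            reward = 1.0 if action == 2 else random.uniform(-0.5, 0.5)
            return random.randrange(n_states), reward

        state = 0
        for _ in range(20000):
            if random.random() < eps:                     # epsilon-greedy
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            nxt, r = toy_env(state, action)
            Q[state][action] += alpha * (r + gamma * max(Q[nxt]) - Q[state][action])
            state = nxt

        print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)])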

  8. Compressive Properties of Metal Matrix Syntactic Foams in Free and Constrained Compression

    NASA Astrophysics Data System (ADS)

    Orbulov, Imre Norbert; Májlinger, Kornél

    2014-06-01

    Metal matrix syntactic foam (MMSF) blocks were produced by an inert gas-assisted pressure infiltration technique. MMSFs are advanced hollow sphere reinforced-composite materials having promising application in the fields of aviation, transport, and automotive engineering, as well as in civil engineering. The produced blocks were investigated in free and constrained compression modes, and besides the characteristic mechanical properties, their deformation mechanisms and failure modes were studied. In the tests, the chemical composition of the matrix material, the size of the reinforcing ceramic hollow spheres, the applied heat treatment, and the compression mode were considered as investigation parameters. The monitored mechanical properties were the compressive strength, the fracture strain, the structural stiffness, the fracture energy, and the overall absorbed energy. These characteristics were strongly influenced by the test parameters. By the proper selection of the matrix and the reinforcement and by proper design, the mechanical properties of the MMSFs can be effectively tailored for specific and given applications.

  9. A neural model of hierarchical reinforcement learning.

    PubMed

    Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.

  10. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    PubMed

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  11. Production of edible carbohydrates from formaldehyde in a spacecraft. pH variations in the calcium hydroxide catalyzed formose reaction. Final Report, 1 Jul. 1973 - 30 Jun. 1974. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Weiss, A. H.; Kohler, J. T.; John, T.

    1974-01-01

    The study of the calcium hydroxide catalyzed condensation of formaldehyde was extended to a batch reactor system. Decreases in pH were observed, often in the acid regime, when using this basic catalyst. This observation was shown to be similar to results obtained by others using less basic catalysts in the batch mode. The relative rates of these reactions are different in a batch reactor than in a continuous stirred tank reactor. This difference in relative rates is due to the fact that at any degree of advancement in the batch system, the products have a history of previous products, pH, and dissolved catalyst. The relative rate differences can be expected to yield a different nature of product sugars for the two types of reactors.

  12. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    PubMed

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
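
    The winning model combines learned feature values with a value-independent stickiness bonus for the previously chosen feature; schematically (parameter values are illustrative, not the fitted ones):

        import math

        def choice_probs(values, prev_choice, kappa=0.5, beta=4.0):
            # Softmax over task-relevant feature values plus a stickiness bonus:
            # kappa adds a value-independent tendency to repeat prev_choice.
            logits = [beta * v + (kappa if i == prev_choice else 0.0)
                      for i, v in enumerate(values)]
            z = max(logits)
            exps = [math.exp(l - z) for l in logits]
            total = sum(exps)
            return [e / total for e in exps]

        # Stickiness can outweigh a small value advantage of the other feature.
        print(choice_probs([0.55, 0.60], prev_choice=0))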

  13. Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.

    PubMed

    Chan, C K J; Harris, Justin A

    2017-08-01

    Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Continuous production of monoclonal antibody in a packed-bed bioreactor.

    PubMed

    Golmakany, Naghmeh; Rasaee, Mohammad Javad; Furouzandeh, Mehdi; Shojaosadati, Seyed Abbas; Kashanian, Soheila; Omidfar, Kobra

    2005-06-01

    In the present study the growth and MAb (monoclonal antibody) production of a mouse × mouse hybridoma cell producing anti-digoxin MAb were evaluated. The hybridoma cells entrapped within the support matrix Fibra-Cel were cultured in batch and continuous mode following special protocols. Cell-culture studies were performed in a 1-litre spinner basket containing 3 g·litre⁻¹ support matrix. Batch culture was operated with a cell density of 42×10⁶ cells. During the 7 days of culture, the medium was sampled daily in order to assess glucose and MAb concentrations and the lactate dehydrogenase released into the culture medium. After a culture period of 72 h, the cell density and MAb concentration were found to be 10.4×10⁷ cells/3 g of NWPF (non-woven polyester fibre) discs and 250 µg/ml respectively. This yield gradually decreased to 0.55×10⁶ cells/3 g of packaging material and 60 µg/ml respectively at the end of the batch culture. In the continuous-culture studies, the batch culture was initially operated for 64.5 h and then continuous flow was started at dilution rates of 0.15, 0.2, 0.25 and 0.3 day⁻¹ and finally stabilized at 0.25 day⁻¹ within 288 h (12 days). The MAb concentration at steady state was found to be 116-120 µg/day per ml, and the yield of operation was 62.5 mg/day per ml, which was 3.5 times higher than that of batch culture. In conclusion, a packed-bed bioreactor with the support matrix Fibra-Cel, operated in continuous-feeding mode, is more efficient for large-scale MAb production than a batch culture. On the other hand, by using a continuous-culture system, a better supply of nutrients and removal of inhibitory metabolites and proteolytic enzymes was obtained.

  15. Can Service Learning Reinforce Social and Cultural Bias? Exploring a Popular Model of Family Involvement for Early Childhood Teacher Candidates

    ERIC Educational Resources Information Center

    Dunn-Kenney, Maylan

    2010-01-01

    Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…

  16. Deep Gate Recurrent Neural Network

    DTIC Science & Technology

    2016-11-22


  17. Robust iterative learning control for multi-phase batch processes: an average dwell-time method with 2D convergence indexes

    NASA Astrophysics Data System (ADS)

    Wang, Limin; Shen, Yiteng; Yu, Jingxian; Li, Ping; Zhang, Ridong; Gao, Furong

    2018-01-01

    In order to cope with system disturbances in multi-phase batch processes with different dimensions, a hybrid robust control scheme of iterative learning control combined with feedback control is proposed in this paper. First, with a hybrid iterative learning control law designed by introducing the state error, the tracking error and the extended information, the multi-phase batch process is converted into a two-dimensional Fornasini-Marchesini (2D-FM) switched system with different dimensions. Second, a switching signal is designed using the average dwell-time method integrated with the related switching conditions to give sufficient conditions ensuring stable running for the system. Finally, the minimum running time of the subsystems and the control law gains are calculated by solving the linear matrix inequalities. Meanwhile, a compound 2D controller with robust performance is obtained, which includes a robust extended feedback control for ensuring the steady-state tracking error to converge rapidly. The application on an injection molding process displays the effectiveness and superiority of the proposed strategy.
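
    At its core, iterative learning control refines the control input from batch to batch using the previous batch's tracking error; the scheme above adds state feedback and switching on top of an update of roughly this form. A generic first-order (P-type) ILC sketch on a toy plant, not the paper's 2D-FM design (plant and gain values are ours):

        import numpy as np

        def run_batch(u, a=0.9, b=0.5):
            # Toy first-order plant: x[t+1] = a*x[t] + b*u[t], output y = x[t+1].
            x, y = 0.0, []
            for ut in u:
                x = a * x + b * ut
                y.append(x)
            return np.array(y)

        T = 50
        ref = np.sin(np.linspace(0.0, np.pi, T))   # same setpoint every batch
        u = np.zeros(T)
        L = 0.3                                    # learning gain

        for k in range(100):                       # batch (iteration) index
            e = ref - run_batch(u)                 # tracking error of batch k
            u = u + L * e                          # u_{k+1}(t) = u_k(t) + L*e_k(t)

        print("max |tracking error| after learning:", np.abs(ref - run_batch(u)).max())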

  18. Continuous flow chemistry: a discovery tool for new chemical reactivity patterns.

    PubMed

    Hartwig, Jan; Metternich, Jan B; Nikbin, Nikzad; Kirschning, Andreas; Ley, Steven V

    2014-06-14

    Continuous flow chemistry as a process intensification tool is well known. However, its ability to enable chemists to perform reactions which are not possible in batch is less well studied or understood. Here we present an example where a new reactivity pattern and extended reaction scope have been achieved by transferring a reaction from batch mode to flow. This new reactivity can be explained by the suppression of back-mixing and precise control of temperature in a flow reactor setup.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kalugin, A. V., E-mail: Kalugin-AV@nrcki.ru; Tebin, V. V.

    The specific features of calculation of the effective multiplication factor using the Monte Carlo method for weakly coupled and non-asymptotic multiplying systems are discussed. Particular examples are considered and practical recommendations on detection and Monte Carlo calculation of systems typical in numerical substantiation of nuclear safety for VVER fuel management problems are given. In particular, the problems of the choice of parameters for the batch mode and the method for normalization of the neutron batch, as well as finding and interpretation of the eigenvalue spectrum for the integral fission matrix, are discussed.
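
    The eigenvalue spectrum of the integral fission matrix is what makes weak coupling visible: a dominance ratio (second-to-first eigenvalue) close to one signals slow convergence of the batch iteration. A toy power-iteration sketch for two weakly coupled regions (the matrix entries are illustrative only):

        import numpy as np

        # Toy fission matrix F[i][j]: fission neutrons produced in region i per
        # fission neutron born in region j; small off-diagonal coupling.
        F = np.array([[1.00, 0.02],
                      [0.02, 0.98]])

        psi = np.array([1.0, 1.0])            # initial fission source batch
        for _ in range(200):
            phi = F @ psi
            k_eff = phi.sum() / psi.sum()     # multiplication factor estimate
            psi = phi / phi.sum()             # renormalize the neutron batch

        evals = np.sort(np.abs(np.linalg.eigvals(F)))[::-1]
        print("k_eff estimate:", k_eff)
        print("dominance ratio:", evals[1] / evals[0])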

  20. High stability of yellow fever 17D-204 vaccine: a 12-year retrospective analysis of large-scale production.

    PubMed

    Barban, V; Girerd, Y; Aguirre, M; Gulia, S; Pétiard, F; Riou, P; Barrere, B; Lang, J

    2007-04-12

    We have retrospectively analyzed 12 bulk lots of yellow fever vaccine Stamaril, produced between 1990 and 2002 and prepared from the same seed lot that has been in continuous use since 1990. All vaccine batches displayed an identical genome sequence. Only four nucleotide substitutions were observed compared to the previously published sequence, with no impact at the amino acid level. Fine analysis of viral plaque size distribution was used as an additional marker for genetic stability and demonstrated a remarkable homogeneity of the viral population. The total virus load, measured by qRT-PCR, was also homogeneous, pointing to the reproducibility of the vaccine production process. Mice inoculated intracerebrally with the different bulks exhibited a similar average survival time, and the ratio between in vitro potency and mouse LD₅₀ titers remained constant from batch to batch. Taken together, these data demonstrate the genetic stability of the strain at mass-production level over a period of 12 years and reinforce the generally accepted view of the safety of YF17D-based vaccines.

  1. Microwave heat treating of manufactured components

    DOEpatents

    Ripley, Edward B.

    2007-01-09

    An apparatus for heat treating manufactured components using microwave energy and microwave susceptor material. Heat treating medium such as eutectic salts may be employed. A fluidized bed introduces process gases which may include carburizing or nitriding gases. The process may be operated in a batch mode or continuous process mode. A microwave heating probe may be used to restart a frozen eutectic salt bath.

  2. Strain-level genetic diversity of Methylophaga nitratireducenticrescens confers plasticity to denitrification capacity in a methylotrophic marine denitrifying biofilm.

    PubMed

    Geoffroy, Valérie; Payette, Geneviève; Mauffrey, Florian; Lestin, Livie; Constant, Philippe; Villemur, Richard

    2018-01-01

    The biofilm of a methanol-fed, fluidized denitrification system treating a marine effluent is composed of multi-species microorganisms, among which Hyphomicrobium nitrativorans NL23 and Methylophaga nitratireducenticrescens JAM1 are the principal bacteria involved in the denitrifying activities. Strain NL23 can carry out complete nitrate (NO₃⁻) reduction to N₂, whereas strain JAM1 can perform 3 out of the 4 reduction steps. A small proportion of other denitrifiers exists in the biofilm, suggesting the potential plasticity of the biofilm in adapting to environmental changes. Here, we report the acclimation of the denitrifying biofilm from continuous operating mode to batch operating mode, and the isolation and characterization from the acclimated biofilm of a new denitrifying bacterial strain, named GP59. The denitrifying biofilm was batch-cultured under anoxic conditions. The acclimated biofilm was plated on Methylophaga-specific medium to isolate denitrifying Methylophaga isolates. Planktonic cultures of strains GP59 and JAM1 were performed, and the growth and the dynamics of NO₃⁻, nitrite (NO₂⁻) and N₂O were determined. The genomes of strains GP59 and JAM1 were sequenced and compared. The transcriptomes of strains GP59 and JAM1 were derived from anoxic cultures. During batch cultures of the biofilm, we observed the disappearance of H. nitrativorans NL23 without affecting the denitrification performance. From the acclimated biofilm, we isolated strain GP59 that can perform, like H. nitrativorans NL23, the complete denitrification pathway. The GP59 cell concentration in the acclimated biofilm was 2-3 orders of magnitude higher than that of M. nitratireducenticrescens JAM1 and H. nitrativorans NL23. Genome analyses revealed that strain GP59 belongs to the species M. nitratireducenticrescens. The GP59 genome shares more than 85% of its coding sequences with those of strain JAM1. Based on transcriptomic analyses of anoxic cultures, most of these common genes in strain GP59 were expressed at levels similar to their counterparts in strain JAM1. In contrast to strain JAM1, strain GP59 cannot reduce NO₃⁻ under oxic culture conditions, and has a 24-h lag time before growth and NO₃⁻ reduction start to occur in anoxic cultures, suggesting that the two strains regulate the expression of their denitrification genes differently. Strain GP59 has the ability to reduce NO₂⁻ as it carries a gene encoding a NirK-type NO₂⁻ reductase. Based on the CRISPR sequences, strain GP59 did not emerge from strain JAM1 during the biofilm batch cultures but rather was present in the original biofilm and was enriched during this process. These results reinforce the unique trait of the species M. nitratireducenticrescens among the Methylophaga genus as a facultative anaerobic bacterium. These findings also show the plasticity of the denitrifying population of the biofilm in adapting to the anoxic marine environments of the bioreactor.

  3. Conceptualizing withdrawal-induced escalation of alcohol self-administration as a learned, plasticity-dependent process

    PubMed Central

    Walker, Brendan M.

    2013-01-01

    This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874

  4. Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer

    PubMed Central

    Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.

    2010-01-01

    Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first- and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized, which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
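
    As a minimal illustration of the backward-induction idea behind this design, the hypothetical Python sketch below fits a two-stage Q-function with an off-the-shelf regressor; the paper itself uses a censoring-aware modification of support vector regression, and all variable names and data here are invented placeholders.

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge

    # Hypothetical two-stage fitted Q-learning sketch (backward induction).
    # X1, X2: patient covariates at each decision point; a1, a2: treatments
    # (0/1); y: observed outcome. These are illustrative stand-ins, not the
    # trial's actual variables, and censoring is ignored here.
    rng = np.random.default_rng(0)
    n = 200
    X1 = rng.normal(size=(n, 3)); a1 = rng.integers(0, 2, n)
    X2 = rng.normal(size=(n, 3)); a2 = rng.integers(0, 2, n)
    y = rng.normal(size=n)  # stand-in for (uncensored) survival time

    def fit_q(X, a, target):
        """Fit Q(x, a) with one regressor per treatment arm."""
        return [Ridge().fit(X[a == arm], target[a == arm]) for arm in (0, 1)]

    def q_max(models, X):
        """max_a Q(x, a): best predicted outcome over the two arms."""
        return np.column_stack([m.predict(X) for m in models]).max(axis=1)

    # Stage 2: regress outcome on second-line state/action.
    q2 = fit_q(X2, a2, y)
    # Stage 1: the target is the best achievable stage-2 value (Bellman backup).
    q1 = fit_q(X1, a1, q_max(q2, X2))
    print("recommended first-line arm for patient 0:",
          int(np.argmax([m.predict(X1[:1])[0] for m in q1])))
    ```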

  5. Hierarchical extreme learning machine based reinforcement learning for goal localization

    NASA Astrophysics Data System (ADS)

    AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini

    2017-03-01

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Only a few regions need to be processed by the agent, to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find an optimal series of actions to localize the goal region. The visual data, a set of images, are high-dimensional unstructured data and need to be represented efficiently to obtain a robust detector. Various deep reinforcement learning models have already been used to localize a goal, but most of them take a long time to learn the model. This long learning time results from the weight fine-tuning stage that is applied iteratively to find an accurate model. The Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights: hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of the Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
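
    A minimal sketch of the ELM idea this abstract relies on (random, fixed hidden weights; output weights solved in closed form, with no iterative fine-tuning) might look as follows; the toy data, layer size, and activation are assumptions, and the paper's model is hierarchical rather than this single layer.

    ```python
    import numpy as np

    # Single-layer extreme learning machine sketch: hidden weights are
    # random and never trained; only the output weights are computed
    # analytically by least squares. Data and shapes are illustrative.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 20))              # 500 samples, 20 features
    y = (X[:, 0] * X[:, 1] > 0).astype(float)   # toy target

    n_hidden = 100
    W = rng.normal(size=(20, n_hidden))   # random hidden weights, fixed
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                # hidden-layer activations

    # Output weights in closed form via the Moore-Penrose pseudo-inverse.
    beta = np.linalg.pinv(H) @ y
    y_hat = H @ beta
    print("train MSE:", float(np.mean((y_hat - y) ** 2)))
    ```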

  6. ascii2gdocs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nightingale, Trever

    2011-11-30

    Enables UNIX and Mac OS X command line users to put local ascii files (individually or in batch mode) into Google Documents, where the ascii is converted to Google Document format using formatting the user can specify.

  7. Nonlinear Spring Finite Elements for Predicting Mode I-Dominated Delamination Growth in Laminated Structure with Through-Thickness reinforcement

    NASA Technical Reports Server (NTRS)

    Ratcliffe, James G.; Krueger, Ronald

    2006-01-01

    One particular concern of polymer matrix composite laminates is the relatively low resistance to delamination cracking, in particular when the dominant type of failure is mode I opening. One method proposed for alleviating this problem involves the insertion of pultruded carbon pins through the laminate thickness. The pins, known as z-pins, are inserted into the prepreg laminate using an ultrasonic hammer prior to the curing process, resulting in a field of pins embedded normal to the laminate plane as illustrated in Figure 1. Pin diameters range between 0.28 mm and 0.5 mm, and standard areal densities range from 0.5% to 4%. The z-pins are provided by the manufacturer, Aztex(Registered TradeMark), in a low-density foam preform, which acts to stabilize the orientation of the pins during the insertion process [1-3]. Typical pin materials include boron and carbon fibers embedded in a polymer matrix. A number of methods have been developed for predicting delamination growth in laminates reinforced with z-pins. During a study on the effect of z-pin reinforcement on mode I delamination resistance, finite element analyses of z-pin reinforced double cantilever beam (DCB) specimens were performed by Cartie and Partridge [4]. The z-pin bridging stresses were modeled by applying equivalent forces at the pin locations. Single z-pin pull-out tests were performed to characterize the traction law of the pins under mode I loading conditions. Analytical solutions for delamination growth in z-pin reinforced DCB specimens were independently derived by Robinson and Das [5] and Ratcliffe and O'Brien [6]. In the former case, pin bridging stresses were modeled using a distributed load, and in the latter the bridging stresses were discretely modeled by way of grounded springs. Additionally, Robinson and Das developed a data reduction strategy for calculating mode I fracture toughness, G(sub Ic), from a z-pin reinforced DCB specimen test [5]. In both cases a traction law similar to that adopted by Cartie and Partridge was used to represent z-pin failure under mode I loading conditions. In the current work, spring elements available in most commercial finite element codes were used to model z-pins. The traction law used in previous analyses [4-6] was employed to represent z-pin damage. This method is intended for, and is limited to, simulating z-pins in composite laminate structure containing mode I-dominated delamination cracking. The current technique differs from previous analyses in that spring finite elements (available in commercial codes) are employed for simulating z-pins, reducing the complexity of the analysis construction process. Furthermore, the analysis method can be applied to general structure that experiences mode I-dominated delamination cracking, in contrast to existing analytical solutions that are only applicable to coupon DCB specimens.
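
    To make the spring-element idea concrete, the sketch below encodes a hypothetical bilinear traction (bridging) law of the kind such analyses assign to a nonlinear spring: linear bridging up to a peak pull-out force, then softening to full pull-out. The peak force and displacement values are illustrative assumptions, not the characterized law from the cited pull-out tests.

    ```python
    # Hypothetical bilinear traction law for a z-pin modeled as a
    # nonlinear spring. All numbers are illustrative placeholders.
    def zpin_force(opening_m, f_peak=50.0, d_peak=10e-6, d_fail=500e-6):
        """Bridging force (N) as a function of mode I opening (m)."""
        if opening_m <= d_peak:
            return f_peak * opening_m / d_peak          # elastic bridging
        if opening_m <= d_fail:
            frac = (d_fail - opening_m) / (d_fail - d_peak)
            return f_peak * frac                        # frictional pull-out
        return 0.0                                      # pin fully pulled out

    for d in (5e-6, 10e-6, 250e-6, 600e-6):
        print(f"opening {d*1e6:5.0f} um -> force {zpin_force(d):5.1f} N")
    ```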

  8. Cellulose nanocrystal-reinforced keratin bioadsorbent for effective removal of dyes from aqueous solution.

    PubMed

    Song, Kaili; Xu, Helan; Xu, Lan; Xie, Kongliang; Yang, Yiqi

    2017-05-01

    High-efficiency and recyclable three-dimensional bioadsorbents were prepared by incorporating cellulose nanocrystal (CNC) as reinforcement in a keratin sponge matrix to remove dyes from aqueous solution. The adsorption performance of the CNC-reinforced keratin bioadsorbent was improved significantly as a result of adding CNC as filler. Batch adsorption results showed that the adsorption capacities for Reactive Black 5 and Direct Red 80 by the bioadsorbent were 1201 and 1070 mg g-1, respectively. The isotherms and kinetics for adsorption of both dyes on the bioadsorbent followed the Langmuir isotherm model and the pseudo-second-order model, respectively. Desorption and regeneration experiments showed that the removal efficiencies of the bioadsorbent for both dyes remained above 80% at the fifth recycling cycle. Moreover, the bioadsorbent possessed excellent packed-bed column operation performance. These results suggest that the adsorbent can be considered a high-performance and promising candidate for dye wastewater treatment. Copyright © 2017 Elsevier Ltd. All rights reserved.
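
    For readers who want to reproduce this style of analysis, the sketch below fits the two named models to synthetic placeholder data with SciPy; the functional forms are the standard Langmuir and pseudo-second-order equations, while all data points and initial guesses are assumptions, not the paper's measurements.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def langmuir(Ce, qmax, KL):
        """Langmuir isotherm: qe = qmax*KL*Ce / (1 + KL*Ce)."""
        return qmax * KL * Ce / (1.0 + KL * Ce)

    def pseudo_second_order(t, qe, k2):
        """Pseudo-second-order kinetics: qt = k2*qe^2*t / (1 + k2*qe*t)."""
        return k2 * qe**2 * t / (1.0 + k2 * qe * t)

    # Hypothetical equilibrium data: Ce in mg/L, qe in mg/g.
    Ce = np.array([5, 10, 25, 50, 100, 200.0])
    qe = np.array([310, 520, 820, 990, 1120, 1190.0])
    (qmax, KL), _ = curve_fit(langmuir, Ce, qe, p0=(1200, 0.05))
    print(f"qmax ~ {qmax:.0f} mg/g, KL ~ {KL:.3f} L/mg")

    # Hypothetical kinetic data: t in min, qt in mg/g.
    t = np.array([5, 10, 20, 40, 80, 160.0])
    qt = np.array([400, 620, 850, 1020, 1130, 1180.0])
    (qe_fit, k2), _ = curve_fit(pseudo_second_order, t, qt, p0=(1200, 1e-4))
    print(f"qe ~ {qe_fit:.0f} mg/g, k2 ~ {k2:.2e} g/(mg*min)")
    ```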

  9. Analysis of the stiffness and load-bearing capacity of glued laminated timber beams reinforced with strands

    NASA Astrophysics Data System (ADS)

    Sardiko, R.; Rocens, K.; Iejavs, J.; Jakovlevs, V.; Ziverts, K.

    2017-10-01

    In this paper, the benefit of reinforcing glulam pinewood beams with strands is discussed. In the first phase, series of pull-out tests were performed on specimens made with different types of glue (melamine-urea-formaldehyde, epoxy and others) to determine the pull-out force and failure mode of the specimens. In the second phase, series of equal cross-section glulam beams with strand and rod reinforcement were analysed theoretically using the transformed cross-section method. Additionally, series of experimental tests were carried out. The benefits of using strands as glulam beam reinforcement were identified, and the possibility of applying a single glue type in all operations of reinforced glulam beam manufacturing was examined.
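
    The transformed cross-section method mentioned here converts the reinforcement area into equivalent wood area through the modular ratio n = Er/Ew. A minimal sketch with hypothetical dimensions and moduli (not the paper's test series) follows.

    ```python
    # Transformed cross-section sketch for a rectangular glulam beam with
    # a reinforcing strand near the bottom face. All values are
    # hypothetical placeholders.
    b, h = 0.09, 0.27          # beam width and depth, m
    Ew, Er = 11e9, 200e9       # wood and reinforcement moduli, Pa
    Ar, yr = 150e-6, 0.02      # strand area (m^2) and centroid height (m)

    n = Er / Ew                # modular ratio
    A_wood = b * h
    A_tr = A_wood + (n - 1) * Ar                         # transformed area
    y_bar = (A_wood * h / 2 + (n - 1) * Ar * yr) / A_tr  # new neutral axis
    I_tr = (b * h**3 / 12 + A_wood * (h / 2 - y_bar) ** 2
            + (n - 1) * Ar * (yr - y_bar) ** 2)          # transformed inertia
    print(f"neutral axis at {y_bar*1000:.1f} mm, "
          f"EI = {Ew*I_tr/1e3:.0f} kN*m^2")
    ```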

  10. Depression, Activity, and Evaluation of Reinforcement

    ERIC Educational Resources Information Center

    Hammen, Constance L.; Glass, David R., Jr.

    1975-01-01

    This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)

  11. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    NASA Astrophysics Data System (ADS)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will discuss what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning (one can think of Reinforcement Learning as the study of how an agent, e.g. a robot, can learn and perfect a given policy through interactions with an environment). The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  12. Kinesthetic Reinforcement-Is It a Boon to Learning?

    ERIC Educational Resources Information Center

    Bohrer, Roxilu K.

    1970-01-01

    Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…

  13. Reinforcing Basic Skills Through Social Studies. Grades 4-7.

    ERIC Educational Resources Information Center

    Lewis, Teresa Marie

    Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…

  14. Effects of Reinforcement on Peer Imitation in a Small Group Play Context

    ERIC Educational Resources Information Center

    Barton, Erin E.; Ledford, Jennifer R.

    2018-01-01

    Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…

  15. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    PubMed

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool for treating LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups showed signs consistent with EEG maturation, but only the Auditory group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved cognitive abilities more than the visual reinforcer.

  16. Reinforcement learning agents providing advice in complex video games

    NASA Astrophysics Data System (ADS)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
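
    One simple advice-budgeting heuristic in the spirit of this framework is to spend advice only in states the teacher judges important, e.g. where the gap between its best and worst action values is large. The sketch below is a hypothetical illustration with placeholder Q-tables, not the paper's exact algorithms.

    ```python
    import random

    # Teacher-student advice budgeting sketch: the teacher intervenes only
    # when the state is "important" and the advice budget is not exhausted.
    def importance(teacher_q, state):
        vals = teacher_q[state]            # dict: action -> value
        return max(vals.values()) - min(vals.values())

    def act(student_q, teacher_q, state, budget, threshold=0.5):
        """Return (action, remaining_budget)."""
        if budget > 0 and importance(teacher_q, state) > threshold:
            best = max(teacher_q[state], key=teacher_q[state].get)
            return best, budget - 1        # teacher advises, budget shrinks
        # otherwise the student explores/exploits on its own
        if random.random() < 0.1:
            return random.choice(list(student_q[state])), budget
        return max(student_q[state], key=student_q[state].get), budget

    # toy usage with placeholder Q-values
    teacher_q = {"s0": {"left": 1.0, "right": -1.0}}
    student_q = {"s0": {"left": 0.0, "right": 0.0}}
    print(act(student_q, teacher_q, "s0", budget=3))
    ```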

  17. Controlling coupled bending-twisting vibrations of anisotropic composite wing

    NASA Astrophysics Data System (ADS)

    Ryabov, Victor; Yartsev, Boris

    2018-05-01

    The paper discusses the possibility of controlling coupled bending-twisting vibrations of an anisotropic composite wing by means of monoclinic structures in the reinforcement of the plating. By decomposing the potential strain energy and the kinetic energy of the natural vibration modes into interacting and non-interacting parts, it became possible to introduce two coefficients that integrally account for the effect of geometry and reinforcement structure upon the dynamic response parameters of the wing. The first of these coefficients describes the elastic coupling of the natural vibration modes; the second describes the inertial one. The paper describes numerical studies showing how the orientation of considerably anisotropic CRP layers in the plating affects natural frequencies, loss factors, and coefficients of elastic and inertial coupling for several lower tones of natural bending-twisting vibrations of the wing. Besides, for each vibration mode, partial values of the above-mentioned dynamic response parameters were determined by means of the relationships for orthotropic structures, where instead of the "free" shearing modulus in the reinforcement plane, the "pure" shearing modulus is used. Joint analysis of the obtained results has shown that each pair of bending-twisting vibration modes has orientation angle ranges of the reinforcing layers where the inertial coupling, caused by asymmetry of the cross-section profile with respect to the main axes of inertia, decreases, down to complete extinction, due to the generation of elastic coupling in the plating material. These ranges are characterized by two main features: 1) the difference in the natural frequencies of the investigated pair of bending-twisting vibration modes is at a minimum, and 2) the natural frequencies of the bending-twisting vibrations lie within a stretch bounded by the corresponding partial natural frequencies of the investigated pair of vibration modes. This result is of practical importance because it enables approximate analysis of real composite wings with complex geometry in existing commercial software packages.

  18. Removal of sodium chloride from human urine via batch recirculation electrodialysis at constant applied voltage

    NASA Technical Reports Server (NTRS)

    Gordils-Striker, Nilda E.; Colon, Guillermo

    2003-01-01

    The removal of sodium chloride (NaCl) from human urine using a six-compartment electrodialysis cell operated in batch recirculation mode for use in advanced life support systems (ALSS) was studied. From the results obtained, batch recirculation at constant applied voltage yields high removal values (approximately 94% of NaCl removed). Based on the results, the initial rate of NaCl removal was correlated to a power function of the applied voltage: -r = 2.0 x 10^-4 E^3.8. With impedance spectroscopy methods, it was also found that the anion membranes were more affected by fouling, with an increase of the ohmic resistance of almost 11% compared with 7.4% for the cationic ones.
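
    The reported correlation can be applied directly; the snippet below evaluates the power law at a few hypothetical voltages (treat the exact units as an assumption carried over from the study).

    ```python
    # Worked use of the reported power-law correlation between initial
    # NaCl removal rate and applied voltage: -r = 2.0e-4 * E**3.8.
    def initial_removal_rate(E_volts):
        return 2.0e-4 * E_volts ** 3.8

    for E in (5, 10, 15):   # hypothetical applied voltages
        print(f"E = {E:>2} V -> -r = {initial_removal_rate(E):.3f}")
    ```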

  19. The probability of reinforcement per trial affects posttrial responding and subsequent extinction but not within-trial responding.

    PubMed

    Harris, Justin A; Kwok, Dorothy W S

    2018-01-01

    During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction-responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  20. Unobtrusive integration of data management with fMRI analysis.

    PubMed

    Poliakov, Andrew V; Hertzenberg, Xenia; Moore, Eider B; Corina, David P; Ojemann, George A; Brinkley, James F

    2007-01-01

    This note describes a software utility, called X-batch, which addresses two pressing issues typically faced by functional magnetic resonance imaging (fMRI) neuroimaging laboratories: (1) analysis automation and (2) data management. The first issue is addressed by providing a simple batch mode processing tool for the popular SPM software package (http://www.fil.ion.ucl.ac.uk/spm/; Wellcome Department of Imaging Neuroscience, London, UK). The second is addressed by transparently recording metadata describing all aspects of the batch job (e.g., subject demographics, analysis parameters, locations and names of created files, date and time of analysis, and so on). These metadata are recorded as instances of an extended version of the Protégé-based Experiment Lab Book ontology created by the Dartmouth fMRI Data Center. The resulting instantiated ontology provides a detailed record of all fMRI analyses performed, and as such can be part of larger systems for neuroimaging data management, sharing, and visualization. The X-batch system is in use in our own fMRI research, and is available for download at http://X-batch.sourceforge.net/.

  1. Heterotrophic growth and lipid accumulation of Chlorella protothecoides in whey permeate, a dairy by-product stream, for biofuel production.

    PubMed

    Espinosa-Gonzalez, Isabel; Parashar, Archana; Bressler, David C

    2014-03-01

    This study proposes a novel alternative for the utilization of whey permeate, a by-product stream from the dairy industry, as the feedstock for biomass and lipid production by the microalga Chlorella protothecoides. Glucose and galactose from pre-hydrolyzed whey permeate were used as the main carbon sources in a base mineral medium for establishing batch and fed-batch cultures. Batch cultures reached a biomass production of 9.1±0.2 g/L with a total lipid accumulation of 42.0±6.6% (dry weight basis), while in the fed-batch cultures 17.2±1.3 g/L of biomass with 20.5±0.3% lipid accumulation (dry weight basis) were obtained. A third strategy for the direct utilization of whey permeate was investigated by simultaneous saccharification and fermentation (SSF), wherein 7.3±1.3 g/L of biomass with 49.9±3.3% lipid accumulation (dry weight basis) was obtained in batch mode using immobilized enzyme. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Adaptation of Timing Behavior to a Regular Change in Criterion

    PubMed Central

    Sanabria, Federico; Oldenburg, Liliana

    2013-01-01

    This study examined how operant behavior adapted to an abrupt but regular change in the timing of reinforcement. Pigeons were trained on a fixed interval (FI) 15-s schedule of reinforcement during half of each experimental session, and on an FI 45-s (Experiment 1), FI 60-s (Experiment 2), or extinction schedule (Experiment 3) during the other half. FI performance was well characterized by a mixture of two gamma-shaped distributions of responses. When a longer FI schedule was in effect in the first half of the session (Experiment 1), a constant interference by the shorter FI was observed. When a shorter FI schedule was in effect in the first half of the session (Experiments 1, 2, and 3), the transition between schedules involved a decline in responding and a progressive rightward shift in the mode of the response distribution initially centered around the short FI. These findings are discussed in terms of the constraints they impose to quantitative models of timing, and in relation to the implications for information-based models of associative learning. PMID:23962672
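
    A mixture of two gamma-shaped response distributions of the kind used here is easy to write down; the sketch below evaluates and samples such a mixture with illustrative parameters centered near the two FI values (the mixture weight and shape parameters are assumptions, not the paper's fits).

    ```python
    import numpy as np
    from scipy import stats

    # Two-component gamma mixture: one component near the short FI (15 s),
    # one near the long FI (45 s). All parameters are illustrative.
    w = 0.4                                   # weight of short-FI component
    short = stats.gamma(a=9, scale=15 / 9)    # mean 15 s
    long_ = stats.gamma(a=9, scale=45 / 9)    # mean 45 s

    t = np.linspace(0, 90, 500)
    density = w * short.pdf(t) + (1 - w) * long_.pdf(t)
    print("mixture density peaks near t =", t[np.argmax(density)], "s")

    rng = np.random.default_rng(1)
    comp = rng.random(1000) < w               # latent component per response
    samples = np.where(comp, short.rvs(1000, random_state=rng),
                       long_.rvs(1000, random_state=rng))
    print("mean response time:", samples.mean())
    ```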

  3. Lignocellulosic Fermentation of Wild Grass Employing Recombinant Hydrolytic Enzymes and Fermentative Microbes with Effective Bioethanol Recovery

    PubMed Central

    Das, Saprativ P.; Ghosh, Arabinda; Gupta, Ashutosh; Das, Debasish

    2013-01-01

    Simultaneous saccharification and fermentation (SSF) studies of steam-exploded and alkali-pretreated leafy biomasses were accomplished with recombinant Clostridium thermocellum hydrolytic enzymes and fermentative microbes for bioethanol production. Escherichia coli cells expressing the recombinant C. thermocellum GH5 cellulase and GH43 hemicellulase genes were grown in repetitive batch mode, with the aim of enhancing cell biomass production and enzyme activity. In batch mode, the cell biomass (A600) of the E. coli cells and the enzyme activities of GH5 cellulase and GH43 hemicellulase were 1.4 and 1.6 with 2.8 and 2.2 U·mg−1, which were augmented to 2.8 and 2.9 with 5.6 and 3.8 U·mg−1 in repetitive batch mode, respectively. Steam-exploded wild grass (Achnatherum hymenoides) provided the best ethanol titres compared to the other biomasses. The mixed-enzyme (GH5 cellulase, GH43 hemicellulase) mixed-culture (Saccharomyces cerevisiae, Candida shehatae) system gave a 2-fold higher ethanol titre than the single-enzyme (GH5 cellulase) single-culture (Saccharomyces cerevisiae) system employing 1% (w/v) pretreated substrate. 5% (w/v) substrate gave 11.2 g·L−1 of ethanol at shake-flask level, which on scaling up to a 2 L bioreactor resulted in 23 g·L−1 ethanol. 91.6% (v/v) ethanol was recovered by rotary evaporator with 21.2% purification efficiency. PMID:24089676

  4. Establishment and Maintenance of Socially Learned Conditioned Reinforcement in Young Children: Elimination of the Role of Adults and View of Peers' Faces

    ERIC Educational Resources Information Center

    Zrinzo, Michelle; Greer, R. Douglas

    2013-01-01

    Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…

  5. Structure identification in fuzzy inference using reinforcement learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to perform structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  6. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    PubMed

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potential (ERP) data. MDD patients showed learning rates comparable to those of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of the error-related negativity, ERN) but abnormal external (at the level of the feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  7. Relationship between Reinforcement and Eye Movements during Ocular Motor Training with Learning Disabled Children.

    ERIC Educational Resources Information Center

    Punnett, Audrey F.; Steinhauer, Gene D.

    1984-01-01

    Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…

  8. Learning the specific quality of taste reinforcement in larval Drosophila.

    PubMed

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-27

    The only property of reinforcement that insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain, by using sugar vs amino acid as different reward qualities, and within the aversive domain, by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing in any brain.

  9. The evolution of continuous learning of the structure of the environment

    PubMed Central

    Kolodny, Oren; Edelman, Shimon; Lotem, Arnon

    2014-01-01

    Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning and language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain); and CL, which applies the same associative mechanisms used by the other strategies but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920

  10. The partial-reinforcement extinction effect and the contingent-sampling hypothesis.

    PubMed

    Hochman, Guy; Erev, Ido

    2013-12-01

    The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
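
    The contingent-sampling rule described here (choose the option with the best payoff in past experiences whose recent context matches the current one, with context defined as the last four outcomes) can be sketched in a few lines; the payoff bookkeeping and tie-breaking below are illustrative assumptions.

    ```python
    import random

    def contingent_choice(history, context, options):
        """history: list of (context, option, payoff) tuples;
        context: tuple of the last four outcomes (1 = reinforced)."""
        similar = [(opt, pay) for ctx, opt, pay in history if ctx == context]
        if not similar:
            return random.choice(options)   # no similar experience yet
        payoffs = {}
        for opt, pay in similar:
            payoffs.setdefault(opt, []).append(pay)
        def mean_payoff(o):
            vals = payoffs.get(o)
            # options never tried in this context are not preferred
            return sum(vals) / len(vals) if vals else float("-inf")
        return max(options, key=mean_payoff)

    # toy usage
    history = [((1, 0, 1, 1), "promoted", 1),
               ((1, 0, 1, 1), "alternative", 0)]
    print(contingent_choice(history, (1, 0, 1, 1),
                            ["promoted", "alternative"]))
    ```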

  11. A numerical procedure for failure mode detection of masonry arches reinforced with fiber reinforced polymeric materials

    NASA Astrophysics Data System (ADS)

    Galassi, S.

    2018-05-01

    In this paper, a mechanical model of masonry arches strengthened with fibre-reinforced composite materials and the relevant numerical procedure for the analysis are proposed. The arch is modelled as an assemblage of rigid blocks connected to each other, and to the supporting structures, by mortar joints. The presence of the reinforcement, usually a sheet placed at the intrados or the extrados, prevents the occurrence of cracks that could activate possible collapse mechanisms due to tensile failure of the mortar joints. Therefore, failure of a reinforced arch generally occurs in a different way than that of an unreinforced (URM) arch. The proposed numerical procedure checks, as a function of an incremental external load, the inner stress state in the arch, in the reinforcement and in the adhesive layer, and thereby provides a prediction of the failure modes. Results obtained from experimental tests, carried out in a laboratory on four scale models, have been compared with those provided by the numerical procedure, implemented in ArchiVAULT, a software tool developed by the author. In this regard, the numerical procedure is an extension of previous works. Although additional experimental investigations are necessary, these initial results confirm that the proposed numerical procedure is promising.

  12. The effects of aging on the interaction between reinforcement learning and attention.

    PubMed

    Radulescu, Angela; Daniel, Reka; Niv, Yael

    2016-11-01

    Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
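
    A minimal sketch of the model family this abstract describes (feature-level values updated by a prediction error, with attention weights gating both choice and learning) could look like the following; the softmax form, parameters, and update rule are assumptions rather than the authors' exact model.

    ```python
    import numpy as np

    # Feature-based RL with selective attention: values are learned per
    # feature, and attention biases both choice and credit assignment.
    # All parameters are illustrative.
    rng = np.random.default_rng(0)
    n_features, alpha, beta = 9, 0.3, 5.0
    v = np.zeros(n_features)                      # per-feature values
    attention = np.ones(n_features) / n_features  # narrow this to "focus"

    def choose(stimuli):
        """stimuli: list of binary feature vectors; softmax over values."""
        vals = np.array([np.dot(attention * v, s) for s in stimuli])
        p = np.exp(beta * vals); p /= p.sum()
        return rng.choice(len(stimuli), p=p)

    def update(s, reward):
        """Attention-weighted prediction-error update for stimulus s."""
        global v
        delta = reward - np.dot(attention * v, s)
        v += alpha * delta * attention * s  # narrow attention, narrow learning

    # toy usage: a stimulus with one feature per dimension
    s = np.zeros(n_features); s[[0, 3, 6]] = 1
    i = choose([s])
    update(s, reward=1.0)
    print("values along chosen features:", v[[0, 3, 6]])
    ```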

  13. Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.

    PubMed

    Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne

    2016-05-01

    We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimuli foreground over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.

  14. Finite element analysis-based study of fiber Bragg grating sensor for cracks detection in reinforced concrete

    NASA Astrophysics Data System (ADS)

    Wang, Lili; Xin, Xiangjun; Song, Jun; Wang, Honggang; Sai, Yaozhang

    2018-02-01

    A fiber Bragg grating (FBG) sensor is applied for detecting and monitoring cracks that occur in reinforced concrete. We use a three-dimensional finite element model to provide the three-axial stresses along the fiber Bragg sensor, and then convert the stresses into a wavelength deformation of the FBG reflected spectrum. For crack detection, an FBG sensor of 10-mm length is embedded in the reinforced concrete, and its reflection spectrum is measured after loading is applied to the concrete slab. As a result, the main peak wavelength and the ratio of the peak reflectivity to the maximal side-mode reflectivity of the optic-fiber grating represent the fracture severity. Theoretical calculation confirms that a sharp decrease in the ratio of the peak reflectivity to the maximal side-mode reflectivity indicates early cracking. The method can be used to detect cracks in reinforced concrete and to give a safety evaluation of large-scale infrastructure.
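
    For orientation, the standard FBG relations lambda_B = 2*n_eff*Lambda and d(lambda)/lambda ~ (1 - p_e)*strain give the order of magnitude of the wavelength changes involved; the values below are textbook-typical assumptions, not the paper's sensor parameters.

    ```python
    # Back-of-envelope FBG strain response with typical silica-fiber values.
    n_eff = 1.446               # effective refractive index (assumed)
    grating_period = 535.9e-9   # m, chosen so lambda_B ~ 1550 nm
    p_e = 0.22                  # effective photo-elastic coefficient

    lambda_b = 2 * n_eff * grating_period      # Bragg wavelength
    for strain in (100e-6, 500e-6, 1000e-6):   # 100-1000 microstrain
        shift = (1 - p_e) * strain * lambda_b
        print(f"strain {strain*1e6:4.0f} ue -> shift {shift*1e12:6.1f} pm")
    ```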

  15. Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.

    PubMed

    Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla

    2014-12-01

    This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new coordination models for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy model with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy model with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and robot soccer simulation. The results obtained demonstrate that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.

  16. Refining Linear Fuzzy Rules by Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil

    1996-01-01

    Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space, which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning, which can be applied in domains where supervised input-output data are not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step toward applying reinforcement learning methods in domains where only limited input-output data are available.

  17. Instructed knowledge shapes feedback-driven aversive learning in striatum and orbitofrontal cortex, but not the amygdala

    PubMed Central

    Atlas, Lauren Y; Doll, Bradley B; Li, Jian; Daw, Nathaniel D; Phelps, Elizabeth A

    2016-01-01

    Socially-conveyed rules and instructions strongly shape expectations and emotions. Yet most neuroscientific studies of learning consider reinforcement history alone, irrespective of knowledge acquired through other means. We examined fear conditioning and reversal in humans to test whether instructed knowledge modulates the neural mechanisms of feedback-driven learning. One group was informed about contingencies and reversals. A second group learned only from reinforcement. We combined quantitative models with functional magnetic resonance imaging and found that instructions induced dissociations in the neural systems of aversive learning. Responses in striatum and orbitofrontal cortex updated with instructions and correlated with prefrontal responses to instructions. Amygdala responses were influenced by reinforcement similarly in both groups and did not update with instructions. Results extend work on instructed reward learning and reveal novel dissociations that have not been observed with punishments or rewards. Findings support theories of specialized threat-detection and may have implications for fear maintenance in anxiety. DOI: http://dx.doi.org/10.7554/eLife.15192.001 PMID:27171199

  18. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca

    2017-04-01

    Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
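
    A toy version of the learning loop (tabular Q-learning over a one-dimensional "altitude" with a trap-like downdraft) conveys the mechanism; the environment below is an invented stand-in for the paper's fluid simulations.

    ```python
    import random

    # Tabular Q-learning sketch of an agent that learns a preferred
    # swimming direction from the altitude gained. Toy 1-D environment.
    actions = [-1, +1]                       # swim down / swim up
    Q = {}                                   # (state, action) -> value
    alpha, gamma, eps = 0.1, 0.95, 0.1

    def step(z, a):
        drift = -1 if z % 3 == 0 else 0      # toy trap-like downdraft
        z2 = max(0, min(20, z + a + drift))
        return z2, z2 - z                    # reward = altitude gained

    for episode in range(500):
        z = 0
        for _ in range(50):
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a_: Q.get((z, a_), 0.0)))
            z2, r = step(z, a)
            best_next = max(Q.get((z2, a_), 0.0) for a_ in actions)
            q = Q.get((z, a), 0.0)
            Q[(z, a)] = q + alpha * (r + gamma * best_next - q)
            z = z2
    print("learned action at z=5:",
          max(actions, key=lambda a_: Q.get((5, a_), 0.0)))
    ```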

  19. Learning and tuning fuzzy logic controllers through reinforcements

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcements, such as a binary failure signal, are available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method for combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
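
    To fix ideas, the sketch below implements the fuzzy-inference side of such a controller: triangular membership functions, a soft (differentiable) conjunction for rule strength, and a strength-weighted combination of rule conclusions. The softmin operator and all rule parameters are illustrative assumptions, not GARIC's exact operators.

    ```python
    import numpy as np

    def tri(x, a, b, c):
        """Triangular membership function peaking at b."""
        return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

    def softmin(mu, k=10.0):
        """Differentiable stand-in for min over antecedent memberships."""
        mu = np.asarray(mu)
        w = np.exp(-k * mu)
        return float((mu * w).sum() / w.sum())

    # two toy rules on (angle, velocity) -> force, cart-pole style
    def control(angle, vel):
        r1 = softmin([tri(angle, -0.5, 0.0, 0.5), tri(vel, -1, 0, 1)])  # -> 0 N
        r2 = softmin([tri(angle, 0.0, 0.5, 1.0), tri(vel, 0, 1, 2)])    # -> 10 N
        strengths, outputs = np.array([r1, r2]), np.array([0.0, 10.0])
        return float((strengths * outputs).sum() / max(strengths.sum(), 1e-9))

    print(control(0.3, 0.5))   # strength-weighted control action
    ```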

  1. Response of Thai Hospitals to the Tsunami Disaster.

    PubMed

    Leiba, Adi; Ashkenasi, Issac; Nakash, Guy; Pelts, Rami; Schwartz, Dagan; Goldberg, Avishay; Levi, Yeheskel; Bar-Dayan, Yaron

    2006-02-01

    The disaster caused by the Tsunami of 26 December 2004 was one of the worst that medical systems have faced. The aim of this study was to learn about the medical response of the Thai hospitals to this disaster and to establish guidelines that will help hospitals prepare for future disasters. The Israeli Defense Forces (IDF) Home Front Command (HFC) Medical Department sent a research delegation to Thai hospitals to study: (1) pre-event hospital preparedness; (2) patient evacuation and triage; (3) personnel and equipment reinforcement; (4) modes used for alarm and recruitment of hospital personnel; (5) internal reorganization of hospitals; and (6) admission, discharge, and secondary transfer (forward management) of patients. Thai hospitals were prepared for and drilled for a general mass casualty incident (MCI) involving up to 50 casualties. However, a control system to measure the success of these drills was not identified, and Thai hospitals were not prepared to deal with the unique aspects of a tsunami or to receive thousands of victims. Modes of operation differed between provinces. In Phang Nga and Krabi, many patients were treated in the field. In Phuket, most patients were evacuated early to secondary (district) and tertiary (provincial) hospitals. Hospitals recalled staff rapidly and organized the emergency department for patient triage, treatment, and transfer if needed. Although preparedness was deficient, hospital systems performed well. Disaster management should focus on field-based first aid and triage, and rapid evacuation to secondary hospitals. Additionally, disaster management should reinforce and rely on the existing and well-trusted medical system.

  2. Closed-loop and robust control of quantum systems.

    PubMed

    Chen, Chunlin; Wang, Lin-Cheng; Wang, Yuanlong

    2013-01-01

    For most practical quantum control systems, it is important and difficult to attain robustness and reliability due to unavoidable uncertainties in the system dynamics or models. Three kinds of typical approaches (e.g., closed-loop learning control, feedback control, and robust control) have proved to be effective in solving these problems. This work presents a self-contained survey of the closed-loop and robust control of quantum systems, as well as a brief introduction to a selection of basic theories and methods in this research area, to provide interested readers with a general idea for further studies. In the area of closed-loop learning control of quantum systems, we survey and introduce learning control methods such as gradient-based methods, genetic algorithms (GA), and reinforcement learning (RL) methods from a unified point of view of exploring the quantum control landscapes. For the feedback control approach, the paper surveys three control strategies: Lyapunov control, measurement-based control, and coherent-feedback control. Then, topics in the field of quantum robust control such as H∞ control, sliding mode control, quantum risk-sensitive control, and quantum ensemble control are reviewed. The paper concludes with a perspective on future research directions that are likely to attract more attention.

  3. Aversive Learning and Appetitive Motivation Toggle Feed-Forward Inhibition in the Drosophila Mushroom Body.

    PubMed

    Perisse, Emmanuel; Owald, David; Barnstedt, Oliver; Talbot, Clifford B; Huetteroth, Wolf; Waddell, Scott

    2016-06-01

    In Drosophila, negatively reinforcing dopaminergic neurons also provide the inhibitory control of satiety over appetitive memory expression. Here we show that aversive learning causes a persistent depression of the conditioned odor drive to two downstream feed-forward inhibitory GABAergic interneurons of the mushroom body, called MVP2, or mushroom body output neuron (MBON)-γ1pedc>α/β. However, MVP2 neuron output is only essential for expression of short-term aversive memory. Stimulating MVP2 neurons preferentially inhibits the odor-evoked activity of avoidance-directing MBONs and odor-driven avoidance behavior, whereas their inhibition enhances odor avoidance. In contrast, odor-evoked activity of MVP2 neurons is elevated in hungry flies, and their feed-forward inhibition is required for expression of appetitive memory at all times. Moreover, imposing MVP2 activity promotes inappropriate appetitive memory expression in food-satiated flies. Aversive learning and appetitive motivation therefore toggle alternate modes of a common feed-forward inhibitory MVP2 pathway to promote conditioned odor avoidance or approach. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  4. A study of the relative effectiveness and cost of computerized information retrieval in the interactive mode

    NASA Technical Reports Server (NTRS)

    Smetana, F. O.; Furniss, M. A.; Potter, T. R.

    1974-01-01

    Results of a number of experiments to illuminate the relative effectiveness and costs of computerized information retrieval in the interactive mode are reported. It was found that, for equal time spent in preparing the search strategy, the batch and interactive modes gave approximately equal recall and relevance. The interactive mode, however, encourages the searcher to devote more time to the task and therefore usually yields improved output. Engineering costs are consequently higher in this mode. Estimates of associated hardware costs also indicate that operation in this mode is more expensive. Skilled RECON users like the rapid feedback and additional features offered by this mode, provided they are not constrained by considerations of cost.

  5. Apparatus with moderating material for microwave heat treatment of manufactured components

    DOEpatents

    Ripley, Edward B [Knoxville, TN

    2011-05-10

    An apparatus for heat treating manufactured components using microwave energy and microwave susceptor material. Heat treating medium such as eutectic salts may be employed. A fluidized bed introduces process gases which may include carburizing or nitriding gases. The process may be operated in a batch mode or continuous process mode. A microwave heating probe may be used to restart a frozen eutectic salt bath.

  6. Apparatus for microwave heat treatment of manufactured components

    DOEpatents

    Babcock & Wilcox Technical Services Y-12, LLC

    2008-04-15

    An apparatus for heat treating manufactured components using microwave energy and microwave susceptor material. Heat treating medium such as eutectic salts may be employed. A fluidized bed introduces process gases which may include carburizing or nitriding gases. The process may be operated in a batch mode or continuous process mode. A microwave heating probe may be used to restart a frozen eutectic salt bath.

  7. Methods for microwave heat treatment of manufactured components

    DOEpatents

    Ripley, Edward B.

    2010-08-03

    An apparatus for heat treating manufactured components using microwave energy and microwave susceptor material. Heat treating medium such as eutectic salts may be employed. A fluidized bed introduces process gases which may include carburizing or nitriding gases. The process may be operated in a batch mode or continuous process mode. A microwave heating probe may be used to restart a frozen eutectic salt bath.

  8. Education Technology Policy for a 21st Century Learning System. Policy Brief 13-3

    ERIC Educational Resources Information Center

    Kerchner, Charles Taylor

    2013-01-01

    Internet-related technology has the capacity to change the learning production system in three important ways. First, it creates the capacity to move from the existing batch processing system of teaching and learning to a much more individualized learning system capable of matching instructional style and pace to a student's needs. Second,…

  9. Perfusion cell culture decreases process and product heterogeneity in a head-to-head comparison with fed-batch.

    PubMed

    Walther, Jason; Lu, Jiuyi; Hollenbach, Myles; Yu, Marcella; Hwang, Chris; McLarty, Jean; Brower, Kevin

    2018-05-30

    In this study, we compared the impacts of fed-batch and perfusion platforms on process and product attributes for IgG1- and IgG4-producing cell lines. A "plug-and-play" approach was applied to both platforms at bench scale, using commercially available basal and feed media, a standard feed strategy for fed-batch, and ATF filtration for perfusion. Product concentration in fed-batch was 2.5 times greater than in perfusion, while average productivity in perfusion was 7.5 times greater than in fed-batch. PCA revealed more variability in the cell environment and metabolism during the fed-batch run. LDH measurements showed that exposure of product to cell lysate was 7-10 times greater in fed-batch. Product analysis shows larger abundances of neutral species in perfusion, likely due to decreased bioreactor residence times and extracellular exposure. The IgG1 perfusion product also had higher purity and lower half-antibody. Glycosylation was similar across both culture modes. The first perfusion harvest slice for both product types showed different glycosylation than subsequent harvests, suggesting that product quality lags behind metabolism. In conclusion, process and product data indicate that intra-lot heterogeneity is decreased in perfusion cultures. Additional data and discussion are required to understand the developmental, clinical and commercial implications, and the situations in which increased uniformity would be beneficial. This article is protected by copyright. All rights reserved.

  10. Exploiting the metabolism of PYC expressing HEK293 cells in fed-batch cultures.

    PubMed

    Vallée, Cédric; Durocher, Yves; Henry, Olivier

    2014-01-01

    The expression of recombinant yeast pyruvate carboxylase (PYC) in animal cell lines was shown in previous studies to significantly reduce the formation of waste metabolites, although this has translated into mixed results in terms of improved cellular growth and productivity. In this work, we demonstrate that the unique phenotype of PYC-expressing cells can be exploited through the application of a dynamic fed-batch strategy, leading to significant process enhancements. Metabolically engineered HEK293 cells stably producing human recombinant IFNα2b and expressing the PYC enzyme were cultured in batch and fed-batch modes. Compared to parental cells, the maximum cell density in batch was increased 1.5-fold and the culture duration was extended by 2.5 days, but the product yield was only marginally increased. Further improvements were achieved by developing and implementing a dynamic fed-batch strategy using a concentrated feed solution. The feeding was based on an automatic control loop to maintain a constant glucose concentration. This strategy led to a further 2-fold increase in maximum cell density (up to 10.7×10^6 cells/ml) and a final product titer of 160 mg/l, representing nearly a 3-fold yield increase compared to the batch process with the parental cell clone. Copyright © 2013 Elsevier B.V. All rights reserved.
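
    The feeding logic described above (an automatic control loop holding glucose at a constant setpoint with a concentrated feed) can be sketched in a few lines; every rate constant and concentration below is invented for illustration, not taken from the study:

        # Hedged sketch of a glucose-setpoint fed-batch feeding loop.
        dt = 1.0                      # h, sampling interval
        V, G = 1.0, 20.0              # L culture volume, mM glucose
        G_set, G_feed = 20.0, 500.0   # mM setpoint, mM feed concentration
        uptake_rate = 2.0             # mM/h, assumed constant consumption

        for t in range(72):
            G -= uptake_rate * dt                  # cells consume glucose
            deficit = max(0.0, G_set - G)          # measured shortfall vs setpoint
            # feed volume restoring the setpoint: dV*(G_feed - G_set) = deficit*V
            dV = deficit * V / (G_feed - G_set)
            G = (G * V + G_feed * dV) / (V + dV)   # mass balance after feeding
            V += dV

        print(f"final volume {V:.3f} L, final glucose {G:.2f} mM")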

  11. Improvements in Production of Single-Walled Carbon Nanotubes

    NASA Technical Reports Server (NTRS)

    Balzano, Leandro; Resasco, Daniel E.

    2009-01-01

    A continuing program of research and development has been directed toward improvement of a prior batch process in which single-walled carbon nanotubes are formed by catalytic disproportionation of carbon monoxide in a fluidized-bed reactor. The overall effect of the improvements has been to make progress toward converting the process from a batch mode to a continuous mode and toward scaling production to larger quantities. Efforts have also been made to optimize associated purification and dispersion post-processes to make them effective at large scales and to investigate means of incorporating the purified products into composite materials. The ultimate purpose of the program is to enable the production of high-quality single-walled carbon nanotubes in quantities large enough and at costs low enough to foster the further development of practical applications. The fluidized bed used in this process contains mixed-metal catalyst particles. The choice of the catalyst and the operating conditions is such that the yield of single-walled carbon nanotubes, relative to all forms of carbon (including carbon fibers, multi-walled carbon nanotubes, and graphite) produced in the disproportionation reaction, is more than 90 weight percent. After the reaction, the nanotubes are dispersed in various solvents in preparation for end use, which typically involves blending into a plastic, ceramic, or other matrix to form a composite material. Notwithstanding the batch nature of the unmodified prior fluidized-bed process, the fluidized-bed reactor operates in a continuous mode during the process. The operation is almost entirely automated, utilizing mass flow controllers, a control computer running software specific to the process, and other equipment. Moreover, an important inherent advantage of fluidized-bed reactors in general is that solid particles can be added to and removed from fluidized beds during operation. For these reasons, the process and equipment were amenable to modification for conversion from batch to continuous production.

  12. Life Span Differences in Electrophysiological Correlates of Monitoring Gains and Losses during Probabilistic Reinforcement Learning

    ERIC Educational Resources Information Center

    Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman

    2011-01-01

    By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…

  13. Reinforcement Learning in Young Adults with Developmental Language Impairment

    ERIC Educational Resources Information Center

    Lee, Joanna C.; Tomblin, J. Bruce

    2012-01-01

    The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…

  14. Effective Reinforcement Techniques in Elementary Physical Education: The Key to Behavior Management

    ERIC Educational Resources Information Center

    Downing, John; Keating, Tedd; Bennett, Carl

    2005-01-01

    The ability to shape appropriate behavior while extinguishing misbehavior is critical to teaching and learning in physical education. The scientific principles that affect student learning in the gymnasium also apply to the methods teachers use to influence social behaviors. Research indicates that reinforcement strategies are more effective than…

  15. Fatigue Assessment for the Failed Bridge Deck Closure Pour at Mile Marker 43 on I-81.

    DOT National Transportation Integrated Search

    2014-04-01

    "Fatigue of reinforcing steel in concrete bridge decks has not been identified as a common failure mode. Generally, the : stress range occurring in reinforcing steel is below the fatigue threshold and infinite fatigue life can be expected. Closure po...

  16. 15. DETAIL VIEW OF SAWMILL POWER HOUSE FOUNDATION (CONSTRUCTED IN ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    15. DETAIL VIEW OF SAWMILL POWER HOUSE FOUNDATION (CONSTRUCTED IN 1909) SHOWING THE WIRE-ROPE AND RAILROAD RAILS USED AS REINFORCEMENT IN THE CONCRETE. THIS SAME MODE OF REINFORCEMENT WAS USED IN THE DAM. - Hume Lake Dam, Sequoia National Forest, Hume, Fresno County, CA

  17. A neural model of hierarchical reinforcement learning

    PubMed Central

    Rasmussen, Daniel; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111

  18. Reinforcement Learning Trees

    PubMed Central

    Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

    2015-01-01

    In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
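
    A toy, depth-one version of the lookahead idea for regression splits (choosing the split variable by the gain it enables in later splits rather than its immediate marginal gain) is sketched below; it is a simplification for illustration, not the authors' implementation:

        import numpy as np

        def sse(y):
            return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0

        def best_immediate_gain(X, y):
            """Best single-split SSE reduction over all features (median splits)."""
            base, best = sse(y), 0.0
            for j in range(X.shape[1]):
                mask = X[:, j] <= np.median(X[:, j])
                if 0 < mask.sum() < len(y):
                    best = max(best, base - sse(y[mask]) - sse(y[~mask]))
            return best

        def lookahead_split(X, y):
            """Pick the feature whose split enables the largest gain one
            level deeper, rather than the largest immediate gain."""
            scores = []
            for j in range(X.shape[1]):
                mask = X[:, j] <= np.median(X[:, j])
                if not (0 < mask.sum() < len(y)):
                    scores.append(-np.inf)
                    continue
                future = (best_immediate_gain(X[mask], y[mask])
                          + best_immediate_gain(X[~mask], y[~mask]))
                scores.append(future)
            return int(np.argmax(scores))

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))
        y = np.where(X[:, 2] > 0, X[:, 4], -X[:, 4])   # pure x2-x4 interaction
        # expect 2 or 4: the interacting pair carries almost no marginal signal
        print("chosen feature:", lookahead_split(X, y))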

  19. Reinforcement learning state estimator.

    PubMed

    Morimoto, Jun; Doya, Kenji

    2007-03-01

    In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single-pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
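
    The core idea, treating errors of observable variables as a delayed penalty and tuning an estimator gain by a gradient method, can be sketched for a scalar linear system (assumed dynamics and noise levels, much simpler than the pendulum used in the paper; finite differences stand in for the policy gradient):

        import numpy as np

        a = 0.9   # known scalar dynamics: x' = a*x + process noise

        def rollout_cost(L, seed, T=200):
            """Run the estimator xhat' = a*xhat + L*(y - xhat) and return the
            average delayed penalty on the observable prediction error."""
            rng = np.random.default_rng(seed)
            x, xhat, cost = 1.0, 0.0, 0.0
            for _ in range(T):
                y = x + 0.1 * rng.normal()          # noisy observation of hidden x
                cost += (y - xhat) ** 2             # penalty on observable error
                xhat = a * xhat + L * (y - xhat)    # estimator update
                x = a * x + 0.05 * rng.normal()     # true hidden dynamics
            return cost / T

        # Finite-difference gradient descent on the estimator gain L,
        # using matched noise (same seed) for each paired rollout
        L, eps, lr = 0.0, 0.05, 0.1
        for it in range(200):
            g = (rollout_cost(L + eps, it) - rollout_cost(L - eps, it)) / (2 * eps)
            L -= lr * g
        print("learned gain:", round(L, 3), "avg cost:", round(rollout_cost(L, 999), 4))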

  20. Reciprocity Family Counseling: A Multi-Ethnic Model.

    ERIC Educational Resources Information Center

    Penrose, David M.

    The Reciprocity Family Counseling Method involves learning principles of behavior modification including selective reinforcement, behavioral contracting, self-correction, and over-correction. Selective reinforcement refers to the recognition and modification of parent/child responses and reinforcers. Parents and children are asked to identify…

  1. Reinforcement learning: Solving two case studies

    NASA Astrophysics Data System (ADS)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms are described and applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore the capability of RL to autonomously learn a basic locomotion pattern for the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
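
    A minimal tabular Q-learning loop of the kind used in such case studies, on a toy chain environment standing in for the Crawler's gait search (environment and parameters are illustrative, not the authors' setup):

        import numpy as np

        # Toy 5-state chain: action 1 moves right (reward at the far end),
        # action 0 moves left; a stand-in for the crawler's gait search.
        n_states, n_actions = 5, 2
        Q = np.zeros((n_states, n_actions))
        alpha, gamma, epsilon = 0.1, 0.9, 0.1
        rng = np.random.default_rng(0)

        for episode in range(500):
            s = 0
            for _ in range(20):
                if rng.random() < epsilon:
                    a = int(rng.integers(n_actions))     # explore
                else:
                    a = int(np.argmax(Q[s]))             # exploit
                s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
                r = 1.0 if s2 == n_states - 1 else 0.0   # scalar reward signal
                Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
                s = s2

        print("greedy policy:", np.argmax(Q, axis=1))    # expect all 1s (move right)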

  2. Reinforcement active learning in the vibrissae system: optimal object localization.

    PubMed

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. An intelligent agent for optimal river-reservoir system management

    NASA Astrophysics Data System (ADS)

    Rieker, Jeffrey D.; Labadie, John W.

    2012-09-01

    A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.

  4. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning

    PubMed Central

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics made a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient for large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018

  5. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    PubMed

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics made a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient for large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction.
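
    A stripped-down sketch of the genetic component, evolving deterministic policies with episode return as fitness (a stand-in for the paper's rule-discovery machinery, not its actual classifier system; the environment is the same toy chain as above):

        import numpy as np

        rng = np.random.default_rng(0)
        n_states, n_actions, pop_size = 5, 2, 20

        def fitness(policy):
            """Episode return of a deterministic policy on the toy chain."""
            s, total = 0, 0.0
            for _ in range(10):
                s = min(s + 1, n_states - 1) if policy[s] == 1 else max(s - 1, 0)
                total += 1.0 if s == n_states - 1 else 0.0
            return total

        pop = rng.integers(n_actions, size=(pop_size, n_states))
        for gen in range(30):
            scores = np.array([fitness(p) for p in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]   # selection
            children = parents.copy()
            for c in children:                                   # crossover
                mate = parents[rng.integers(len(parents))]
                cut = rng.integers(1, n_states)
                c[cut:] = mate[cut:]
            flips = rng.random(children.shape) < 0.05            # mutation:
            children[flips] = 1 - children[flips]                # new rules from old
            pop = np.vstack([parents, children])

        best = pop[np.argmax([fitness(p) for p in pop])]
        print("best policy:", best, "return:", fitness(best))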

  6. A comparison of differential reinforcement procedures with children with autism.

    PubMed

    Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N

    2015-12-01

    The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations. © Society for the Experimental Analysis of Behavior.

  7. Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis

    PubMed Central

    2013-01-01

    Background Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
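
    The dissociation at the heart of this meta-analysis, reward sensitivity versus learning rate, can be reproduced in a few lines of simulation; the update and choice rules below are a common parameterization assumed for illustration, not the authors' fitted model:

        import numpy as np

        def simulate(alpha, rho, n_trials=500, seed=0):
            """Two-armed task with asymmetric reward probabilities.
            alpha: learning rate; rho: reward sensitivity scaling the
            subjective value of each reward before the update."""
            rng = np.random.default_rng(seed)
            p_reward = np.array([0.6, 0.3])   # rich vs lean option
            Q = np.zeros(2)
            rich_choices = 0
            for _ in range(n_trials):
                p = np.exp(Q) / np.exp(Q).sum()        # softmax choice
                a = rng.choice(2, p=p)
                r = float(rng.random() < p_reward[a])
                Q[a] += alpha * (rho * r - Q[a])       # sensitivity-weighted PE
                rich_choices += (a == 0)
            return rich_choices / n_trials

        print("low sensitivity :", simulate(alpha=0.3, rho=0.5))
        print("high sensitivity:", simulate(alpha=0.3, rho=3.0))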

  8. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    PubMed

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.

  9. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    PubMed

    Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Unlike in previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
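
    The aspiration-learning rule is specified precisely enough to simulate directly; a toy two-player prisoner's dilemma sketch (payoff matrix, aspiration level, and step size are assumed values):

        import numpy as np

        rng = np.random.default_rng(0)
        # Prisoner's dilemma payoffs: (my move, opponent move) -> my payoff
        payoff = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
        aspiration, step = 2.0, 0.2    # satisfied iff payoff > aspiration

        p = [0.5, 0.5]                 # each player's propensity to cooperate
        for t in range(200):
            moves = ['C' if rng.random() < p[i] else 'D' for i in range(2)]
            for i in range(2):
                mine, yours = moves[i], moves[1 - i]
                satisfied = payoff[(mine, yours)] > aspiration
                played_C = (mine == 'C')
                # reinforce the action just taken if satisfied, else anti-reinforce;
                # (satisfied == played_C) is true exactly when C should be favored
                if satisfied == played_C:
                    p[i] = p[i] + step * (1 - p[i])   # push toward cooperating
                else:
                    p[i] = p[i] * (1 - step)          # push toward defecting
        print("final cooperation propensities:", p)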

  10. Social stress reactivity alters reward and punishment learning

    PubMed Central

    Frank, Michael J.; Allen, John J. B.

    2011-01-01

    To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems. PMID:20453038

  11. Social stress reactivity alters reward and punishment learning.

    PubMed

    Cavanagh, James F; Frank, Michael J; Allen, John J B

    2011-06-01

    To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.

  12. Predicting psychosis across diagnostic boundaries: Behavioral and computational modeling evidence for impaired reinforcement learning in schizophrenia and bipolar disorder with a history of psychosis.

    PubMed

    Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N

    2015-08-01

    There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved.
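
    Schematically, the three candidate models above differ only in how action weights are formed and updated. A single-state sketch of the hybrid account (mixing weight, learning rate, and task probabilities are illustrative assumptions, not the study's fitted values):

        import numpy as np

        rng = np.random.default_rng(0)
        n_actions, alpha, beta, w = 2, 0.2, 3.0, 0.5   # w mixes critic and Q
        p_reward = np.array([0.8, 0.2])                # probabilistic task

        V = 0.0                        # critic: state value
        actor = np.zeros(n_actions)    # actor: action propensities
        Q = np.zeros(n_actions)        # expected reward per action

        for t in range(1000):
            # hybrid action weights: actor propensities "augmented" by Q-values
            h = (1 - w) * actor + w * Q
            probs = np.exp(beta * h) / np.exp(beta * h).sum()
            a = rng.choice(n_actions, p=probs)
            r = float(rng.random() < p_reward[a])
            delta = r - V                   # critic prediction error
            V += alpha * delta
            actor[a] += alpha * delta       # actor-critic: PE-driven propensity
            Q[a] += alpha * (r - Q[a])      # Q-learning: expected-value update

        print("actor:", actor.round(2), "Q:", Q.round(2))

    Setting w = 0 recovers the pure actor-critic account and w = 1 the pure Q-learning account.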

  13. Motor Learning Enhances Use-Dependent Plasticity

    PubMed Central

    2017-01-01

    Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961

  14. 75 FR 9634 - Office of Hazardous Materials Safety; Notice of Application for Special Permits

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-03-03

    ... wrapped fiber reinforced composite gas cylinders for the transportation of certain compressed gases. (mode... inert atmosphere in a shipping container to protect the electronic sensors for a satellite. (mode 1...

  15. Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems.

    PubMed

    Tulsyan, Aditya; Garvin, Christopher; Ündey, Cenk

    2018-04-06

    Biopharmaceutical manufacturing comprises multiple distinct processing steps that require effective and efficient monitoring of many variables simultaneously in real-time. State-of-the-art real-time multivariate statistical batch process monitoring (BPM) platforms have been in use in recent years to ensure comprehensive monitoring is in place as a complementary tool for continued process verification to detect weak signals. This article addresses a longstanding, industry-wide problem in BPM, referred to as the "Low-N" problem, wherein a product has a limited production history. The current best industrial practice to address the Low-N problem is to switch from a multivariate to a univariate BPM until sufficient product history is available to build and deploy a multivariate BPM platform. Every batch run without a robust multivariate BPM platform poses the risk of not detecting potential weak signals developing in the process that might have an impact on process and product performance. In this article, we propose an approach to solve the Low-N problem by generating an arbitrarily large number of in silico batches through a combination of hardware exploitation and machine-learning methods. To the best of the authors' knowledge, this is the first article to provide a solution to the Low-N problem in biopharmaceutical manufacturing using machine-learning methods. Several industrial case studies from bulk drug substance manufacturing are presented to demonstrate the efficacy of the proposed approach for BPM under various Low-N scenarios. © 2018 Wiley Periodicals, Inc.
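
    The abstract does not spell out the generation method; as a loose illustration of producing in silico batches from a short production history, one could fit a multivariate Gaussian to the observed trajectories and sample from it (purely hypothetical, not the authors' approach, and the data below are synthetic):

        import numpy as np

        rng = np.random.default_rng(0)
        # Pretend history: 5 observed batches x 10 time points of one variable
        history = np.cumsum(rng.normal(1.0, 0.2, size=(5, 10)), axis=1)

        # Fit a multivariate Gaussian over the 10-point trajectory;
        # with only 5 batches the covariance is rank-deficient, so regularize
        mu = history.mean(axis=0)
        cov = np.cov(history, rowvar=False) + 1e-6 * np.eye(10)

        # Draw an arbitrarily large number of in silico batches
        synthetic = rng.multivariate_normal(mu, cov, size=1000)
        print(synthetic.shape)   # (1000, 10)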

  16. Specific effect of a dopamine partial agonist on counterfactual learning: evidence from Gilles de la Tourette syndrome.

    PubMed

    Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano

    2017-07-24

    The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
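
    Counterfactual learning of this kind is commonly modeled with a second learning rate applied to the forgone outcome; a minimal two-option sketch (the specific update form is an assumption common in this literature, not taken from the paper):

        import numpy as np

        rng = np.random.default_rng(0)
        p_reward = np.array([0.7, 0.3])
        Q = np.zeros(2)
        alpha_obtained, alpha_forgone = 0.3, 0.3   # alpha_forgone = 0: factual only

        for t in range(500):
            explore = rng.random() < 0.1
            a = int(rng.integers(2)) if explore else int(np.argmax(Q))
            outcomes = (rng.random(2) < p_reward).astype(float)  # both drawn
            u = 1 - a                                            # unchosen option
            Q[a] += alpha_obtained * (outcomes[a] - Q[a])        # direct learning
            Q[u] += alpha_forgone * (outcomes[u] - Q[u])         # counterfactual

        print("learned values:", Q.round(2))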

  17. Somatosensory Contribution to the Initial Stages of Human Motor Learning

    PubMed Central

    Bernardi, Nicolò F.; Darainy, Mohammad

    2015-01-01

    The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants that train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The finding indicates that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869

  18. Effect of the Microstructure on the Fracture Mode of Short-Fiber Reinforced Plastic Composites

    NASA Astrophysics Data System (ADS)

    Nishikawa, Masaaki; Okabe, Tomonaga; Takeda, Nobuo

    A numerical simulation was presented to discuss the microscopic damage and its influence on the strength and energy-absorbing capability of short-fiber reinforced plastic composites. The dominant damage includes matrix crack and/or interfacial debonding, when the fibers are shorter than the critical length for fiber breakage. The simulation addressed the matrix crack with a continuum damage mechanics (CDM) model and the interfacial debonding with an embedded process zone (EPZ) model. Fictitious free-edge effects on the fracture modes were successfully eliminated with the periodic-cell simulation. The advantage of our simulation was pointed out by demonstrating that the simulation with edge effects significantly overestimates the dissipative energy of the composites. We then investigated the effect of the material microstructure on the fracture modes in the composites. The simulated results clarified that the inter-fiber distance affects the breaking strain of the composites and the fiber-orientation angle affects the position of the damage initiation. These factors influence the strength and energy-absorbing capability of short fiber-reinforced composites.

  19. Kinetic modelling of anaerobic hydrolysis of solid wastes, including disintegration processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    García-Gen, Santiago; Sousbie, Philippe; Rangaraj, Ganesh

    2015-01-15

    Highlights: • Fractionation of solid wastes into readily and slowly biodegradable fractions. • Kinetic coefficients estimation from mono-digestion batch assays. • Validation of kinetic coefficients with a co-digestion continuous experiment. • Simulation of batch and continuous experiments with an ADM1-based model. - Abstract: A methodology to estimate disintegration and hydrolysis kinetic parameters of solid wastes and validate an ADM1-based anaerobic co-digestion model is presented. Kinetic parameters of the model were calibrated from batch reactor experiments treating fruit and vegetable wastes individually (among other residues) following a new protocol for batch tests. In addition, decoupled disintegration kinetics for readily and slowly biodegradable fractions of solid wastes was considered. Calibrated parameters from batch assays of individual substrates were used to validate the model for a semi-continuous co-digestion operation treating 5 fruit and vegetable wastes simultaneously. The semi-continuous experiment was carried out in a lab-scale CSTR reactor for 15 weeks at organic loading rates ranging between 2.0 and 4.7 g VS/(L·d). The model (built in Matlab/Simulink) fit the experimental results to a large extent in both batch and semi-continuous mode and served as a powerful tool to simulate the digestion or co-digestion of solid wastes.

  20. A Case Study on Attribute Recognition of Heated Metal Mark Image Using Deep Convolutional Neural Networks.

    PubMed

    Mao, Keming; Lu, Duo; E, Dazhi; Tan, Zhenhua

    2018-06-07

    Heated metal marks are an important trace for identifying the cause of a fire. However, traditional methods mainly rely on knowledge of physics and chemistry for qualitative analysis, which keeps this a challenging problem. This paper presents a case study on attribute recognition of heated metal mark images using computer vision and machine learning technologies. The proposed work is composed of three parts. The material is first generated: according to national standards, actual needs and feasibility, seven attributes are selected for research; data generation and organization are conducted, and a small benchmark dataset is constructed. A recognition model is then implemented, with feature representation and classifier construction methods based on deep convolutional neural networks. Finally, an experimental evaluation is carried out. Multi-aspect testing is performed with various model structures, data augmentations, training modes, optimization methods and batch sizes. The influence of parameters, recognition efficiency and execution time are also analyzed. The results show that with a fine-tuned model, the recognition rates for the attributes metal type, heating mode, heating temperature, heating duration, cooling mode, placing duration and relative humidity are 0.925, 0.908, 0.835, 0.917, 0.928, 0.805 and 0.92, respectively. The proposed method recognizes the attributes of heated metal marks with preferable effect, and it can be used in practical applications.

  1. Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.

    PubMed

    Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J

    2016-11-16

    As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors 0270-6474/16/3611682-11$15.00/0.

  2. Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning

    PubMed Central

    Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F.

    2016-01-01

    As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. PMID:27852776

  3. Online blind source separation using incremental nonnegative matrix factorization with volume constraint.

    PubMed

    Zhou, Guoxu; Yang, Zuyuan; Xie, Shengli; Yang, Jun-Mei

    2011-04-01

    Online blind source separation (BSS) is proposed to overcome the high computational cost problem, which limits the practical applications of traditional batch BSS algorithms. However, the existing online BSS methods are mainly used to separate independent or uncorrelated sources. Recently, nonnegative matrix factorization (NMF) shows great potential to separate the correlative sources, where some constraints are often imposed to overcome the non-uniqueness of the factorization. In this paper, an incremental NMF with volume constraint is derived and utilized for solving online BSS. The volume constraint to the mixing matrix enhances the identifiability of the sources, while the incremental learning mode reduces the computational cost. The proposed method takes advantage of the natural gradient based multiplication updating rule, and it performs especially well in the recovery of dependent sources. Simulations in BSS for dual-energy X-ray images, online encrypted speech signals, and high correlative face images show the validity of the proposed method.
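
    The incremental, volume-constrained updates are beyond a short sketch, but the multiplicative NMF rule they build on (Lee-Seung updates for the Euclidean loss) looks like this; the mixing matrix and sources are synthetic:

        import numpy as np

        rng = np.random.default_rng(0)
        # Mixture X ~ A @ S with nonnegative mixing matrix and sources
        A_true = rng.random((4, 2))
        S_true = rng.random((2, 100))
        X = A_true @ S_true

        # Lee-Seung multiplicative updates for the Euclidean loss
        k = 2
        A = rng.random((4, k)) + 0.1
        S = rng.random((k, 100)) + 0.1
        for it in range(500):
            S *= (A.T @ X) / (A.T @ A @ S + 1e-12)
            A *= (X @ S.T) / (A @ S @ S.T + 1e-12)

        print("reconstruction error:", np.linalg.norm(X - A @ S))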

  4. HgCdTe liquid phase epitaxy - An overview

    NASA Astrophysics Data System (ADS)

    Castro, C. A.; Korenstein, R.

    1982-08-01

    Techniques and results of using liquid phase epitaxy (LPE) to form crystalline thin HgCdTe films for industrial-scale applications in IR detectors and focal plane arrays are discussed. Varying the mole fraction of CdTe in HgCdTe is noted to permit control of the bandwidth. LPE-grown films are noted to have low carrier concentrations, on the order of 4×10^14 to 5×10^15 per cubic centimeter, good surface morphology, and to be amenable to production scale-up. Details of the isothermal, equilibrium cooling, and supersaturation cooling LPE growth modes are reviewed, noting the necessity of developing a reliable method for determining the liquidus temperature for all modes to maintain uniformity of film growth from batch to batch. Mechanical steps can be either dipping the substrate into the melt or the slider boat approach, which is used in the production of compound semiconductors.

  5. Inactivation of Bacteria in Oil Field Injected Water by a Pulsed Plasma Discharge Process

    NASA Astrophysics Data System (ADS)

    Xin, Qing; Li, Zhongjian; Lei, Lecheng; Yang, Bin

    2016-09-01

    Pulsed plasma discharge was employed to inactivate bacteria in the injection water for an oil field. The effects of water conductivity and initial concentration of bacteria on elimination efficiency were investigated in the batch and continuous flow modes. It was demonstrated that Fe2+ contained in injection water could greatly enhance the elimination efficiency. The addition of the reducing agent glutathione (GSH) indicated that active radicals generated by pulsed plasma discharges played an important role in the inactivation of bacteria. Moreover, it was found that the microbial inactivation process for both batch and continuous flow modes was well fitted by a model based on the Weibull survival function. supported by Zhejiang Province Welfare Technology Applied Research Project of China (No. 2014C31137), National Natural Science Foundation of China (Nos. 21436007 and U1462201), and the Fundamental Research Funds for the Central Universities of China (No. 2015QNA4032)
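
    A Weibull survival model of this kind can be fitted directly; the sketch below uses the common log10(N/N0) = -(t/delta)^p parameterization with synthetic data (the paper's actual data and parameter values are not reproduced here):

        import numpy as np
        from scipy.optimize import curve_fit

        def weibull_log_survival(t, delta, p):
            """Weibull survival model, log10(N/N0) = -(t/delta)**p."""
            return -(t / delta) ** p

        # Synthetic survival data (log10 reduction vs treatment time)
        t = np.array([0.5, 1, 2, 4, 6, 8, 10.0])
        logS = np.array([-0.2, -0.5, -1.1, -2.0, -2.8, -3.4, -4.0])

        (delta, p), _ = curve_fit(weibull_log_survival, t, logS,
                                  p0=(2.0, 1.0), bounds=(1e-6, np.inf))
        print(f"delta={delta:.2f} (time for first log10), p={p:.2f} (shape)")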

  6. Favouring butyrate production for a new generation biofuel by acidogenic glucose fermentation using cells immobilised on γ-alumina.

    PubMed

    Syngiridis, Kostas; Bekatorou, Argyro; Kandylis, Panagiotis; Larroche, Christian; Kanellaki, Maria; Koutinas, Athanasios A

    2014-06-01

    The effect of γ-alumina as a fermentation advancing tool and as a carrier for culture immobilisation, regarding VFAs and ethanol production during acidogenic fermentation of glucose, was examined at various process conditions (sugar concentration, pH) and operation modes (continuous with and without effluent recirculation, and batch). The results showed that at high initial pH (8.9) the continuous acidogenic fermentation of glucose led to high yields of VFAs and favoured the accumulation of butyric acid. The batch process, on the other hand, at pH 6.5 favoured the ethanol-type fermentation. The results indicate that in the frame of technology development for new generation biofuels, using γ-alumina as a process advancing tool at optimum process conditions (pH, initial glucose concentration and mode of operation), the produced VFAs profile and ethanol concentration may be manipulated. Copyright © 2014. Published by Elsevier Ltd.

  7. What is the optimal task difficulty for reinforcement learning of brain self-regulation?

    PubMed

    Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza

    2016-09-01

    The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent on β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  8. Kinetic modelling of anaerobic hydrolysis of solid wastes, including disintegration processes.

    PubMed

    García-Gen, Santiago; Sousbie, Philippe; Rangaraj, Ganesh; Lema, Juan M; Rodríguez, Jorge; Steyer, Jean-Philippe; Torrijos, Michel

    2015-01-01

    A methodology to estimate disintegration and hydrolysis kinetic parameters of solid wastes and validate an ADM1-based anaerobic co-digestion model is presented. Kinetic parameters of the model were calibrated from batch reactor experiments treating fruit and vegetable wastes individually (among other residues), following a new protocol for batch tests. In addition, decoupled disintegration kinetics for the readily and slowly biodegradable fractions of solid wastes were considered. Calibrated parameters from batch assays of individual substrates were used to validate the model for a semi-continuous co-digestion operation treating 5 fruit and vegetable wastes simultaneously. The semi-continuous experiment was carried out in a lab-scale CSTR for 15 weeks at organic loading rates ranging between 2.0 and 4.7 g VS/(L·d). The model (built in Matlab/Simulink) fitted the experimental results well in both batch and semi-continuous modes and served as a powerful tool to simulate the digestion or co-digestion of solid wastes. Copyright © 2014 Elsevier Ltd. All rights reserved.
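
    The "decoupled disintegration kinetics" mentioned in the abstract is conventionally expressed in ADM1-style models as parallel first-order steps for the readily and slowly biodegradable fractions; the notation below is assumed for illustration, since the paper's symbols are not given.

        % Parallel first-order disintegration of the readily (X_r) and slowly
        % (X_s) biodegradable fractions, each with its own rate constant:
        \frac{dX_{r}}{dt} = -k_{dis,r}\,X_{r}, \qquad
        \frac{dX_{s}}{dt} = -k_{dis,s}\,X_{s}, \qquad k_{dis,r} > k_{dis,s}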

  9. The Computational Development of Reinforcement Learning during Adolescence

    PubMed Central

    Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574

  10. Crack free concrete made with nanofiber reinforcement

    DOT National Transportation Integrated Search

    2011-05-10

    The aviation community has recently expressed significant interest in a broadcast mode of data link services. A broadcast mode of delivery is well suited for applications of a general interest to many users and for applications that require periodic ...

  11. Robust reinforcement learning.

    PubMed

    Morimoto, Jun; Doya, Kenji

    2005-02-01

    This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
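
    A schematic form of the min-max value described in the letter, written in the reward-plus-disturbance-penalty style typical of H∞-based reinforcement learning; the signs and symbols here are assumptions, and the letter's exact definition may differ.

        % Zero-sum differential game: the controller u maximises, the disturber w
        % minimises, and the disturbance carries a quadratic penalty scaled by
        % the attenuation level \gamma so the inner minimisation is well-posed.
        V(x(t)) = \max_{u(\cdot)} \min_{w(\cdot)}
                  \int_{t}^{\infty} \Big( r\big(x(s),u(s)\big)
                  + \gamma^{2}\,\lVert w(s)\rVert^{2} \Big)\, ds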

  12. Coexistence of Reward and Unsupervised Learning During the Operant Conditioning of Neural Firing Rates

    PubMed Central

    Kerr, Robert R.; Grayden, David B.; Thomas, Doreen A.; Gilson, Matthieu; Burkitt, Anthony N.

    2014-01-01

    A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments. PMID:24475240
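
    The dopamine-modulated STDP window the authors build on is commonly written as below; the amplitude functions and the stated requirement are rendered schematically with assumed notation, not the paper's exact model.

        % Reward-modulated STDP (schematic): the reward signal R scales the LTP
        % and LTD amplitudes separately. Firing-rate reinforcement alongside
        % stable unsupervised learning requires that R strengthens LTP more
        % than LTD, i.e. dA_+/dR > dA_-/dR.
        \Delta w(\Delta t) =
        \begin{cases}
          \phantom{-}A_{+}(R)\, e^{-\Delta t/\tau_{+}}, & \Delta t > 0 \ \text{(pre before post, LTP)}\\
          -A_{-}(R)\, e^{\Delta t/\tau_{-}}, & \Delta t < 0 \ \text{(post before pre, LTD)}
        \end{cases}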

  13. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions

    PubMed Central

    Fee, Michale S.

    2012-01-01

    In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501
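
    The proposed learning rule can be summarised schematically: context synapses onto a medium spiny neuron are potentiated only when the efference copy input is active and the action is rewarded, while efference copy synapses themselves are not potentiated. The notation is assumed for illustration.

        % Efference-copy-gated plasticity (schematic): the change in a context
        % weight w_i is proportional to the context input c_i, gated by the
        % efference copy signal e and the reward R.
        \Delta w_{i} \;\propto\; c_{i}\, e\, R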

  14. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions.

    PubMed

    Fee, Michale S

    2012-01-01

    In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current "time" in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.

  15. Histidine-decarboxylase knockout mice show deficient nonreinforced episodic object memory, improved negatively reinforced water-maze performance, and increased neo- and ventro-striatal dopamine turnover.

    PubMed

    Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P

    2003-01-01

    The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement and reward systems, reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning depending on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task, as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydroxyphenylacetic acid concentrations and a higher dihydroxyphenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydroxyphenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory-promoting and memory-suppressive effects via distinct and independent mechanisms, and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.

  16. Dorsal Striatal-Midbrain Connectivity in Humans Predicts How Reinforcements Are Used to Guide Decisions

    ERIC Educational Resources Information Center

    Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana

    2009-01-01

    It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…

  17. Autonomous Inter-Task Transfer in Reinforcement Learning Domains

    DTIC Science & Technology

    2008-08-01

    Only fragments of this report's front matter survive here: a bibliography entry (Twentieth International Joint Conference on Artificial Intelligence, 2007; Fumihide Tanaka and Masayuki Yamamura, multitask reinforcement learning), table-of-contents entries, and one truncated sentence of prose: "...[Laird et al., 1986, Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence..."

  18. A look at Behaviourism and Perceptual Control Theory in Interface Design

    DTIC Science & Technology

    1998-02-01

    ...behaviours such as response variability, instinctive drift, autoshaping, etc. Perceptual Control Theory (PCT) postulates that behaviours result from the...internal variables. Behaviourism, on the other hand, cannot account for variability in responses, instinctive drift, autoshaping, etc. ... Animals appear to learn without reinforcement. However, conditioning theory speculates that learning results only when reinforcement...

  19. Interactive Electronic Circuit Simulation on Small Computer Systems

    DTIC Science & Technology

    1979-11-01

    Interactive-mode circuit simulation and batch-mode circuit simulation on minicomputers are compared...on the circuit Q. For circuits with Q less than 1, this ratio is typically 10:1.

  20. BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT

    PubMed Central

    Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.

    2015-01-01

    Cigarette smoking is the leading cause of preventable deaths worldwide and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioned reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) as well as the reinforcement enhancement action of nicotine, which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333

  1. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    PubMed

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

    PubMed

    Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E

    2017-03-15

    Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.

  3. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    PubMed Central

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  4. Feature Reinforcement Learning: Part I. Unstructured MDPs

    NASA Astrophysics Data System (ADS)

    Hutter, Marcus

    2009-12-01

    General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
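
    Schematically, the criterion developed in the article scores a candidate feature map Φ (mapping observation histories to states) by how compactly the induced state and reward sequences can be coded, and selects the map minimising that cost; the rendering below is a loose sketch, not the article's exact definition.

        % Feature map selection by minimal coding cost (schematic), where
        % s_t = \Phi(h_t) and CL denotes a code length under an MDP model:
        \Phi^{*} = \arg\min_{\Phi}\ \mathrm{Cost}(\Phi \mid h_{n}), \qquad
        \mathrm{Cost}(\Phi \mid h_{n}) = \mathrm{CL}(s_{1:n} \mid a_{1:n})
                                       + \mathrm{CL}(r_{1:n} \mid s_{1:n}, a_{1:n})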

  5. The role of within-compound associations in learning about absent cues.

    PubMed

    Witnauer, James E; Miller, Ralph R

    2011-05-01

    When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue-outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue-outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127-151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association.
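
    A minimal simulation sketch of the kind of model described, assuming a Rescorla-Wagner-style update in which an absent cue retrieved through a within-compound association learns with a negative, retrieval-weighted associability; all names and parameter values are illustrative assumptions.

        # Minimal Van Hamme & Wasserman (1994)-style update (illustrative).
        # Present cues learn with positive associability; absent-but-retrieved
        # cues learn with a negative associability scaled by the within-compound
        # association, producing retrospective revaluation.
        beta, lam = 0.2, 1.0        # outcome salience and asymptote (assumed)

        V = {"A": 0.4, "B": 0.4}    # strengths after AB+ training (assumed)
        within = {("A", "B"): 0.8}  # within-compound association A<->B (assumed)

        def update(trial_cues, reinforced):
            """One trial: update present cues and cues retrieved via within-compound links."""
            error = (lam if reinforced else 0.0) - sum(V[c] for c in trial_cues)
            for cue in V:
                if cue in trial_cues:
                    alpha = 0.5      # positive associability for a present cue
                else:
                    # negative associability, scaled by retrieval strength
                    retrieval = max(within.get((p, cue), within.get((cue, p), 0.0))
                                    for p in trial_cues)
                    alpha = -0.5 * retrieval
                V[cue] += alpha * beta * error

        # Extinction of A alone (A-) revalues the absent cue B upward:
        for _ in range(20):
            update({"A"}, reinforced=False)
        print(V)  # V["A"] decays toward 0 while V["B"] rises above 0.4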

  6. The role of within-compound associations in learning about absent cues

    PubMed Central

    Witnauer, James E.

    2011-01-01

    When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue–outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue–outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127–151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association. PMID:21264569

  7. Assessing the effect of cognitive styles with different learning modes on learning outcome.

    PubMed

    Liao, Chechen; Chuang, Shu-Hui

    2007-08-01

    In this study, similarities and differences in learning outcomes associated with individual differences in cognitive styles are examined using the traditional (face-to-face) and web-based learning modes. 140 undergraduate students were categorized as having analytic or holistic cognitive styles by their scores on the Style of Learning and Thinking questionnaire. Four conditions were studied: students with an analytic cognitive style in a traditional learning mode, an analytic cognitive style in a web-based learning mode, a holistic cognitive style in a traditional learning mode, and a holistic cognitive style in a web-based learning mode. Analysis of the data shows that the analytic style in the traditional mode led to significantly higher performance and perceived satisfaction than the other conditions. Satisfaction did not differ significantly between students with the analytic style in web-based learning and those with the holistic style in traditional learning. This suggests that integrating different learning modes into the learning environment may be insufficient to improve learners' satisfaction.

  8. Pleasurable music affects reinforcement learning according to the listener

    PubMed Central

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  9. Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.

    PubMed

    Blake, David T

    2017-06-18

    The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long-term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support for reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to stronger cortical responses and an enlarged response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.

  10. Feedback from the heart: Emotional learning and memory is controlled by cardiac cycle, interoceptive accuracy and personality.

    PubMed

    Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D

    2017-05-01

    Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlies poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. A reinforcement learning-based architecture for fuzzy logic control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1992-01-01

    This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.

  12. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

    PubMed

    Franklin, Nicholas T; Frank, Michael J

    2015-12-25

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
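
    As a toy rendering of the control principle described, where a feedback signal tied to population uncertainty tunes the effective learning rate, the sketch below modulates a simple action-value update by decision uncertainty; the modulation rule and constants are assumptions, not the paper's neural model.

        import math

        # Toy uncertainty-tuned learning rate: when value estimates for the
        # actions are close (high uncertainty), learn faster, mimicking the
        # adaptive effect attributed to TAN pause dynamics.
        Q = [0.5, 0.5]
        base_lr = 0.1

        def uncertainty(q):
            """Entropy of a softmax over action values, normalised to [0, 1]."""
            z = [math.exp(v) for v in q]
            p = [x / sum(z) for x in z]
            return -sum(pi * math.log(pi) for pi in p) / math.log(len(q))

        def update(action, reward):
            lr = base_lr * (0.5 + uncertainty(Q))   # assumed modulation rule
            Q[action] += lr * (reward - Q[action])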

  13. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    PubMed

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
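
    A minimal continuous actor-critic sketch in the spirit of the approach described: a Gaussian policy whose mean is adapted by a temporal-difference error built from a sparse scalar human reward. The features, constants, and training signal are illustrative assumptions, not the authors' controller.

        import random

        # Minimal continuous actor-critic driven by a scalar human reward.
        # State: one normalised EMG feature; action: a control velocity.
        w_v, w_mu = 0.0, 0.0          # critic and actor weights (linear in state)
        sigma = 0.2                   # fixed exploration noise (assumed)
        alpha_v, alpha_mu, gamma = 0.1, 0.05, 0.95

        def step(state, next_state, human_reward):
            """One actor-critic update from a scalar feedback signal."""
            global w_v, w_mu
            mu = w_mu * state
            action = random.gauss(mu, sigma)            # Gaussian exploration
            td_error = human_reward + gamma * w_v * next_state - w_v * state
            w_v += alpha_v * td_error * state           # critic: TD(0) update
            # actor: push the policy mean toward actions with positive TD error
            w_mu += alpha_mu * td_error * (action - mu) / sigma**2 * state
            return action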

  14. Testing of containers made of glass-fiber reinforced plastic with the aid of acoustic emission analysis

    NASA Technical Reports Server (NTRS)

    Wolitz, K.; Brockmann, W.; Fischer, T.

    1979-01-01

    Acoustic emission analysis as a quasi-nondestructive test method makes it possible to differentiate clearly, in judging the total behavior of fiber-reinforced plastic composites, between critical failure modes (for unidirectional composites, fiber fractures) and non-critical failure modes (delamination processes or matrix fractures). A particular advantage is that, for varying pressure demands on the composites, the emitted acoustic pulses can be analyzed with regard to their amplitude distribution. In addition, definite indications as to how the damage occurred can be obtained from the time curves of the emitted acoustic pulses as well as from the particular frequency spectrum. Distinct analogies can be drawn between the various analytical methods with respect to whether the failure modes can be classified as critical or non-critical.

  15. Fatigue damage accumulation in various metal matrix composites

    NASA Technical Reports Server (NTRS)

    Johnson, W. S.

    1987-01-01

    The purpose of this paper is to review some of the latest understanding of the fatigue behavior of continuous fiber reinforced metal matrix composites. The emphasis is on the development of an understanding of different fatigue damage mechanisms and why and how they occur. The fatigue failure modes in continuous fiber reinforced metal matrix composites are controlled by the three constituents of the system: fiber, matrix, and fiber/matrix interface. The relative strains to fatigue failure of the fiber and matrix will determine the failure mode. Several examples of matrix, fiber, and self-similar damage growth dominated fatigue damage are given for several metal matrix composite systems. Composite analysis, failure modes, and damage modeling are discussed. Boron/aluminum, silicon-carbide/aluminum, FP/aluminum, and borsic/titanium metal matrix composites are discussed.

  16. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.

    PubMed

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  17. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    PubMed Central

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  18. Soft sensor modeling based on variable partition ensemble method for nonlinear batch processes

    NASA Astrophysics Data System (ADS)

    Wang, Li; Chen, Xiangguang; Yang, Kai; Jin, Huaiping

    2017-01-01

    Batch processes are typically characterized by nonlinear and uncertain dynamics, so a conventional single model may be ill-suited. A local-learning soft sensor based on a variable partition ensemble method is developed for quality prediction in nonlinear, non-Gaussian batch processes. A set of input variable sets is obtained by bootstrapping and the PMI criterion. Multiple local GPR models are then developed, one for each local input variable set. When new test data arrive, the posterior probability of each best-performing local model is estimated by Bayesian inference and used to combine the local GPR models into the final prediction result. The proposed soft sensor is demonstrated by application to an industrial fed-batch chlortetracycline fermentation process.
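
    A compact sketch of the combination step under stated assumptions: local GPR models are built on different input-variable subsets and combined by posterior weights derived from each model's predictive likelihood on a recent labelled sample. The abstract does not spell out the exact Bayesian scheme, so this is one plausible reading, with hypothetical names throughout.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        # `subsets` holds the bootstrapped input-variable index sets; (X, y) is
        # historical data and (X_ref, y_ref) a recent labelled sample used to
        # score each local model.
        def fit_locals(X, y, subsets):
            return [GaussianProcessRegressor(normalize_y=True).fit(X[:, s], y)
                    for s in subsets]

        def ensemble_predict(models, subsets, X_ref, y_ref, x_new):
            likes = []
            for m, s in zip(models, subsets):
                mu, std = m.predict(X_ref[:, s], return_std=True)
                std = np.maximum(std, 1e-6)
                # Gaussian likelihood of the reference labels (up to a constant)
                likes.append(np.exp(-0.5 * ((y_ref - mu) / std) ** 2).prod()
                             / std.prod())
            post = np.array(likes) / sum(likes)   # posterior model weights
            preds = [m.predict(x_new[:, s])[0] for m, s in zip(models, subsets)]
            return float(np.dot(post, preds))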

  19. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Translational controller results

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1992-01-01

    The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably to mean the same thing. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is the deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in the fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and a summary for the project.

  20. Learning and tuning fuzzy logic controllers through reinforcements

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  1. Bioreactors for high cell density and continuous multi-stage cultivations: options for process intensification in cell culture-based viral vaccine production.

    PubMed

    Tapia, Felipe; Vázquez-Ramírez, Daniel; Genzel, Yvonne; Reichl, Udo

    2016-03-01

    With an increasing demand for efficacious, safe, and affordable vaccines for human and animal use, process intensification in cell culture-based viral vaccine production demands advanced process strategies to overcome the limitations of conventional batch cultivations. However, the use of fed-batch, perfusion, or continuous modes to drive processes at high cell density (HCD) and overextended operating times has so far been little explored in large-scale viral vaccine manufacturing. Also, possible reductions in cell-specific virus yields for HCD cultivations have been reported frequently. Taking into account that vaccine production is one of the most heavily regulated industries in the pharmaceutical sector with tough margins to meet, it is understandable that process intensification has only recently been considered by both academia and industry as a next step toward more efficient viral vaccine production processes. Compared to conventional batch processes, fed-batch and perfusion strategies could result in ten to a hundred times higher product yields. Both cultivation strategies can be implemented to achieve cell concentrations exceeding 10^7 cells/mL or even 10^8 cells/mL, while keeping low levels of metabolites that potentially inhibit cell growth and virus replication. The trend towards HCD processes is supported by development of GMP-compliant cultivation platforms, i.e., acoustic settlers, hollow fiber bioreactors, and hollow fiber-based perfusion systems including tangential flow filtration (TFF) or alternating tangential flow (ATF) technologies. In this review, these process modes are discussed in detail and compared with conventional batch processes based on productivity indicators such as space-time yield, cell concentration, and product titers. In addition, options for the production of viral vaccines in continuous multi-stage bioreactors such as two- and three-stage systems are addressed. While such systems have shown similar virus titers compared to batch cultivations, keeping high yields for extended production times is still a challenge. Overall, we demonstrate that process intensification of cell culture-based viral vaccine production can be realized by the consistent application of fed-batch, perfusion, and continuous systems with a significant increase in productivity. The potential for even further improvements is high, considering recent developments in establishment of new (designer) cell lines, better characterization of host cell metabolism, advances in media design, and the use of mathematical models as a tool for process optimization and control.

  2. Learning the specific quality of taste reinforcement in larval Drosophila

    PubMed Central

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-01

    The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533

  3. Evidence for a neural law of effect.

    PubMed

    Athalye, Vivek R; Santos, Fernando J; Carmena, Jose M; Costa, Rui M

    2018-03-02

    Thorndike's law of effect states that actions that lead to reinforcements tend to be repeated more often. Accordingly, neural activity patterns leading to reinforcement are also reentered more frequently. Reinforcement relies on dopaminergic activity in the ventral tegmental area (VTA), and animals shape their behavior to receive dopaminergic stimulation. Seeking evidence for a neural law of effect, we found that mice learn to reenter more frequently motor cortical activity patterns that trigger optogenetic VTA self-stimulation. Learning was accompanied by gradual shaping of these patterns, with participating neurons progressively increasing and aligning their covariance to that of the target pattern. Motor cortex patterns that lead to phasic dopaminergic VTA activity are progressively reinforced and shaped, suggesting a mechanism by which animals select and shape actions to reliably achieve reinforcement. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  4. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    PubMed

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students found this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  5. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    PubMed

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by a scalar reward signal, with learning taking place when expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, both the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) with model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model used for model-based computations, we turn to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. By means of the default network, the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation, by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based reward expectations into the value signal is further supported by efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  6. Catalytic wet oxidation of phenol in a trickle bed reactor over a Pt/TiO2 catalyst.

    PubMed

    Maugans, Clayton B; Akgerman, Aydin

    2003-01-01

    Catalytic wet oxidation of phenol was studied in a batch and a trickle bed reactor using 4.45% Pt/TiO2 catalyst in the temperature range 150-205 degrees C. Kinetic data were obtained from batch reactor studies and used to model the reaction kinetics for phenol disappearance and for total organic carbon disappearance. Trickle bed experiments were then performed to generate data from a heterogeneous flow reactor. Catalyst deactivation was observed in the trickle bed reactor, although the exact cause was not determined. Deactivation was observed to linearly increase with the cumulative amount of phenol that had passed over the catalyst bed. Trickle bed reactor modeling was performed using a three-phase heterogeneous model. Model parameters were determined from literature correlations, batch derived kinetic data, and trickle bed derived catalyst deactivation data. The model equations were solved using orthogonal collocations on finite elements. Trickle bed performance was successfully predicted using the batch derived kinetic model and the three-phase reactor model. Thus, using the kinetics determined from limited data in the batch mode, it is possible to predict continuous flow multiphase reactor performance.
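
    The batch-to-flow workflow described above can be illustrated with a deliberately simplified sketch: estimate a rate constant from batch concentration-time data, then use it to predict conversion in an idealized plug-flow reactor. All numbers below are invented, and the paper's actual three-phase heterogeneous model with deactivation is far more detailed than this pseudo-first-order caricature.

      import numpy as np

      # invented batch data: phenol concentration vs time
      t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])      # min
      C = np.array([100.0, 61.0, 37.0, 22.0, 13.5])    # mg/L

      # assume pseudo-first-order disappearance: ln C = ln C0 - k t
      k = -np.polyfit(t, np.log(C), 1)[0]               # 1/min

      # idealized plug-flow prediction at residence time tau: X = 1 - exp(-k tau)
      tau = 25.0                                        # min
      conversion = 1.0 - np.exp(-k * tau)
      print(f"k = {k:.3f} 1/min, predicted conversion = {conversion:.1%}")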

  7. Implementation of a repeated fed-batch process for the production of chitin-glucan complex by Komagataella pastoris.

    PubMed

    Farinha, Inês; Freitas, Filomena; Reis, Maria A M

    2017-07-25

    The yeast Komagataella pastoris was cultivated under different fed-batch strategies for the production of chitin-glucan complex (CGC), a co-polymer of chitin and β-glucan. The tested fed-batch strategies included DO-stat mode, a predefined feeding profile and repeated fed-batch operation. Although high cell dry mass and high CGC production were obtained under the tested DO-stat strategy in a 94 h cultivation (159 and 29 g/L, respectively), the overall biomass and CGC productivities were low (41 and 7.4 g/L/day, respectively). Cultivation with a predefined profile significantly improved both biomass and CGC volumetric productivity (87 and 10.8 g/L/day, respectively). Hence, this strategy was used to implement a repeated fed-batch process comprising 7 consecutive cycles. A daily production of 119-126 g/L of biomass with a CGC content of 11-16 wt% was obtained, thus proving this cultivation strategy is adequate to reach a high CGC productivity that ranged between 11 and 18 g/L/day. The process was stable and reproducible in terms of CGC productivity and polymer composition, making it a promising strategy for further process development. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Field Testing of Rapid Electrokinetic Nanoparticle Treatment for Corrosion Control of Steel in Concrete

    NASA Technical Reports Server (NTRS)

    Cardenas, Henry E.; Alexander, Joshua B.; Kupwade-Patil, Kunal; Calle, Luz Marina

    2009-01-01

    This work field tested the use of electrokinetics for delivery of concrete sealing nanoparticles concurrent with the extraction of chlorides. Several cylinders of concrete were batched and placed in immersion at the Kennedy Space Center Beach Corrosion Test Site. The specimens were batched with steel reinforcement and a 4.5 wt.% (weight percent) content of sodium chloride. Upon arrival at Kennedy Space Center, the specimens were placed in the saltwater immersion pool at the Beach Corrosion Test Site. Following 30 days of saltwater exposure, the specimens were subjected to rapid chloride extraction concurrent with electrokinetic nanoparticle treatment. The treatments were operated at up to eight times the typical current density in order to complete the treatment in 7 days. The findings indicated that the short-term corrosion resistance of the concrete specimens was significantly enhanced, as was the strength of the concrete.

  9. Incorporating Dispositional Traits into the Treatment of Anorexia Nervosa

    PubMed Central

    Herzog, David; Moskovich, Ashley; Merwin, Rhonda; Lin, Tammy

    2014-01-01

    We provide a general framework to guide the development of interventions that aim to address persistent features in eating disorders that may preclude effective treatment. Using perfectionism as an exemplar, we draw from research in cognitive neuroscience regarding attention and reinforcement learning, from learning theory and social psychology regarding vicarious learning and implications for the role modeling of significant others, and from clinical psychology on the importance of verbal narratives as barriers that may influence expectations and shape reinforcement schedules. PMID:21243482

  10. Hybrid learning in signalling games

    NASA Astrophysics Data System (ADS)

    Barrett, Jeffrey A.; Cochran, Calvin T.; Huttegger, Simon; Fujiwara, Naoki

    2017-09-01

    Lewis-Skyrms signalling games have been studied under a variety of low-rationality learning dynamics. Reinforcement dynamics are stable but slow and prone to evolving suboptimal signalling conventions. A low-inertia trial-and-error dynamics like win-stay/lose-randomise is fast and reliable at finding perfect signalling conventions but unstable in the context of noise or agent error. Here we consider a low-rationality hybrid of reinforcement and win-stay/lose-randomise learning that exhibits the virtues of both. This hybrid dynamics is reliable, stable and exceptionally fast.
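
    To make the flavour of such dynamics concrete, here is a hedged sketch of a 2-state/2-signal/2-act Lewis signalling game trained with plain (Roth-Erev style) urn reinforcement plus a lose-randomise twist, in which a failure makes the next choice uniformly random. This is one way such a hybrid could be coded, not the authors' exact dynamics.

      import random

      states, signals = [0, 1], [0, 1]
      sender = {s: [1.0, 1.0] for s in states}      # urn weights over signals
      receiver = {m: [1.0, 1.0] for m in signals}   # urn weights over acts

      def draw(weights, failed_last):
          if failed_last:                           # lose-randomise
              return random.choice([0, 1])
          return random.choices([0, 1], weights=weights)[0]

      failed = False
      for t in range(5000):
          s = random.choice(states)
          m = draw(sender[s], failed)               # sender picks a signal
          a = draw(receiver[m], failed)             # receiver picks an act
          success = (a == s)
          if success:                               # win-stay via reinforcement
              sender[s][m] += 1.0
              receiver[m][a] += 1.0
          failed = not success

      print(sender, receiver)                       # weights concentrate on a convention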

  11. D-lactic acid production by Sporolactobacillus inulinus Y2-8 immobilized in fibrous bed bioreactor using corn flour hydrolyzate.

    PubMed

    Zhao, Ting; Liu, Dong; Ren, Hengfei; Shi, Xinchi; Zhao, Nan; Chen, Yong; Ying, Hanjie

    2014-12-28

    In this study, a fibrous bed bioreactor (FBB) was used for D-lactic acid (D-LA) production by Sporolactobacillus inulinus Y2-8. Corn flour hydrolyzed with α-amylase and saccharifying enzyme was used as a cost-efficient and nutrient-rich substrate for D-LA production. A maximal starch conversion rate of 93.78% was obtained. The optimum pH for D-LA production was determined to be 6.5. Ammonia water was determined to be an ideal neutralizing agent, which improved the D-LA production and purification processes. Batch fermentation and fed-batch fermentation, with both free cells and immobilized cells, were compared to highlight the advantages of FBB fermentation. In batch mode, the D-LA production rate of FBB fermentation was 1.62 g/L/h, which was 37.29% higher than that of free-cell fermentation, and the D-LA optical purities of the two fermentation methods were above 99.00%. In fed-batch mode, the maximum D-LA concentration attained by FBB fermentation was 218.8 g/L, which was 37.67% higher than that of free-cell fermentation. Repeated-batch fermentation was performed to determine the long-term performance of the FBB system, and the data indicated that the average D-LA production rate was 1.62 g/L/h and the average yield was 0.98 g/g. Thus, hydrolyzed corn flour fermented by S. inulinus Y2-8 in an FBB may be used for improving D-LA fermentation by using ammonia water as the neutralizing agent.

  12. Discrete Serotonin Systems Mediate Memory Enhancement and Escape Latencies after Unpredicted Aversive Experience in Drosophila Place Memory

    PubMed Central

    Sitaraman, Divya; Kramer, Elizabeth F.; Kahsai, Lily; Ostrowski, Daniela; Zars, Troy

    2017-01-01

    Feedback mechanisms in operant learning are critical for animals to increase reward or reduce punishment. However, not all conditions have a behavior that can readily resolve an event. Animals must then try out different behaviors to better their situation through outcome learning. This form of learning allows for novel solutions and, with positive experience, can lead to unexpected behavioral routines. Learned helplessness, as a type of outcome learning, manifests in part as increases in escape latency in the face of repeated unpredicted shocks. Little is known about the mechanisms of outcome learning. When fruit flies (Drosophila melanogaster) are exposed to unpredicted high temperatures in a place learning paradigm, they both increase escape latencies and show higher memory when given control of a place/temperature contingency. Here we describe discrete serotonin neuronal circuits that mediate aversive reinforcement, escape latencies, and memory levels after place learning in the presence and absence of unexpected aversive events. The results show that two features of learned helplessness depend on the same modulatory system as aversive reinforcement. Moreover, changes in aversive reinforcement and escape latency depend on local neural circuit modulation, while memory enhancement requires larger modulation of multiple behavioral control circuits. PMID:29321732

  13. Enhanced production of GDP-L-fucose by overexpression of NADPH regenerator in recombinant Escherichia coli.

    PubMed

    Lee, Won-Heong; Chin, Young-Wook; Han, Nam Soo; Kim, Myoung-Dong; Seo, Jin-Ho

    2011-08-01

    Biosynthesis of guanosine 5'-diphosphate-L-fucose (GDP-L-fucose) requires NADPH as a reducing cofactor. In this study, endogenous NADPH regenerating enzymes such as glucose-6-phosphate dehydrogenase (G6PDH), isocitrate dehydrogenase (Icd), and NADP⁺-dependent malate dehydrogenase (MaeB) were overexpressed to increase GDP-L-fucose production in recombinant Escherichia coli. The effects of overexpression of each NADPH regenerating enzyme on GDP-L-fucose production were investigated in a series of batch and fed-batch fermentations. Batch fermentations showed that overexpression of G6PDH was the most effective for GDP-L-fucose production. However, GDP-L-fucose production was not enhanced by overexpression of G6PDH in the glucose-limited fed-batch fermentation. Hence, a glucose feeding strategy was optimized to enhance GDP-L-fucose production. Fed-batch fermentation with a pH-stat feeding mode for sufficient supply of glucose significantly enhanced GDP-L-fucose production compared with glucose-limited fed-batch fermentation. A maximum GDP-L-fucose concentration of 235.2 ± 3.3 mg l⁻¹, corresponding to a 21% enhancement in GDP-L-fucose production compared with the control strain overexpressing GDP-L-fucose biosynthetic enzymes only, was achieved in the pH-stat fed-batch fermentation of the recombinant E. coli overexpressing G6PDH. It was concluded that sufficient glucose supply and efficient NADPH regeneration are crucial for NADPH-dependent GDP-L-fucose production in recombinant E. coli.

  14. Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies

    PubMed Central

    Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.

    2017-01-01

    Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583

  15. The Effect of a Token Reinforcement Program on the Reading Comprehension of a Learning Disabled Student.

    ERIC Educational Resources Information Center

    Galbreath, Joy; Feldman, David

    The relationship between reading comprehension accuracy and a contingently administered token reinforcement program, used with an elementary-level learning disabled student in the classroom, was examined. The subject earned points for each correct answer made after oral reading sessions. At the conclusion of the class he could exchange his points for rewards…

  16. Cocaine addiction as a homeostatic reinforcement learning disorder.

    PubMed

    Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H

    2017-03-01

    Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  17. Challenges in the Verification of Reinforcement Learning Algorithms

    NASA Technical Reports Server (NTRS)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  18. Refining fuzzy logic controllers with machine learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  19. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    PubMed

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score.

  20. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    PubMed Central

    Zhang, Hongjun; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score. PMID:28894463
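
    For readers unfamiliar with the Q-learning machinery both versions of this record rely on, below is a hedged, toy-sized sketch of tabular Q-learning over a two-step decision process (an entity decision followed by a relation decision). The state labels and per-step rewards are invented stand-ins; the actual method derives states from BiLSTM and Tree-LSTM encodings of text, not from discrete labels.

      import random
      from collections import defaultdict

      alpha, gamma, eps = 0.5, 0.9, 0.1
      Q = defaultdict(float)                    # Q[(state, action)]

      def choose(state, actions):
          if random.random() < eps:             # epsilon-greedy exploration
              return random.choice(actions)
          return max(actions, key=lambda a: Q[(state, a)])

      for episode in range(2000):
          # step 1: an entity decision (toy state label, invented reward)
          s1, acts1 = "entity_step", ["keep", "drop"]
          a1 = choose(s1, acts1)
          r1 = 1.0 if a1 == "keep" else 0.0
          # step 2: a relation decision conditioned on step 1's outcome
          s2, acts2 = "relation_step|" + a1, ["related", "unrelated"]
          a2 = choose(s2, acts2)
          r2 = 1.0 if (a1 == "keep" and a2 == "related") else 0.0
          # standard Q-learning backups for the two steps
          Q[(s2, a2)] += alpha * (r2 - Q[(s2, a2)])          # terminal step
          best_next = max(Q[(s2, a)] for a in acts2)
          Q[(s1, a1)] += alpha * (r1 + gamma * best_next - Q[(s1, a1)])

      print(max(["keep", "drop"], key=lambda a: Q[("entity_step", a)]))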

  1. Investigation of a Reinforcement-Based Toilet Training Procedure for Children with Autism.

    ERIC Educational Resources Information Center

    Cicero, Frank R.; Pfadt, Al

    2002-01-01

    This study evaluated the effectiveness of a reinforcement-based toilet training intervention with three children with autism. Procedures included positive reinforcement, graduated guidance, scheduled practice trials, and forward prompting. All three children reduced urination accidents to zero and learned to request bathroom use spontaneously…

  2. Sex Differences in Reinforcement and Punishment on Prime-Time Television.

    ERIC Educational Resources Information Center

    Downs, A. Chris; Gowan, Darryl C.

    1980-01-01

    Television programs were analyzed for frequencies of positive reinforcement and punishment exchanged among performers varying in age and sex. Females were found to more often exhibit and receive reinforcement, whereas males more often exhibited and received punishment. These findings have implications for children's learning of positive and…

  3. The use of an active learning approach in a SCALE-UP learning space improves academic performance in undergraduate General Biology.

    PubMed

    Hacisalihoglu, Gokhan; Stephens, Desmond; Johnson, Lewis; Edington, Maurice

    2018-01-01

    Active learning is a pedagogical approach that involves students engaging in collaborative learning, which enables them to take more responsibility for their learning and improve their critical thinking skills. While prior research examined student performance at majority universities, this study focuses specifically on Historically Black Colleges and Universities (HBCUs) for the first time. Here we present work from Florida A&M University, where we measured the impact of active learning strategies coupled with a SCALE-UP (Student Centered Active Learning Environment with Upside-down Pedagogies) learning environment on student success in General Biology. In biology sections where active learning techniques were employed, students watched online videos and completed specific activities before class covering information previously presented in a traditional lecture format. In-class activities were then carefully planned to reinforce critical concepts and enhance critical thinking skills through active learning techniques such as the one-minute paper, think-pair-share, and the utilization of clickers. Students in the active learning and control groups covered the same topics, took the same summative examinations and completed identical homework sets. In addition, the same instructor taught all of the sections included in this study. Testing demonstrated that these interventions increased learning gains by as much as 16%, and students reported an increase in their positive perceptions of active learning and biology. Overall, our results suggest that active learning approaches coupled with the SCALE-UP environment may provide an added opportunity for student success when compared with the standard modes of instruction in General Biology.

  4. Separation of Time-Based and Trial-Based Accounts of the Partial Reinforcement Extinction Effect

    PubMed Central

    Bouton, Mark E.; Woods, Amanda M.; Todd, Travis P.

    2013-01-01

    Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or “time-accumulation” view (e.g., Gallistel & Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially- and continuously-reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. PMID:23962669

  5. Oscillations in the reduction of permanganate by hydrogen peroxide or by ninhydrin in a batch reactor and mixed-mode oscillations in a continuous-flow stirred tank reactor

    NASA Astrophysics Data System (ADS)

    Tóthová, Mária; Nagy, Arpád; Treindl, Ľudovít.

    1999-01-01

    The periodic reduction of permanganate by hydrogen peroxide or by ninhydrin, with transient oscillations in a closed system, has been observed and discussed in relation to the first two permanganate oscillators described earlier. The mixed-mode oscillations of the permanganate-H₂O₂ oscillating system in a continuous-flow stirred tank reactor have been described.

  6. Assessment of the mechanical properties of sisal fiber-reinforced silty clay using triaxial shear tests.

    PubMed

    Wu, Yankai; Li, Yanbin; Niu, Bin

    2014-01-01

    Fiber reinforcement is widely used in construction engineering because it increases soil strength and improves the soil's mechanical behavior. However, the mechanical properties of fiber-reinforced soils remain controversial. The present study investigated the mechanical properties of silty clay reinforced with discrete, randomly distributed sisal fibers using triaxial shear tests. The sisal fibers were cut to different lengths, randomly mixed with silty clay in varying percentages, and compacted to the maximum dry density at the optimum moisture content. The results indicate that with a fiber length of 10 mm and content of 1.0%, sisal fiber-reinforced silty clay is 20% stronger than nonreinforced silty clay. The fiber-reinforced silty clay exhibited crack fracture and surface shear fracture failure modes, implying that sisal fiber is a good earth reinforcement material with potential applications in civil engineering, dam foundations, roadbed engineering, and ground treatment.

  7. Novel reinforcement learning paradigm based on response patterning under interval schedules of reinforcement.

    PubMed

    Schifani, Christin; Sukhanov, Ilya; Dorofeikova, Mariia; Bespalov, Anton

    2017-07-28

    There is a need to develop cognitive tasks that address valid neuropsychological constructs implicated in disease mechanisms and can be used in animals and humans to guide novel drug discovery. The present experiments aimed to characterize a novel reinforcement learning task based on a classical operant behavioral phenomenon observed in multiple species - differences in response patterning under variable interval (VI) vs fixed interval (FI) schedules of reinforcement. Wistar rats were trained to press a lever for food under VI30s, and later weekly test sessions were introduced with the reinforcement schedule switched to FI30s. During the FI30s test session, post-reinforcement pauses (PRPs) gradually grew towards the end of the session, reaching 22-43% of the initial values. Animals could be retrained under VI30s conditions, and FI30s test sessions were repeated over a period of several months without appreciable signs of a practice effect. Administration of the non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist MK-801 ((5S,10R)-(+)-5-Methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate) prior to FI30s sessions prevented the adjustment of PRPs associated with the change from VI to FI schedule. This effect was most pronounced at the highest tested dose of MK-801 and appeared to be independent of the effects of this dose on response rates. These results provide initial evidence for the possibility of using different response patterning under VI and FI schedules with equivalent reinforcement density for studying effects of drug treatment on reinforcement learning. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Preliminary Work for Examining the Scalability of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Clouse, Jeff

    1998-01-01

    Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsson, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state-spaces. There has been a little theoretical work that proves that convergence to optimal solutions can be obtained when using generalization structures, but the structures are quite simple. The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising. These empirical results make no theoretical claims, nor compare the policies produced to optimal policies. A goal of our work is to be able to make the comparison between an optimal policy and one stored in an artificial neural network. A difficulty of performing such a study is finding a multiple-step task that is small enough that one can find an optimal policy using table lookup, yet large enough that, for practical purposes, an artificial neural network is really required. We have identified a limited form of the game OTHELLO as satisfying these requirements. The work we report here is in the very preliminary stages of research, but this paper provides background for the problem being studied and a description of our initial approach to examining the problem. In the remainder of this paper, we first describe reinforcement learning in more detail. Next, we present the game OTHELLO. Finally we argue that a restricted form of the game meets the requirements of our study, and describe our preliminary approach to finding an optimal solution to the problem.
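
    The table-lookup representation discussed above is easy to demonstrate at small scale. The sketch below applies tabular Q-learning to a 6-state corridor task (invented purely for illustration): every state gets an explicit row of action values, which is exactly what becomes infeasible for a state space the size of OTHELLO's.

      import random

      n_states, goal = 6, 5
      actions = [-1, +1]                       # step left / step right
      Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
      alpha, gamma, eps = 0.5, 0.95, 0.1

      for episode in range(500):
          s = 0
          while s != goal:
              a = (random.choice(actions) if random.random() < eps
                   else max(actions, key=lambda x: Q[(s, x)]))
              s2 = min(max(s + a, 0), n_states - 1)
              r = 1.0 if s2 == goal else 0.0
              # one-step Q-learning backup on the explicit table entry
              Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                                    - Q[(s, a)])
              s = s2

      # greedy policy read off the table: should step right everywhere
      print([max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states)])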

  9. Brain Research: Implications for Learning.

    ERIC Educational Resources Information Center

    Soares, Louise M.; Soares, Anthony T.

    Brain research has illuminated several areas of the learning process: (1) learning as association; (2) learning as reinforcement; (3) learning as perception; (4) learning as imitation; (5) learning as organization; (6) learning as individual style; and (7) learning as brain activity. The classic conditioning model developed by Pavlov advanced…

  10. Deep imitation learning for 3D navigation tasks.

    PubMed

    Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina

    2018-01-01

    Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently been gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input while learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
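
    As a rough illustration of the imitation-learning core (not the paper's deep-CNN implementation), the following sketch performs behaviour cloning on toy (observation, action) demonstrations and then adds a crude active-learning step that queries the expert on the states the current policy is least certain about. The data, dimensions, expert rule, and small MLP are all invented stand-ins for raw visual input and a demonstrating agent.

      import numpy as np
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(0)
      X_demo = rng.normal(size=(200, 8))            # toy observations
      y_demo = (X_demo[:, 0] > 0).astype(int)       # toy expert actions

      # behaviour cloning: supervised learning of the policy from demos
      policy = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                             random_state=0)
      policy.fit(X_demo, y_demo)

      # active learning: query the "expert" on the least-confident new states
      X_new = rng.normal(size=(500, 8))
      uncertainty = 1.0 - policy.predict_proba(X_new).max(axis=1)
      query = np.argsort(uncertainty)[-20:]          # 20 least-confident states
      y_query = (X_new[query, 0] > 0).astype(int)    # simulated expert labels

      # refine the policy on the augmented demonstration set
      policy.fit(np.vstack([X_demo, X_new[query]]), np.hstack([y_demo, y_query]))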

  11. Feasibility of nitrification/denitrification in a sequencing batch biofilm reactor with liquid circulation applied to post-treatment.

    PubMed

    Andrade do Canto, Catarina Simone; Rodrigues, José Alberto Domingues; Ratusznei, Suzana Maria; Zaiat, Marcelo; Foresti, Eugênio

    2008-02-01

    An investigation was performed on the biological removal of ammonium nitrogen from synthetic wastewater by the simultaneous nitrification/denitrification (SND) process, using a sequencing batch biofilm reactor (SBBR). System behavior was analyzed as to the effects of the sludge type used as inoculum (autotrophic/heterotrophic), the wastewater feed strategy (batch/fed-batch) and the aeration strategy (continuous/intermittent). The presence of an autotrophic aerobic sludge proved to be essential for nitrification startup, despite publications stating the existence of heterotrophic organisms capable of nitrifying organic and inorganic nitrogen compounds at low dissolved oxygen concentrations. As to feed strategy, batch operation (synthetic wastewater containing 100 mg COD/L and 50 mg N-NH₄⁺/L) followed by fed-batch (synthetic wastewater with 100 mg COD/L) during a whole cycle seemed to be the most adequate, mainly during the denitrification phase. Regarding aeration strategy, an intermittent mode, with a dissolved oxygen concentration of 2.0 mg/L in the aeration phase, showed the best results. Under these optimal conditions, 97% of influent ammonium nitrogen (80% of total nitrogen) was removed at a rate of 86.5 mg N-NH₄⁺/L/d. In the treated effluent only 0.2 mg N-NO₂⁻/L, 4.6 mg N-NO₃⁻/L and 1.0 mg N-NH₄⁺/L remained, demonstrating the potential viability of this process in post-treatment of wastewaters containing ammonium nitrogen.

  12. Kinetic characterization and fed-batch fermentation for maximal simultaneous production of esterase and protease from Lysinibacillus fusiformis AU01.

    PubMed

    Divakar, K; Suryia Prabha, M; Nandhinidevi, G; Gautam, P

    2017-04-21

    The simultaneous production of intracellular esterase and extracellular protease from the strain Lysinibacillus fusiformis AU01 was studied in detail. The production was performed under both batch and fed-batch modes. The maximum yield of intracellular esterase and protease was obtained under full oxygen saturation at the beginning of the fermentation. The data were fitted to the Luedeking-Piret model and it was shown that the enzyme (both esterase and protease) production was growth associated. A decrease in intracellular esterase and an increase in extracellular esterase were observed during the late stationary phase. The appearance of intracellular proteins in the extracellular medium and the decrease in viable cell count and biomass during the late stationary phase confirmed that the presence of extracellular esterase is due to cell lysis. Even though fed-batch fermentation with different feeding strategies showed improved productivity, feeding yeast extract under DO-stat fermentation conditions gave the highest intracellular esterase and protease production. Under DO-stat fed-batch cultivation, a maximum intracellular esterase activity of 820 × 10³ U/L and an extracellular protease activity of 172 × 10³ U/L were obtained at 16 h. Intracellular esterase and extracellular protease production were increased fivefold and fourfold, respectively, when compared to batch fermentation performed under shake flask conditions.
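
    The Luedeking-Piret model invoked above relates product formation to growth, dP/dt = α·(dX/dt) + β·X, with production being growth-associated when the α term dominates. A minimal sketch, assuming logistic biomass growth and entirely invented parameter values, just to show the model's shape:

      import numpy as np
      from scipy.integrate import odeint

      mu_max, X_max = 0.4, 10.0          # 1/h, g/L (illustrative)
      alpha, beta = 2.0, 0.05            # growth- / non-growth-associated terms

      def rates(y, t):
          X, P = y
          dXdt = mu_max * X * (1 - X / X_max)      # logistic biomass growth
          dPdt = alpha * dXdt + beta * X           # Luedeking-Piret product rate
          return [dXdt, dPdt]

      t = np.linspace(0, 24, 100)                  # h
      X, P = odeint(rates, [0.1, 0.0], t).T
      print(f"final biomass {X[-1]:.1f} g/L, product {P[-1]:.1f} (arbitrary units)")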

  13. A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque

    PubMed Central

    Hassani, S. A.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid, S.; van der Meer, M. A.; Tiesinga, P.; Womelsdorf, T.

    2017-01-01

    Noradrenaline is believed to support cognitive flexibility through the alpha-2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single-subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task, evaluating performance using Bayesian and reinforcement learning models. In an initial dose-testing phase, we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose, we examined the faster feature-based reversal learning with Guanfacine using single-subject computational modeling. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework. PMID:28091572
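
    The single-subject modelling described here typically boils down to fitting free parameters of a reinforcement-learning model to trial-by-trial choices. Below is a hedged, toy-sized sketch of that recipe, with a delta-rule learner, a softmax policy, and simulated choices in place of real behaviour; the paper's feature-based model is considerably richer.

      import numpy as np
      from scipy.optimize import minimize_scalar

      def neg_log_likelihood(lr, choices, rewards, beta=5.0):
          v = np.zeros(2)
          nll = 0.0
          for c, r in zip(choices, rewards):
              p = np.exp(beta * v) / np.exp(beta * v).sum()   # softmax policy
              nll -= np.log(p[c])
              v[c] += lr * (r - v[c])                         # delta-rule update
          return nll

      def simulate(lr, n=300, beta=5.0, seed=1):
          # simulate a subject with a known learning rate (option 0 pays more)
          rng = np.random.default_rng(seed)
          v = np.zeros(2)
          choices, rewards = [], []
          for _ in range(n):
              p = np.exp(beta * v) / np.exp(beta * v).sum()
              c = rng.choice(2, p=p)
              r = float(rng.random() < (0.8 if c == 0 else 0.2))
              v[c] += lr * (r - v[c])
              choices.append(c); rewards.append(r)
          return choices, rewards

      choices, rewards = simulate(lr=0.3)
      fit = minimize_scalar(neg_log_likelihood, bounds=(0.01, 1.0),
                            args=(choices, rewards), method="bounded")
      print(f"true learning rate 0.30, estimated {fit.x:.2f}")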

  14. Electrophysiological correlates of observational learning in children.

    PubMed

    Rodriguez Buritica, Julia M; Eppinger, Ben; Schuck, Nicolas W; Heekeren, Hauke R; Li, Shu-Chen

    2016-09-01

    Observational learning is an important mechanism for cognitive and social development. However, the neurophysiological mechanisms underlying observational learning in children are not well understood. In this study, we used a probabilistic reward-based observational learning paradigm to compare behavioral and electrophysiological markers of individual and observational reinforcement learning in 8- to 10-year-old children. Specifically, we manipulated the amount of observable information as well as children's similarity in age to the observed person (same-aged child vs. adult) to examine the effects of similarity in age on the integration of observed information in children. We show that the feedback-related negativity (FRN) during individual reinforcement learning reflects the valence of outcomes of own actions. Furthermore, we found that the feedback-related negativity during observational reinforcement learning (oFRN) showed a similar distinction between outcome valences of observed actions. This suggests that the oFRN can serve as a measure of observational learning in middle childhood. Moreover, during observational learning children profited from the additional social information and imitated the choices of their own peers more than those of adults, indicating that children have a tendency to conform more with similar others (e.g. their own peers) compared to dissimilar others (adults). Taken together, our results show that children can benefit from integrating observable information and that oFRN may serve as a measure of observational learning in children. © 2015 John Wiley & Sons Ltd.

  15. Mechanisms and time course of vocal learning and consolidation in the adult songbird.

    PubMed

    Warren, Timothy L; Tumer, Evren C; Charlesworth, Jonathan D; Brainard, Michael S

    2011-10-01

    In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry.

  16. Mechanisms and time course of vocal learning and consolidation in the adult songbird

    PubMed Central

    Tumer, Evren C.; Charlesworth, Jonathan D.; Brainard, Michael S.

    2011-01-01

    In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry. PMID:21734110

  17. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    PubMed

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions, with the trial's initial timing provided. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holding and resting), owing to its efficient weight updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
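
    For orientation, the following is a hedged caricature of the attention-gated idea: the network selects one action, and only the weights feeding that selected output are updated, scaled by a reward prediction error. The toy Gaussian "neural features", the 4-action set, and the target rule are invented stand-ins for the recorded M1 activity and the seven movements used in the paper.

      import numpy as np

      rng = np.random.default_rng(2)
      n_features, n_actions = 16, 4
      W = rng.normal(scale=0.1, size=(n_actions, n_features))
      lr = 0.05

      for trial in range(3000):
          x = rng.normal(size=n_features)             # toy neural state
          target = int(x[:4].argmax())                # toy "intended" action
          q = W @ x                                   # action values
          p = np.exp(q - q.max()); p /= p.sum()       # softmax over actions
          a = rng.choice(n_actions, p=p)              # stochastic selection
          r = 1.0 if a == target else 0.0
          delta = r - q[a]                            # prediction error, chosen unit
          W[a] += lr * delta * x                      # gated update: chosen row only

      # rough accuracy of the learned decoder on fresh toy states
      test = rng.normal(size=(500, n_features))
      acc = np.mean([(W @ x).argmax() == int(x[:4].argmax()) for x in test])
      print(f"decoding accuracy about {acc:.0%}")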

  18. A Semisupervised Support Vector Machines Algorithm for BCI Systems

    PubMed Central

    Qin, Jianzhao; Li, Yuanqing; Sun, Wei

    2007-01-01

    As an emerging technology, brain-computer interfaces (BCIs) bring us new communication interfaces which translate brain activities into control signals for devices like computers, robots, and so forth. In this study, we propose a semisupervised support vector machine (SVM) algorithm for brain-computer interface (BCI) systems, aiming at reducing the time-consuming training process. In this algorithm, we apply a semisupervised SVM for translating the features extracted from the electrical recordings of brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. Meanwhile, to reduce the time for training semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to the online BCI systems. Additionally, it is suggested in many studies that common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficient labeled data set. In order to overcome the drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141
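
    Since the paper's exact algorithm is not reproduced here, the sketch below shows a generic self-training variant of semisupervised SVM learning with a batch-mode incremental flavour: fit on the small labelled set, then repeatedly absorb the batch of unlabelled points the current model is most confident about. Synthetic Gaussian features stand in for extracted EEG features, and the batch size and round count are arbitrary.

      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(3)
      X_lab = rng.normal(size=(20, 10))                # small labelled set
      y_lab = (X_lab[:, 0] > 0).astype(int)
      X_unlab = rng.normal(size=(400, 10))             # large unlabelled set

      clf = SVC(probability=True).fit(X_lab, y_lab)
      for round_ in range(5):                          # batch-mode incremental loop
          proba = clf.predict_proba(X_unlab)
          conf = proba.max(axis=1)
          batch = np.argsort(conf)[-40:]               # most confident batch
          X_lab = np.vstack([X_lab, X_unlab[batch]])   # adopt pseudo-labels
          y_lab = np.hstack([y_lab, proba[batch].argmax(axis=1)])
          X_unlab = np.delete(X_unlab, batch, axis=0)
          clf = SVC(probability=True).fit(X_lab, y_lab)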

  19. CDC/1000: a Control Data Corporation remote batch terminal emulator for Hewlett-Packard minicomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berg, D.E.

    1981-02-01

    The Control Data Corporation Type 200 User Terminal utilizes a unique communications protocol to provide users with batch-mode remote terminal access to Control Data computers. CDC/1000 is a software subsystem that implements this protocol on Hewlett-Packard minicomputers running the Real Time Executive III, IV, or IVB operating systems. This report provides brief descriptions of the various software modules comprising CDC/1000, and contains detailed instructions for integrating CDC/1000 into the Hewlett-Packard operating system and for operating UTERM, the user interface program for CDC/1000. 6 figures.

  20. The Effects of Short Interval Delay of Reinforcement Upon Human Discrimination Learning. IMRID Papers and Reports Vol. 4 No. 12.

    ERIC Educational Resources Information Center

    Kral, Paul A.; And Others

    Investigates the effect of delay of reinforcement upon human discrimination learning with particular emphasis on the form of the gradient within the first few seconds of delay. In previous studies subjects are usually required to make an instrumental response to a stimulus, this is followed by the delay interval, and finally, the reinforcement…

  1. Learning Theory and the Typewriter Teacher

    ERIC Educational Resources Information Center

    Wakin, B. Bertha

    1974-01-01

    Eight basic principles of learning are described and discussed in terms of practical learning strategies for typewriting. Described are goal setting, preassessment, active participation, individual differences, reinforcement, practice, transfer of learning, and evaluation. (SC)

  2. Energy harvesting from coupled bending-twisting oscillations in carbon-fibre reinforced polymer laminates

    NASA Astrophysics Data System (ADS)

    Xie, Mengying; Zhang, Yan; Kraśny, Marcin J.; Rhead, Andrew; Bowen, Chris; Arafa, Mustafa

    2018-07-01

    The energy harvesting capability of resonant harvesting structures, such as piezoelectric cantilever beams, can be improved by utilizing coupled oscillations that generate favourable strain mode distributions. In this work, we present the first demonstration of the use of a laminated carbon fibre reinforced polymer to create cantilever beams that undergo coupled bending-twisting oscillations for energy harvesting applications. Piezoelectric layers that operate in bending and shear mode are attached to the bend-twist coupled beam surface at locations of maximum bending and torsional strains in the first mode of vibration to fully exploit the strain distribution along the beam. Modelling of this new bend-twist harvesting system is presented, which compares favourably with experimental results. It is demonstrated that the variety of bending and torsional modes of the harvesters can be utilized to create a harvester that operates over a wider range of frequencies, and such multi-modal device architectures provide a unique approach to tuning the frequency response of resonant harvesting systems.

  3. Automated 3D structure composition for large RNAs

    PubMed Central

    Popenda, Mariusz; Szachniuk, Marta; Antczak, Maciej; Purzycka, Katarzyna J.; Lukasiak, Piotr; Bartol, Natalia; Blazewicz, Jacek; Adamiak, Ryszard W.

    2012-01-01

    Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present a novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on a machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast; an initial 3D structure is composed within seconds on a single processor. The method ensures the prediction of high-quality 3D structures of large RNAs. Our approach needs neither structural templates nor the RNA sequence alignment required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server, RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues. PMID:22539264

  4. Disrupted Reinforcement Learning and Maladaptive Behavior in Women with a History of Childhood Sexual Abuse: A High-Density Event-Related Potential Study

    PubMed Central

    Pechtel, Pia; Pizzagalli, Diego A.

    2013-01-01

    Context: Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite grave epidemiological data, the mechanisms underlying these maladaptive outcomes remain poorly understood. Objective: We examined whether CSA history, particularly in conjunction with past MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Design: Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Setting: Academic setting; participants recruited from the community. Participants: Fifteen remitted depressed females with CSA history (CSA+rMDD), 16 remitted depressed females without CSA history (rMDD), and 18 healthy females. Main Outcome Measures: Participants' preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity (FRN) and error-related negativity (ERN), hypothesized to reflect activation in the anterior cingulate cortex, were used as electrophysiological indices of reinforcement learning. Results: No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA+rMDD group showed (1) lower accuracy (relative to both controls and rMDD), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to rMDD). CSA history was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded, but not previously punished, trials. Conclusions: Irrespective of past MDD, women with CSA histories showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision-making in the absence of feedback (blunted "Go learning"). While the current study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine whether disrupted positive reinforcement learning predicts high-risk behavior following CSA. PMID:23487253

  5. Closed-Loop and Robust Control of Quantum Systems

    PubMed Central

    Wang, Lin-Cheng

    2013-01-01

    For most practical quantum control systems, it is important but difficult to attain robustness and reliability due to unavoidable uncertainties in the system dynamics or models. Three kinds of typical approaches (namely, closed-loop learning control, feedback control, and robust control) have proved effective for these problems. This work presents a self-contained survey of the closed-loop and robust control of quantum systems, as well as a brief introduction to a selection of basic theories and methods in this research area, to give interested readers a general idea for further study. In the area of closed-loop learning control of quantum systems, we survey learning control methods such as gradient-based methods, genetic algorithms (GA), and reinforcement learning (RL), from a unified viewpoint of exploring the quantum control landscape. For the feedback control approach, the paper surveys three control strategies: Lyapunov control, measurement-based control, and coherent-feedback control. Topics in quantum robust control, such as H∞ control, sliding mode control, quantum risk-sensitive control, and quantum ensemble control, are then reviewed. The paper concludes with a perspective on future research directions that are likely to attract more attention. PMID:23997680

  6. Data-driven reinforcement learning–based real-time energy management system for plug-in hybrid electric vehicles

    DOE PAGES

    Qi, Xuewei; Wu, Guoyuan; Boriboonsomsin, Kanok; ...

    2016-01-01

    Plug-in hybrid electric vehicles (PHEVs) show great promise in reducing transportation-related fossil fuel consumption and greenhouse gas emissions. Designing an efficient energy management system (EMS) for PHEVs to achieve better fuel economy has been an active research topic for decades. Most of the advanced systems rely either on a priori knowledge of future driving conditions to achieve the optimal but not real-time solution (e.g., using a dynamic programming strategy) or on only current driving situations to achieve a real-time but nonoptimal solution (e.g., rule-based strategy). This paper proposes a reinforcement learning–based real-time EMS for PHEVs to address the trade-off between real-time performance and optimal energy savings. The proposed model can optimize the power-split control in real time while learning the optimal decisions from historical driving cycles. Here, a case study on a real-world commute trip shows that about a 12% fuel saving can be achieved without considering charging opportunities; further, an 8% fuel saving can be achieved when charging opportunities are considered, compared with the standard binary mode control strategy.
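
    As a rough sketch of how such a controller can be organized, the snippet below frames the power-split decision as tabular Q-learning over a discretized (battery level, power demand) state. The discretization, action set, and toy fuel model are illustrative assumptions of ours, not the authors' implementation.

        import numpy as np

        # Tabular Q-learning sketch for a PHEV power-split decision (illustrative).
        N_SOC, N_DEMAND, N_ACTIONS = 10, 10, 5    # battery level bins, demand bins, engine-power fractions
        Q = np.zeros((N_SOC, N_DEMAND, N_ACTIONS))
        alpha, gamma, eps = 0.1, 0.95, 0.1
        rng = np.random.default_rng(0)

        def step(soc, demand, action):
            """Toy environment: reward is negative fuel use, with a penalty for
            drawing on a depleted battery; SOC drifts with engine use."""
            engine_frac = action / (N_ACTIONS - 1)
            fuel = engine_frac * demand
            penalty = 5.0 if (soc == 0 and engine_frac < 0.5) else 0.0
            soc_next = min(N_SOC - 1, max(0, soc + (1 if engine_frac > 0.5 else -1)))
            return -(fuel + penalty), soc_next, int(rng.integers(N_DEMAND))

        for _ in range(2000):                      # learn from simulated driving cycles
            soc, demand = N_SOC // 2, int(rng.integers(N_DEMAND))
            for _ in range(100):
                a = int(rng.integers(N_ACTIONS)) if rng.random() < eps \
                    else int(np.argmax(Q[soc, demand]))
                r, soc2, demand2 = step(soc, demand, a)
                # Standard Q-learning backup toward reward plus discounted best next value
                Q[soc, demand, a] += alpha * (r + gamma * Q[soc2, demand2].max() - Q[soc, demand, a])
                soc, demand = soc2, demand2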

  7. Data-driven reinforcement learning–based real-time energy management system for plug-in hybrid electric vehicles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qi, Xuewei; Wu, Guoyuan; Boriboonsomsin, Kanok

    Plug-in hybrid electric vehicles (PHEVs) show great promise in reducing transportation-related fossil fuel consumption and greenhouse gas emissions. Designing an efficient energy management system (EMS) for PHEVs to achieve better fuel economy has been an active research topic for decades. Most of the advanced systems rely either on a priori knowledge of future driving conditions to achieve the optimal but not real-time solution (e.g., using a dynamic programming strategy) or on only current driving situations to achieve a real-time but nonoptimal solution (e.g., rule-based strategy). This paper proposes a reinforcement learning–based real-time EMS for PHEVs to address the trade-off between real-time performance and optimal energy savings. The proposed model can optimize the power-split control in real time while learning the optimal decisions from historical driving cycles. Here, a case study on a real-world commute trip shows that about a 12% fuel saving can be achieved without considering charging opportunities; further, an 8% fuel saving can be achieved when charging opportunities are considered, compared with the standard binary mode control strategy.

  8. Reinforcement Learning Deficits in People with Schizophrenia Persist after Extended Trials

    PubMed Central

    Cicero, David C.; Martin, Elizabeth A.; Becker, Theresa M.; Kerns, John G.

    2014-01-01

    Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly, but that they have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited number of acquisition trials to learn the reward contingencies and are then tested on what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingencies. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive and negative feedback. These reward learning deficits persisted even when people with schizophrenia were given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. PMID:25172610
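
    For orientation, the Probabilistic Selection Task is commonly modeled with a Q-learning agent that uses separate learning rates for positive and negative feedback; asymmetries in those rates operationalize positive- versus negative-feedback learning. The sketch below is such a generic model, not the authors' analysis code; the pair structure follows the standard task description and the parameters are placeholders.

        import numpy as np

        rng = np.random.default_rng(0)
        PAIRS = [(0, 1, 0.80), (2, 3, 0.70), (4, 5, 0.60)]  # (better, worse, P(reward | better))

        def run_pst(alpha_gain=0.2, alpha_loss=0.2, beta=5.0, n_trials=360):
            """Q-learning with separate gain/loss learning rates on a PST-style task."""
            q = np.full(6, 0.5)                        # initial value of each stimulus
            for _ in range(n_trials):
                a, b, p = PAIRS[rng.integers(len(PAIRS))]
                p_a = 1.0 / (1.0 + np.exp(-beta * (q[a] - q[b])))  # softmax over the pair
                choice = a if rng.random() < p_a else b
                p_win = p if choice == a else 1.0 - p
                r = 1.0 if rng.random() < p_win else 0.0
                delta = r - q[choice]                  # prediction error
                q[choice] += (alpha_gain if delta > 0 else alpha_loss) * delta
            return q

        q_values = run_pst()   # learned stimulus values after acquisition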

  9. Reinforcement learning deficits in people with schizophrenia persist after extended trials.

    PubMed

    Cicero, David C; Martin, Elizabeth A; Becker, Theresa M; Kerns, John G

    2014-12-30

    Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly, but that they have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited number of acquisition trials to learn the reward contingencies and are then tested on what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingencies. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive feedback and negative feedback. These reward learning deficits persisted even when people with schizophrenia were given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  10. Failure behavior of glass ionomer cement under Hertzian indentation.

    PubMed

    Wang, Yan; Darvell, B W

    2008-09-01

    To investigate the load-bearing capacity and failure mode of various types of glass ionomer cement (GIC) under Hertzian indentation, exploring the relationship between failure behavior and formulation, and examining claims of filler-reinforcement of GIC. Discs 2 mm thick and 10 mm in diameter, 8-18 replicates, were fabricated for two filler-reinforced GICs, four unmodified and unreinforced GICs, and four resin-modified GICs, with a dental silver amalgam and a filled-resin restorative material for comparison. Testing was at 23 degrees C, wet, after 7 days' storage at 37 degrees C in artificial saliva at pH 6, using a 20 mm diameter hard steel ball and a filled-nylon substrate (E = 10 GPa). First failure was detected acoustically; failure mode was determined visually. At least one-third of the specimens in each case were examined under a scanning electron microscope for corroboration. Reinforced and unmodified-unreinforced GICs were indistinguishable by failure load (one-way analysis of variance, P = 0.425; overall 260 +/- 70 N) and mode. Failure loads for resin-modified GICs were 360-1150 N, for amalgam approximately 680 N, and for filled resin approximately 1200 N. Resin-modified GICs tended to be tougher (incomplete fracture); all others gave complete fracture (radial cracking). The stronger materials (two resin-modified GICs and the filled resin) showed some cone cracking. While the resin-modified GICs showed varying increases in failure load over that of the plain GICs, consistent with their hybrid chemistry, filler-reinforcement was not evident for the two products for which it is claimed, consistent with structural and theoretical expectations.

  11. Reinforcement learning of periodical gaits in locomotion robots

    NASA Astrophysics Data System (ADS)

    Svinin, Mikhail; Yamada, Kazuyaki; Ushio, S.; Ueda, Kanji

    1999-08-01

    The emergence of stable gaits in locomotion robots is studied in this paper. A classifier system, implementing an instance-based reinforcement learning scheme, is used for sensory-motor control of an eight-legged mobile robot. An important feature of the classifier system is its ability to work with a continuous sensor space. The robot has no prior knowledge of the environment, no internal model of itself, and no goal coordinates. It is only assumed that the robot can acquire stable gaits by learning how to reach a light source. During the learning process the control system is self-organized by reinforcement signals: reaching the light source earns a global reward, forward motion earns a local reward, while stepping back and falling down earn a local punishment. The feasibility of the proposed self-organizing system is tested in simulation and experiment, with control actions specified at the leg level. It is shown that, as learning progresses, the number of action rules in the classifier system stabilizes to a certain level, corresponding to the acquired gait patterns.

  12. Biases in probabilistic category learning in relation to social anxiety

    PubMed Central

    Abraham, Anna; Hermann, Christiane

    2015-01-01

    Instrumental learning paradigms are rarely employed to investigate the mechanisms underlying acquired fear responses in social anxiety. Here, we adapted a probabilistic category learning paradigm to assess information processing biases as a function of the degree of social anxiety traits in a sample of healthy individuals without a diagnosis of social phobia. Participants were presented with three pairs of neutral faces with differing probabilistic accuracy contingencies (A/B: 80/20, C/D: 70/30, E/F: 60/40). Upon making their choice, negative and positive feedback was conveyed using angry and happy faces, respectively. The highly socially anxious group showed a strong tendency to be more accurate at learning the probability contingency associated with the most ambiguous stimulus pair (E/F: 60/40). Moreover, when pairing the most positively reinforced stimulus or the most negatively reinforced stimulus with all the other stimuli in a test phase, the highly socially anxious group avoided the most negatively reinforced stimulus significantly more than the control group. The results are discussed with reference to avoidance learning and hypersensitivity to negative socially evaluative information associated with social anxiety. PMID:26347685

  13. Small group discussion: Students perspectives.

    PubMed

    Annamalai, Nachal; Manivel, Rajajeyakumar; Palanisamy, Rajendran

    2015-08-01

    Various alternative methods are being used in many medical colleges to reinforce didactic lectures in physiology. Small group teaching can take on a variety of tasks such as problem-solving, role play, discussion, brainstorming, and debate. Research has demonstrated that group discussion promotes greater synthesis and retention of material. The aims of this study were to adopt a problem-solving approach by relating basic sciences to the clinical scenario through self-learning, to develop soft skills, to understand principles of group dynamics, and to adopt a new teaching-learning methodology. An experimental study was conducted with Phase I first-year medical students of the 2014-2015 batch (n = 120). On the day of the session, the students were divided into small groups (15 each), and the session started with the facilitator opening the discussion. Feedback forms were collected from five students in each group (n = 40), using a five-point Likert scale ranging from strongly agree to strongly disagree. Data were analyzed using IBM SPSS Statistics for Windows, Version 21.0 (IBM Corp., Armonk, NY). Seventy percent of the students opined that small group discussions were interactive, friendly, and innovative, and built interaction between teacher and student. Small group discussion increased their thought processes, helped them communicate better, and bridged the gap between teacher and student. In conclusion, small group discussion is more effective than traditional teaching methods.

  14. Surprise beyond prediction error

    PubMed Central

    Chumbley, Justin R; Burke, Christopher J; Stephan, Klaas E; Friston, Karl J; Tobler, Philippe N; Fehr, Ernst

    2014-01-01

    Surprise drives learning. Various neural “prediction error” signals are believed to underpin surprise-based reinforcement learning. Here, we report a surprise signal that reflects reinforcement learning but is neither un/signed reward prediction error (RPE) nor un/signed state prediction error (SPE). To exclude these alternatives, we measured surprise responses in the absence of RPE and accounted for a host of potential SPE confounds. This new surprise signal was evident in ventral striatum, primary sensory cortex, frontal poles, and amygdala. We interpret these findings via a normative model of surprise. PMID:24700400
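
    For readers outside the field, the two prediction errors being ruled out have standard textbook forms; the definitions below are generic, not the paper's exact formulation.

        % Signed reward prediction error (temporal-difference form) and a common
        % state prediction error (surprise about the observed transition).
        \begin{align}
          \delta^{\mathrm{RPE}}_t &= r_t + \gamma\, V(s_{t+1}) - V(s_t), \\
          \delta^{\mathrm{SPE}}_t &= 1 - P(s_{t+1} \mid s_t, a_t).
        \end{align}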

  15. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    PubMed

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art in reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single-agent and multiagent systems. Existing RL solutions to optimal control problems, as well as to graphical games, are reviewed. RL methods learn the solution to optimal control and game problems online, using data measured along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
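
    The two core algorithms named above rest on standard fixed-point relations: the discrete-time Q-learning backup, and the integral RL Bellman equation, which replaces explicit knowledge of the continuous-time dynamics with an integral of rewards measured along the trajectory. These are the generic textbook forms, not the survey's notation.

        % Discrete-time Q-learning update and the continuous-time integral RL
        % Bellman equation (standard forms).
        \begin{align}
          Q(s_k, a_k) &\leftarrow Q(s_k, a_k)
            + \alpha \Big[ r_k + \gamma \max_{a'} Q(s_{k+1}, a') - Q(s_k, a_k) \Big], \\
          V\big(x(t)\big) &= \int_{t}^{t+T} r\big(x(\tau), u(\tau)\big)\, d\tau + V\big(x(t+T)\big).
        \end{align}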

  16. Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia.

    PubMed

    Schenk, Lieven A; Sprenger, Christian; Onat, Selim; Colloca, Luana; Büchel, Christian

    2017-10-04

    Classical learning theories predict extinction after the discontinuation of reinforcement through prediction errors. However, placebo hypoalgesia, although mediated by associative learning, has been shown to be resistant to extinction. We tested the hypothesis that this is mediated by the suppression of prediction error processing through the prefrontal cortex (PFC). We compared pain modulation through treatment cues (placebo hypoalgesia, treatment context) with pain modulation through stimulus intensity cues (stimulus context) during functional magnetic resonance imaging in 48 male and female healthy volunteers. During acquisition, our data show that expectations are correctly learned and that this is associated with prediction error signals in the ventral striatum (VS) in both contexts. However, in the nonreinforced test phase, pain modulation and expectations of pain relief persisted to a larger degree in the treatment context, indicating that the expectations were not correctly updated in the treatment context. Consistently, we observed significantly stronger neural prediction error signals in the VS in the stimulus context compared with the treatment context. A connectivity analysis revealed negative coupling between the anterior PFC and the VS in the treatment context, suggesting that the PFC can suppress the expression of prediction errors in the VS. Consistent with this, a participant's conceptual views and beliefs about treatments influenced the pain modulation only in the treatment context. Our results indicate that in placebo hypoalgesia contextual treatment information engages prefrontal conceptual processes, which can suppress prediction error processing in the VS and lead to reduced updating of treatment expectancies, resulting in less extinction of placebo hypoalgesia. SIGNIFICANCE STATEMENT In aversive and appetitive reinforcement learning, learned effects show extinction when reinforcement is discontinued. This is thought to be mediated by prediction errors (i.e., the difference between expectations and outcome). Although reinforcement learning has been central in explaining placebo hypoalgesia, placebo hypoalgesic effects show little extinction and persist after the discontinuation of reinforcement. Our results support the idea that conceptual treatment beliefs bias the neural processing of expectations in a treatment context compared with a more stimulus-driven processing of expectations with stimulus intensity cues. We provide evidence that this is associated with the suppression of prediction error processing in the ventral striatum by the prefrontal cortex. This provides a neural basis for persisting effects in reinforcement learning and placebo hypoalgesia. Copyright © 2017 the authors 0270-6474/17/379715-09$15.00/0.

  17. Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia

    PubMed Central

    Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.

    2014-01-01

    Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for each stimulus. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
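
    The task-and-model family described here is often sketched as a mixture of a slow incremental RL learner and a fast but capacity-limited working-memory store, with the WM weight shrinking as set size exceeds capacity. The code below is a schematic of that idea under parameter names of our own choosing, not the authors' fitted model.

        import numpy as np

        def softmax(q, beta):
            p = np.exp(beta * (q - q.max()))
            return p / p.sum()

        class RLWM:
            """Mixture of incremental RL values and a one-shot WM store (schematic)."""
            def __init__(self, n_stimuli, n_actions, alpha=0.1, beta=8.0, capacity=3, rho=0.9):
                self.q = np.full((n_stimuli, n_actions), 1.0 / n_actions)   # slow RL values
                self.wm = np.full((n_stimuli, n_actions), 1.0 / n_actions)  # fast WM store
                self.alpha, self.beta = alpha, beta
                self.w = rho * min(1.0, capacity / n_stimuli)  # WM weight falls with set size

            def act(self, s):
                p = self.w * softmax(self.wm[s], self.beta) \
                    + (1 - self.w) * softmax(self.q[s], self.beta)
                return int(np.random.choice(len(p), p=p))

            def update(self, s, a, r):
                self.q[s, a] += self.alpha * (r - self.q[s, a])  # incremental delta rule
                self.wm[s] = 0.0                                 # WM: one-shot overwrite
                self.wm[s, a] = r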

  18. Reconsideration of Serial Visual Reversal Learning in Octopus (Octopus vulgaris) from a Methodological Perspective

    PubMed Central

    Bublitz, Alexander; Weinhold, Severine R.; Strobel, Sophia; Dehnhardt, Guido; Hanke, Frederike D.

    2017-01-01

    Octopuses (Octopus vulgaris) are generally considered to possess extraordinary cognitive abilities including the ability to successfully perform in a serial reversal learning task. During reversal learning, an animal is presented with a discrimination problem and after reaching a learning criterion, the signs of the stimuli are reversed: the former positive becomes the negative stimulus and vice versa. If an animal improves its performance over reversals, it is ascribed advanced cognitive abilities. Reversal learning has been tested in octopus in a number of studies. However, the experimental procedures adopted in these studies involved pre-training on the new positive stimulus after a reversal or strong negative reinforcement, or might have enabled secondary cueing by the experimenter. These procedures could all have affected the outcome of reversal learning. Thus, in this study, serial visual reversal learning was revisited in octopus. We trained four common octopuses (O. vulgaris) to discriminate between 2-dimensional stimuli presented on a monitor in a simultaneous visual discrimination task and reversed the signs of the stimuli each time the animals reached the learning criterion of ≥80% in two consecutive sessions. The animals were trained using operant conditioning techniques including a secondary reinforcer, a rod that was pushed up and down the feeding tube, which signaled the correctness of a response and preceded the subsequent primary reinforcement of food. The experimental protocol did not involve negative reinforcement. One animal completed four reversals and showed progressive improvement, i.e., it decreased its errors to criterion the more reversals it experienced. This animal developed a generalized response strategy. In contrast, another animal completed only one reversal, whereas two animals did not learn to reverse during the first reversal. In conclusion, some octopus individuals can learn to reverse in a visual task demonstrating behavioral flexibility even with a refined methodology. PMID:28223940

  19. Learning in Mental Retardation: A Comprehensive Bibliography.

    ERIC Educational Resources Information Center

    Gardner, James M.; And Others

    The bibliography on learning in mentally handicapped persons is divided into the following topic categories: applied behavior change, classical conditioning, discrimination, generalization, motor learning, reinforcement, verbal learning, and miscellaneous. An author index is included. (KW)

  20. Asynchronous Gossip for Averaging and Spectral Ranking

    NASA Astrophysics Data System (ADS)

    Borkar, Vivek S.; Makhijani, Rahul; Sundaresan, Rajesh

    2014-08-01

    We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
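
    To see why the classical scheme can fail, note that the symmetric pairwise gossip step conserves the sum of the node values, so repeated averaging converges to the true mean; the asynchronous one-sided updates the abstract refers to can break that conservation, which is the difficulty motivating the reinforcement-based correction. A minimal sketch of the classical symmetric scheme:

        import random

        def gossip_average(values, n_steps=10000, seed=0):
            """Classical symmetric gossip: two random nodes replace both their
            values with the pairwise mean. Each step conserves sum(x), so the
            state converges to the true average of the initial values."""
            rng = random.Random(seed)
            x = list(values)
            for _ in range(n_steps):
                i, j = rng.sample(range(len(x)), 2)
                x[i] = x[j] = (x[i] + x[j]) / 2.0
            return x

        print(gossip_average([1.0, 5.0, 9.0, 13.0])[:2])   # values near the mean, 7.0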

  1. Reinforcement learning in professional basketball players

    PubMed Central

    Neiman, Tal; Loewenstein, Yonatan

    2011-01-01

    Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3-point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388

  2. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    NASA Astrophysics Data System (ADS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-06-01

    One of the prototypical mechanisms for understanding ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model (a reinforcement learning model with a dynamic aspiration level) is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, players combine the outcomes of previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, the most noteworthy finding is that moderate greediness (i.e., a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network. We also make comparisons with two recently proposed models to highlight the importance of an adaptive aspiration level in supporting cooperation in structured populations.
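
    A minimal sketch of the kind of update rule described (a Bush-Mosteller-style reinforcement rule with a habituating aspiration level; the parameter names are ours, not the paper's):

        def bm_update(action, payoff, p_coop, aspiration, alpha=0.2, h=0.2):
            """One round of aspiration-based reinforcement. The stimulus is the
            payoff relative to the current aspiration: satisfaction reinforces
            the action just taken, dissatisfaction shifts probability away."""
            s = max(-1.0, min(1.0, payoff - aspiration))   # clipped stimulus
            if action == 'C':
                p_coop += alpha * s * ((1.0 - p_coop) if s >= 0 else p_coop)
            else:
                p_coop -= alpha * s * (p_coop if s >= 0 else (1.0 - p_coop))
            aspiration = (1.0 - h) * aspiration + h * payoff  # aspiration tracks recent payoffs
            return p_coop, aspiration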

  3. From creatures of habit to goal-directed learners: Tracking the developmental emergence of model-based reinforcement learning

    PubMed Central

    Decker, Johannes H.; Otto, A. Ross; Daw, Nathaniel D.; Hartley, Catherine A.

    2016-01-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852

  4. Reinforcement learning or active inference?

    PubMed

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  5. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning

    PubMed Central

    Franklin, Nicholas T; Frank, Michael J

    2015-01-01

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001 PMID:26705698

  6. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    PubMed

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporating distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in future power systems. Yet achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Using the learning algorithm avoids the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in a microgrid only needs to communicate with its local neighbors, without relying on any centralized controller. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.

  7. Self-directed learning readiness of Asian students: students perspective on a hybrid problem based learning curriculum.

    PubMed

    Leatemia, Lukas D; Susilo, Astrid P; van Berkel, Henk

    2016-12-03

    To identify students' readiness to perform self-directed learning and the underlying factors influencing it in a hybrid problem-based learning curriculum. A combination of quantitative and qualitative studies was conducted in five medical schools in Indonesia. In the quantitative study, the Self-Directed Learning Readiness Scale was distributed to all students in all batches who had experience with the hybrid problem-based curriculum, and they were categorized into low and high levels based on the questionnaire score. In the qualitative study, three focus group discussions (low-, high-, and mixed-level) were conducted, with six to twelve students chosen randomly from each group, to find the factors influencing their self-directed learning readiness. Two researchers analysed the qualitative data as a measure of triangulation. The quantitative study showed that only half of the students had a high level of self-directed learning readiness, and a similar trend occurred in each batch. The proportion of students with a high level of self-directed learning readiness was lower among senior students than among more junior students. The qualitative study showed that the problem-based learning process, assessments, the learning environment, students' lifestyles, students' perceptions of the topics, and mood were factors influencing their self-directed learning. A hybrid problem-based curriculum may not fully foster students' self-directed learning. The curriculum system, teachers' experience, students' backgrounds, and cultural factors might contribute to students' difficulties in conducting self-directed learning.

  8. Photocatalytic degradation of carbofuran by TiO2-coated activated carbon: Model for kinetic, electrical energy per order and economic analysis.

    PubMed

    Vishnuganth, M A; Remya, Neelancherry; Kumar, Mathava; Selvaraju, N

    2016-10-01

    The photocatalytic removal of carbofuran (CBF) from aqueous solution in the presence of a granular activated carbon-supported TiO2 (GAC-TiO2) catalyst was investigated in batch-mode experiments. The presence of GAC enhanced the photocatalytic efficiency of the TiO2 catalyst. Experiments were conducted at different CBF concentrations to clarify the dependence of the apparent rate constant (kapp) of the pseudo-first-order kinetics on CBF photodegradation. The general relationship between the adsorption equilibrium constant (K) and the reaction rate constant (kr) was explained using the modified Langmuir-Hinshelwood (L-H) model. The observed kinetics indicated that the surface reaction was the rate-limiting step in the GAC-TiO2-catalyzed photodegradation of CBF. The values of K and kr for this pseudo-first-order reaction were found to be 0.1942 L mg(-1) and 1.51 mg L(-1) min(-1), respectively. In addition, the dependence of kapp on the half-life time was determined by calculating the electrical energy per order both experimentally (EEO experimental) and by modeling (EEO model). The batch-mode experiments showed the possibility of 100% CBF removal (under optimized conditions, at initial concentrations of 50 mg L(-1) and 100 mg L(-1)) at contact times of 90 min and 120 min, respectively. Both the L-H kinetic model and the EEO model fitted the batch-mode experimental data well and successfully elucidated the photocatalytic degradation phenomena in the presence of the GAC-TiO2 catalyst. Copyright © 2016 Elsevier Ltd. All rights reserved.
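
    For orientation, the modified L-H rate law and its pseudo-first-order reduction take the standard forms below; the numeric value simply plugs the abstract's constants in at an initial concentration of 50 mg/L, for illustration only.

        % Langmuir-Hinshelwood rate law and the standard pseudo-first-order reduction.
        \begin{align}
          r = -\frac{dC}{dt} &= \frac{k_r K C}{1 + K C}, \qquad
          \frac{1}{k_{\mathrm{app}}} = \frac{1}{k_r K} + \frac{C_0}{k_r}, \\
          k_{\mathrm{app}} &\approx \left( \frac{1}{1.51 \times 0.1942} + \frac{50}{1.51} \right)^{-1}
          \approx 0.027~\mathrm{min^{-1}} \quad (C_0 = 50~\mathrm{mg\,L^{-1}}).
        \end{align}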

  9. Aerobic sludge digestion under low dissolved oxygen concentrations.

    PubMed

    Arunachalam, RaviSankar; Shah, Hemant K; Ju, Lu-Kwang

    2004-01-01

    Low dissolved oxygen (DO) concentrations occur commonly in aerobic digesters treating thickened sludge, with the benefits of smaller digester size, much reduced aeration cost, and higher digestion temperature (especially important for plants in colder areas). The effects of low DO concentrations on digestion kinetics were studied using sludge from municipal wastewater treatment plants in Akron, Ohio, and Los Lunas, New Mexico. The experiments were conducted both in batch digestion and in a mixed mode of continuous, fed-batch, and batch operations. The low DO condition was clearly advantageous in eliminating the need for pH control because of the simultaneous occurrence of nitrification and denitrification. However, when compared with fully aerobic (high DO) systems under constant pH control (rare in full-scale plants), low DO concentrations and a higher solids loading had a negative effect on the specific volatile solids (VS) digestion kinetics. Nonetheless, the overall (volumetric) digestion performance depends not only on the specific digestion kinetics but also on the solids concentration, pH, and digester temperature, all of which favor the low DO digestion of thickened sludge. The significant effect of temperature on low DO digestion was confirmed in the mixed-mode study with the Akron sludge. When compared with the well-known empirical correlation between VS reduction and the product (temperature x solids retention time), the experimental data followed the same trend but fell below the correlation predictions; this shortfall was attributed to the lower digestible VS in the Akron sludge, the slower digestion at low DO concentrations, or both. Through model simulation, the first-order decay constant (kd) was estimated as 0.004 h(-1) in the mixed-mode operations, much lower than the values (0.011 to 0.029 h(-1)) obtained in batch digestion. The findings suggest that interactions among sludges of different treatment ages may have a substantially negative effect on digestion kinetics. The use of multistage digesters, especially with small front-end reactors, may be advantageous for both process kinetics and biological reaction kinetics in sludge digestion.
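
    As a quick orientation on the reported constants, first-order decay of digestible volatile solids implies the half-lives below (plain arithmetic on the values quoted above):

        % First-order VS decay and the half-lives implied by the quoted constants.
        \begin{align}
          \mathrm{VS}(t) &= \mathrm{VS}(0)\, e^{-k_d t}, \qquad t_{1/2} = \frac{\ln 2}{k_d}, \\
          k_d &= 0.004~\mathrm{h^{-1}} \;\Rightarrow\; t_{1/2} \approx 173~\mathrm{h} \approx 7.2~\mathrm{d}; \qquad
          k_d = 0.011\text{--}0.029~\mathrm{h^{-1}} \;\Rightarrow\; t_{1/2} \approx 24\text{--}63~\mathrm{h}.
        \end{align}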

  10. Research progress of microbial corrosion of reinforced concrete structure

    NASA Astrophysics Data System (ADS)

    Li, Shengli; Li, Dawang; Jiang, Nan; Wang, Dongwei

    2011-04-01

    Microbial corrosion of reinforced concrete structures is a young and inherently interdisciplinary field, drawing on civil engineering, environmental engineering, biology, chemistry, materials science, and related areas. This paper describes research progress on the causes of microbial corrosion of reinforced concrete structures, as well as the methods and scope of research in this area. Work in the field is only beginning: concerted effort is needed to probe the corrosion mechanisms further, to assess the safety and service life of reinforced concrete structures under these conditions, and to put forward protective methods.

  11. Reinforcement learning with Marr.

    PubMed

    Niv, Yael; Langdon, Angela

    2016-10-01

    To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization which readily prescribes algorithmic solutions that evidence striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.

  12. Time-Extended Policies in Multi-Agent Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Agogino, Adrian K.

    2004-01-01

    Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of having rewards aligned with the global reward lead to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team game) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
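
    Aligned-and-learnable rewards of the kind discussed here are commonly built as difference rewards: an agent's reward is the global utility minus the global utility with that agent's action replaced by a fixed default, so the joint action of the other agents cancels out. The sketch below uses a made-up team objective purely to show the construction; it is not the paper's Gridworld setup.

        import numpy as np

        def global_utility(actions):
            """Hypothetical team objective standing in for the global utility G."""
            return -float((np.sum(actions) - 10.0) ** 2)

        def difference_reward(actions, i, default=0.0):
            """D_i = G(z) - G(z with agent i's action replaced by a default).
            Aligned with G (improving D_i improves G) and more learnable,
            since the other agents' contribution cancels out of the comparison."""
            counterfactual = list(actions)
            counterfactual[i] = default
            return global_utility(actions) - global_utility(counterfactual)

        print(difference_reward([2.0, 3.0, 4.0], i=1))   # agent 1's marginal contribution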

  13. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    PubMed

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
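
    Pavlovian-bias parameters of the kind described are usually implemented by letting state value leak into the weight of the "go" action, so that aversive states suppress responding regardless of instrumental value. A schematic of that choice rule, with parameter names of our own choosing rather than the authors':

        import math

        def p_go(q_go, q_nogo, v_state, b_go=0.1, pi=0.5):
            """Probability of responding in a go/no-go RL model with a Pavlovian
            bias. pi scales how strongly state value modulates 'go': for negative
            (aversive) v_state, a larger pi withholds responding, the prepotent
            avoidance bias described in the abstract."""
            w_go = q_go + b_go + pi * v_state   # instrumental value + go bias + Pavlovian term
            w_nogo = q_nogo
            return 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))  # sigmoid choice rule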

  14. Variable Behavior and Repeated Learning in Two Mouse Strains: Developmental and Genetic Contributions.

    PubMed

    Arnold, Megan A; Newland, M Christopher

    2018-06-16

    Behavioral inflexibility is often assessed using reversal learning tasks, which require a relatively low degree of response variability. No studies have assessed sensitivity to reinforcement contingencies that specifically select highly variable response patterns in mice, let alone in models of neurodevelopmental disorders involving limited response variation. Operant variability and incremental repeated acquisition (IRA) were used to assess unique aspects of behavioral variability in two mouse strains: BALB/c, a model of some deficits in ASD, and C57Bl/6. On the operant variability task, BALB/c mice responded more repetitively during adolescence than C57Bl/6 mice when reinforcement did not require variability but responded more variably when reinforcement required variability. During IRA testing in adulthood, both strains acquired an unchanging performance sequence equally well. Strain differences emerged, however, after novel learning sequences began alternating with the performance sequence: BALB/c mice substantially outperformed C57Bl/6 mice. Using litter-mate controls, it was found that adolescent experience with variability did not affect either learning or performance on the IRA task in adulthood. These findings constrain the use of BALB/c mice as a model of ASD, but once again reveal that this strain is highly sensitive to reinforcement contingencies and that these mice are fast and robust learners. Copyright © 2018. Published by Elsevier B.V.

  15. Preventing Learned Helplessness.

    ERIC Educational Resources Information Center

    Hoy, Cheri

    1986-01-01

    To prevent learned helplessness in learning disabled students, teachers can share responsibilities with the students, train students to reinforce themselves for effort and self control, and introduce opportunities for changing counterproductive attitudes. (CL)

  16. Relapse processes after the extinction of instrumental learning: Renewal, resurgence, and reacquisition

    PubMed Central

    Bouton, Mark E.; Winterbauer, Neil E.; Todd, Travis P.

    2012-01-01

    It is widely recognized that extinction (the procedure in which a Pavlovian conditioned stimulus or an instrumental action is repeatedly presented without its reinforcer) weakens behavior without erasing the original learning. Most of the experiments that support this claim have focused on several “relapse” effects that occur after Pavlovian extinction, which collectively suggest that the original learning is saved through extinction. However, although such effects do occur after instrumental extinction, they have not been explored there in as much detail. This article reviews recent research in our laboratory that has investigated three relapse effects that occur after the extinction of instrumental (operant) learning. In renewal, responding returns after extinction when the behavior is tested in a different context; in resurgence, responding recovers when a second response that has been reinforced during extinction of the first is itself put on extinction; and in rapid reacquisition, extinguished responding returns rapidly when the response is reinforced again. The results provide new insights into extinction and relapse, and are consistent with principles that have been developed to explain extinction and relapse as they occur after Pavlovian conditioning. Extinction of instrumental learning, like Pavlovian learning, involves new learning that is relatively dependent on the context for expression. PMID:22450305

  17. Enhanced appetitive learning and reversal learning in a mouse model for Prader-Willi syndrome.

    PubMed

    Relkovic, Dinko; Humby, Trevor; Hagan, Jim J; Wilkinson, Lawrence S; Isles, Anthony R

    2012-06-01

    Prader-Willi syndrome (PWS) is caused by lack of paternally derived gene expression from the imprinted gene cluster on human chromosome 15q11-q13. PWS is characterized by severe hypotonia, a failure to thrive in infancy and, on emerging from infancy, evidence of learning disabilities and overeating behavior due to an abnormal satiety response and increased motivation by food. We have previously shown that an imprinting center deletion mouse model (PWS-IC) is quicker to acquire a preference for, and consumes more of, a palatable food. Here we examined how the use of this palatable food as a reinforcer influences learning in PWS-IC mice performing a simple appetitive learning task. On a nonspatial maze-based task, PWS-IC mice reached criterion much more quickly, making fewer errors during initial acquisition and also during reversal learning. A manipulation in which the reinforcer was devalued impaired wild-type performance but had no effect on PWS-IC mice. This suggests that increased motivation for the reinforcer in PWS-IC mice may underlie their enhanced learning. This supports previous findings in PWS patients and is the first behavioral study of an animal model of PWS in which the motivation of behavior by food rewards has been examined. © 2012 American Psychological Association

  18. High-strength N-methyl-2-pyrrolidone-containing process wastewater treatment using sequencing batch reactor and membrane bioreactor: A feasibility study.

    PubMed

    Loh, Chun Heng; Wu, Bing; Ge, Liya; Pan, Chaozhi; Wang, Rong

    2018-03-01

    N-methyl-2-pyrrolidone (NMP) is widely used as a solvent in the polymeric membrane fabrication process; its elimination from the process wastewater (normally at a high concentration, >1000 mg/L) prior to discharge is essential because of environmental concerns. This study investigated the feasibility of treating high-strength NMP-containing process wastewater in a sequencing batch reactor (SBR; i.e., batch feeding and intermittent aerobic/anoxic conditions) and a membrane bioreactor (MBR; i.e., continuous feeding and aeration), respectively. The results showed that the SBR with acclimated sludge was capable of removing >90% of dissolved organic carbon (DOC) and almost 98% of NMP within 2 h. In contrast, the MBR with acclimated sludge showed a decreasing NMP removal efficiency, from 100% to 40%, over 15 days of operation. The HPLC and LC-MS/MS analytical results showed that NMP degradation in the SBR and MBR could follow different pathways. This may be attributed to the dissimilar bacterial community compositions in the SBR and MBR, as identified by 16S rRNA gene sequencing analysis. Interestingly, the NMP-degrading capability of the activated sludge derived from the MBR could be recovered to >98% after it was operated in the SBR mode (batch feeding with intermittent aerobic/anoxic conditions). This study reveals that the SBR is probably a more feasible process for treating high-strength NMP-containing wastewater, but residual NMP metabolites in the SBR effluent need to be post-treated by an oxidation or adsorption process in order to achieve zero discharge of toxic chemicals. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Economic decision-making in the ultimatum game by smokers.

    PubMed

    Takahashi, Taiki

    2007-10-01

    No study to date has compared degrees of inequity aversion in economic decision-making in the ultimatum game between non-addictive and addictive reinforcers. The comparison is potentially important in neuroeconomics and in reinforcement learning theories of addiction. We compared the degrees of inequity aversion in the ultimatum game between money and cigarettes in habitual smokers. Smokers avoided inequity in the ultimatum game more strongly for money than for cigarettes; i.e., there was a "domain effect" in ultimatum game decision-making. Reward-processing neural activities in the brain for non-addictive and addictive reinforcers may be distinct, and insula activation due to cue-induced craving may conflict with insula activation induced by unfair offers. Future studies in the neuroeconomics of addiction should employ game-theoretic decision tasks to elucidate reinforcement learning processes in dopaminergic neural circuits.

  20. Reinforced two-step-ahead weight adjustment technique for online training of recurrent neural networks.

    PubMed

    Chang, Li-Chiu; Chen, Pin-An; Chang, Fi-John

    2012-08-01

    A reliable forecast of future events possesses great value. The main purpose of this paper is to propose an innovative learning technique for reinforcing the accuracy of two-step-ahead (2SA) forecasts. The real-time recurrent learning (RTRL) algorithm for recurrent neural networks (RNNs) can effectively model the dynamics of complex processes and has been used successfully in one-step-ahead forecasts for various time series. A reinforced RTRL algorithm for 2SA forecasts using RNNs is proposed in this paper, and its performance is investigated using two famous benchmark time series and streamflow data from flood events in Taiwan. Results demonstrate that the proposed reinforced 2SA RTRL algorithm for RNNs can adequately forecast the benchmark (theoretical) time series, significantly improve the accuracy of flood forecasts, and effectively reduce time-lag effects.
