ensemble learning approach: Topics by Science.gov

Sample records for ensemble learning approach

An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

PubMed

Stanescu, Ana; Caragea, Doina

2015-01-01

Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.
An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

PubMed Central

2015-01-01

Background Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Results Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. Conclusions In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework. PMID:26356316
Decision tree and ensemble learning algorithms with their applications in bioinformatics.

PubMed

Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping

2011-01-01

Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
Argumentation Based Joint Learning: A Novel Ensemble Learning Approach

PubMed Central

Xu, Junyi; Yao, Li; Li, Le

2015-01-01

Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high quality knowledge for ensemble classifier and improve the performance of classification. PMID:25966359
An ensemble rank learning approach for gene prioritization.

PubMed

Lee, Po-Feng; Soo, Von-Wun

2013-01-01

Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets.

PubMed

Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A

2015-01-15

Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets

PubMed Central

Gruber, Susan; Logan, Roger W.; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A.

2014-01-01

Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V -fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. PMID:25316152
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

PubMed

Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

2017-06-01

Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
Competitive Learning Neural Network Ensemble Weighted by Predicted Performance

ERIC Educational Resources Information Center

Ye, Qiang

2010-01-01

Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…
Evolutionary Ensemble for In Silico Prediction of Ames Test Mutagenicity

NASA Astrophysics Data System (ADS)

Chen, Huanhuan; Yao, Xin

Driven by new regulations and animal welfare, the need to develop in silico models has increased recently as alternative approaches to safety assessment of chemicals without animal testing. This paper describes a novel machine learning ensemble approach to building an in silico model for the prediction of the Ames test mutagenicity, one of a battery of the most commonly used experimental in vitro and in vivo genotoxicity tests for safety evaluation of chemicals. Evolutionary random neural ensemble with negative correlation learning (ERNE) [1] was developed based on neural networks and evolutionary algorithms. ERNE combines the method of bootstrap sampling on training data with the method of random subspace feature selection to ensure diversity in creating individuals within an initial ensemble. Furthermore, while evolving individuals within the ensemble, it makes use of the negative correlation learning, enabling individual NNs to be trained as accurate as possible while still manage to maintain them as diverse as possible. Therefore, the resulting individuals in the final ensemble are capable of cooperating collectively to achieve better generalization of prediction. The empirical experiment suggest that ERNE is an effective ensemble approach for predicting the Ames test mutagenicity of chemicals.
Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines.

PubMed

Abuassba, Adnan O M; Zhang, Dezheng; Luo, Xiong; Shaheryar, Ahmad; Ali, Hazrat

2017-01-01

Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L 2 -norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines

PubMed Central

Abuassba, Adnan O. M.; Ali, Hazrat

2017-01-01

Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets. PMID:28546808
An Ensemble Approach in Converging Contents of LMS and KMS

ERIC Educational Resources Information Center

Sabitha, A. Sai; Mehrotra, Deepti; Bansal, Abhay

2017-01-01

Currently the challenges in e-Learning are converging the learning content from various sources and managing them within e-learning practices. Data mining learning algorithms can be used and the contents can be converged based on the Metadata of the objects. Ensemble methods use multiple learning algorithms and it can be used to converge the…
Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.

PubMed

Kilic, Niyazi; Hosgormez, Erkan

2016-03-01

Ensemble learning methods are one of the most powerful tools for the pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed including different physical parameters and they fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting and random subspace (RSM) were used. Instance based learning (IBk) and random forest (RF) classifiers applied to six feature set models. The patients were classified into three groups such as osteoporosis, osteopenia and control (healthy), using ensemble classifiers. Total classification accuracy and f-measure were also used to evaluate diagnostic performance of the proposed ensemble classification system. The classification accuracy has reached to 98.85 % by the combination of model 6 (five BMD + five T-score values) using RSM-RF classifier. The findings of this paper suggest that the patients will be able to be warned before a bone fracture occurred, by just examining some physical parameters that can easily be measured without invasive operations.
Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

USDA-ARS?s Scientific Manuscript database

In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by the post processing the rules with ...
Multiple-instance ensemble learning for hyperspectral images

NASA Astrophysics Data System (ADS)

Ergul, Ugur; Bilgin, Gokhan

2017-10-01

An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use in hyperspectral images (HSIs) by inspiring the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed by a small percentage of training samples, and MI bags are formed by a local windowing process with variable window sizes on selected instances. In addition to bootstrap aggregation, random subspace is another method used to diversify base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed over single-test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are demonstrated with state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equipollent non-MIL algorithms.
Exploiting ensemble learning for automatic cataract detection and grading.

PubMed

Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing

2016-02-01

Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Detection of eardrum abnormalities using ensemble deep learning approaches

NASA Astrophysics Data System (ADS)

Senaras, Caglar; Moberly, Aaron C.; Teknos, Theodoros; Essig, Garth; Elmaraghy, Charles; Taj-Schaal, Nazhat; Yua, Lianbo; Gurcan, Metin N.

2018-02-01

In this study, we proposed an approach to report the condition of the eardrum as "normal" or "abnormal" by ensembling two different deep learning architectures. In the first network (Network 1), we applied transfer learning to the Inception V3 network by using 409 labeled samples. As a second network (Network 2), we designed a convolutional neural network to take advantage of auto-encoders by using additional 673 unlabeled eardrum samples. The individual classification accuracies of the Network 1 and Network 2 were calculated as 84.4%(+/- 12.1%) and 82.6% (+/- 11.3%), respectively. Only 32% of the errors of the two networks were the same, making it possible to combine two approaches to achieve better classification accuracy. The proposed ensemble method allows us to achieve robust classification because it has high accuracy (84.4%) with the lowest standard deviation (+/- 10.3%).
New technologies for examining the role of neuronal ensembles in drug addiction and fear.

PubMed

Cruz, Fabio C; Koya, Eisuke; Guez-Barber, Danielle H; Bossert, Jennifer M; Lupica, Carl R; Shaham, Yavin; Hope, Bruce T

2013-11-01

Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. In addition, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches--Daun02 inactivation, FACS sorting of activated neurons and Fos-GFP transgenic rats--that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools--Fos-tTA transgenic mice and inactivation of CREB-overexpressing neurons--that have been used to study the role of neuronal ensembles in conditioned fear.
Practice Makes Perfect?: Effective Practice Instruction in Large Ensembles

ERIC Educational Resources Information Center

Prichard, Stephanie

2012-01-01

Helping young musicians learn how to practice effectively is a challenge faced by all music educators. This article presents a system of individual music practice instruction that can be seamlessly integrated within large-ensemble rehearsals. Using a step-by-step approach, large-ensemble conductors can teach students to identify and isolate…

Anomaly Detection Using an Ensemble of Feature Models

PubMed Central

Noto, Keith; Brodley, Carla; Slonim, Donna

2011-01-01

We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among some features that fail to appear in the anomalous examples. Our approach learns to predict the values of training set features from the values of other features. After we have formed an ensemble of predictors, we apply this ensemble to new data points. To combine the contribution of each predictor in our ensemble, we have developed a novel, information-theoretic anomaly measure that our experimental results show selects against noisy and irrelevant features. Our results on 47 data sets show that for most data sets, this approach significantly improves performance over current state-of-the-art feature space distance and density-based approaches. PMID:22020249
Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

PubMed

Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

2018-07-01

In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the Area Under the Receiver Operating Characteristic (AUROC), which showed the highest value, belonged to boosted regression trees (0.975) and the lowest value was recorded for generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, nevertheless, variability among the prediction of individual models was considerable. Therefore, to reduce uncertainty, creating more generalizable, more stable, and less sensitive models, ensemble forecasting approaches and in particular the EMmedian is recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
Prediction of drug synergy in cancer using ensemble-based machine learning techniques

NASA Astrophysics Data System (ADS)

Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

2018-04-01

Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.
Comparison of different deep learning approaches for parotid gland segmentation from CT images

NASA Astrophysics Data System (ADS)

Hänsch, Annika; Schwier, Michael; Gass, Tobias; Morgas, Tomasz; Haas, Benjamin; Klein, Jan; Hahn, Horst K.

2018-02-01

The segmentation of target structures and organs at risk is a crucial and very time-consuming step in radiotherapy planning. Good automatic methods can significantly reduce the time clinicians have to spend on this task. Due to its variability in shape and often low contrast to surrounding structures, segmentation of the parotid gland is especially challenging. Motivated by the recent success of deep learning, we study different deep learning approaches for parotid gland segmentation. Particularly, we compare 2D, 2D ensemble and 3D U-Net approaches and find that the 2D U-Net ensemble yields the best results with a mean Dice score of 0.817 on our test data. The ensemble approach reduces false positives without the need for an automatic region of interest detection. We also apply our trained 2D U-Net ensemble to segment the test data of the 2015 MICCAI head and neck auto-segmentation challenge. With a mean Dice score of 0.861, our classifier exceeds the highest mean score in the challenge. This shows that the method generalizes well onto data from independent sites. Since appropriate reference annotations are essential for training but often difficult and expensive to obtain, it is important to know how many samples are needed to properly train a neural network. We evaluate the classifier performance after training with differently sized training sets (50-450) and find that 250 cases (without using extensive data augmentation) are sufficient to obtain good results with the 2D ensemble. Adding more samples does not significantly improve the Dice score of the segmentations.
Cluster ensemble based on Random Forests for genetic data.

PubMed

Alhusain, Luluah; Hafez, Alaaeldin M

2017-01-01

Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

PubMed

Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

2017-01-01

Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
A novel approach to malignant-benign classification of pulmonary nodules by using ensemble learning classifiers.

PubMed

Tartar, A; Akan, A; Kilic, N

2014-01-01

Computer-aided detection systems can help radiologists to detect pulmonary nodules at an early stage. In this paper, a novel Computer-Aided Diagnosis system (CAD) is proposed for the classification of pulmonary nodules as malignant and benign. The proposed CAD system using ensemble learning classifiers, provides an important support to radiologists at the diagnosis process of the disease, achieves high classification performance. The proposed approach with bagging classifier results in 94.7 %, 90.0 % and 77.8 % classification sensitivities for benign, malignant and undetermined classes (89.5 % accuracy), respectively.
A deep learning-based multi-model ensemble method for cancer prediction.

PubMed

Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

2018-01-01

Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
Ensemble Learning Method for Hidden Markov Models

DTIC Science & Technology

2014-12-01

Ensemble HMM landmine detector Mine signatures vary according to the mine type, mine size , and burial depth. Similarly, clutter signatures vary with soil ...approaches for the di erent K groups depending on their size and homogeneity. In particular, we investigate the maximum likelihood (ML), the minimum...propose using and optimizing various training approaches for the different K groups depending on their size and homogeneity. In particular, we
Active classifier selection for RGB-D object categorization using a Markov random field ensemble method

NASA Astrophysics Data System (ADS)

Durner, Maximilian; Márton, Zoltán.; Hillenbrand, Ulrich; Ali, Haider; Kleinsteuber, Martin

2017-03-01

In this work, a new ensemble method for the task of category recognition in different environments is presented. The focus is on service robotic perception in an open environment, where the robot's task is to recognize previously unseen objects of predefined categories, based on training on a public dataset. We propose an ensemble learning approach to be able to flexibly combine complementary sources of information (different state-of-the-art descriptors computed on color and depth images), based on a Markov Random Field (MRF). By exploiting its specific characteristics, the MRF ensemble method can also be executed as a Dynamic Classifier Selection (DCS) system. In the experiments, the committee- and topology-dependent performance boost of our ensemble is shown. Despite reduced computational costs and using less information, our strategy performs on the same level as common ensemble approaches. Finally, the impact of large differences between datasets is analyzed.
CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods.

PubMed

Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng

2017-05-18

Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

PubMed Central

Arshad, Sannia; Rho, Seungmin

2014-01-01

We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes. PMID:25295302
Robust framework to combine diverse classifiers assigning distributed confidence to individual classifiers at class level.

PubMed

Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin

2014-01-01

We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes.
A new approach to human microRNA target prediction using ensemble pruning and rotation forest.

PubMed

Mousavi, Reza; Eftekhari, Mahdi; Haghighi, Mehdi Ghezelbash

2015-12-01

MicroRNAs (miRNAs) are small non-coding RNAs that have important functions in gene regulation. Since finding miRNA target experimentally is costly and needs spending much time, the use of machine learning methods is a growing research area for miRNA target prediction. In this paper, a new approach is proposed by using two popular ensemble strategies, i.e. Ensemble Pruning and Rotation Forest (EP-RTF), to predict human miRNA target. For EP, the approach utilizes Genetic Algorithm (GA). In other words, a subset of classifiers from the heterogeneous ensemble is first selected by GA. Next, the selected classifiers are trained based on the RTF method and then are combined using weighted majority voting. In addition to seeking a better subset of classifiers, the parameter of RTF is also optimized by GA. Findings of the present study confirm that the newly developed EP-RTF outperforms (in terms of classification accuracy, sensitivity, and specificity) the previously applied methods over four datasets in the field of human miRNA target. Diversity-error diagrams reveal that the proposed ensemble approach constructs individual classifiers which are more accurate and usually diverse than the other ensemble approaches. Given these experimental results, we highly recommend EP-RTF for improving the performance of miRNA target prediction.
Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms.

PubMed

Barzegar, Rahim; Moghaddam, Asghar Asghari; Deo, Ravinesh; Fijani, Elham; Tziritis, Evangelos

2018-04-15

Constructing accurate and reliable groundwater risk maps provide scientifically prudent and strategic measures for the protection and management of groundwater. The objectives of this paper are to design and validate machine learning based-risk maps using ensemble-based modelling with an integrative approach. We employ the extreme learning machines (ELM), multivariate regression splines (MARS), M5 Tree and support vector regression (SVR) applied in multiple aquifer systems (e.g. unconfined, semi-confined and confined) in the Marand plain, North West Iran, to encapsulate the merits of individual learning algorithms in a final committee-based ANN model. The DRASTIC Vulnerability Index (VI) ranged from 56.7 to 128.1, categorized with no risk, low and moderate vulnerability thresholds. The correlation coefficient (r) and Willmott's Index (d) between NO 3 concentrations and VI were 0.64 and 0.314, respectively. To introduce improvements in the original DRASTIC method, the vulnerability indices were adjusted by NO 3 concentrations, termed as the groundwater contamination risk (GCR). Seven DRASTIC parameters utilized as the model inputs and GCR values utilized as the outputs of individual machine learning models were served in the fully optimized committee-based ANN-predictive model. The correlation indicators demonstrated that the ELM and SVR models outperformed the MARS and M5 Tree models, by virtue of a larger d and r value. Subsequently, the r and d metrics for the ANN-committee based multi-model in the testing phase were 0.8889 and 0.7913, respectively; revealing the superiority of the integrated (or ensemble) machine learning models when compared with the original DRASTIC approach. The newly designed multi-model ensemble-based approach can be considered as a pragmatic step for mapping groundwater contamination risks of multiple aquifer systems with multi-model techniques, yielding the high accuracy of the ANN committee-based model. Copyright © 2017 Elsevier B.V. All rights reserved.
Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)

NASA Astrophysics Data System (ADS)

OConnor, A.; Kirtman, B. P.; Harrison, S.; Gorman, J.

2016-02-01

Current US Navy forecasting systems cannot easily incorporate extended-range forecasts that can improve mission readiness and effectiveness; ensure safety; and reduce cost, labor, and resource requirements. If Navy operational planners had systems that incorporated these forecasts, they could plan missions using more reliable and longer-term weather and climate predictions. Further, using multi-model forecast ensembles instead of single forecasts would produce higher predictive performance. Extended-range multi-model forecast ensembles, such as those available in the North American Multi-Model Ensemble (NMME), are ideal for system integration because of their high skill predictions; however, even higher skill predictions can be produced if forecast model ensembles are combined correctly. While many methods for weighting models exist, the best method in a given environment requires expert knowledge of the models and combination methods.We present an innovative approach that uses machine learning to combine extended-range predictions from multi-model forecast ensembles and generate a probabilistic forecast for any region of the globe up to 12 months in advance. Our machine-learning approach uses 30 years of hindcast predictions to learn patterns of forecast model successes and failures. Each model is assigned a weight for each environmental condition, 100 km2 region, and day given any expected environmental information. These weights are then applied to the respective predictions for the region and time of interest to effectively stitch together a single, coherent probabilistic forecast. Our experimental results demonstrate the benefits of our approach to produce extended-range probabilistic forecasts for regions and time periods of interest that are superior, in terms of skill, to individual NMME forecast models and commonly weighted models. The probabilistic forecast leverages the strengths of three NMME forecast models to predict environmental conditions for an area spanning from San Diego, CA to Honolulu, HI, seven months in-advance. Key findings include: weighted combinations of models are strictly better than individual models; machine-learned combinations are especially better; and forecasts produced using our approach have the highest rank probability skill score most often.
Machine learning approaches to evaluate correlation patterns in allosteric signaling: A case study of the PDZ2 domain

NASA Astrophysics Data System (ADS)

Botlani, Mohsen; Siddiqui, Ahnaf; Varma, Sameer

2018-06-01

Many proteins are regulated by dynamic allostery wherein regulator-induced changes in structure are comparable with thermal fluctuations. Consequently, understanding their mechanisms requires assessment of relationships between and within conformational ensembles of different states. Here we show how machine learning based approaches can be used to simplify this high-dimensional data mining task and also obtain mechanistic insight. In particular, we use these approaches to investigate two fundamental questions in dynamic allostery. First, how do regulators modify inter-site correlations in conformational fluctuations (Cij)? Second, how are regulator-induced shifts in conformational ensembles at two different sites in a protein related to each other? We address these questions in the context of the human protein tyrosine phosphatase 1E's PDZ2 domain, which is a model protein for studying dynamic allostery. We use molecular dynamics to generate conformational ensembles of the PDZ2 domain in both the regulator-bound and regulator-free states. The employed protocol reproduces methyl deuterium order parameters from NMR. Results from unsupervised clustering of Cij combined with flow analyses of weighted graphs of Cij show that regulator binding significantly alters the global signaling network in the protein; however, not by altering the spatial arrangement of strongly interacting amino acid clusters but by modifying the connectivity between clusters. Additionally, we find that regulator-induced shifts in conformational ensembles, which we evaluate by repartitioning ensembles using supervised learning, are, in fact, correlated. This correlation Δij is less extensive compared to Cij, but in contrast to Cij, Δij depends inversely on the distance from the regulator binding site. Assuming that Δij is an indicator of the transduction of the regulatory signal leads to the conclusion that the regulatory signal weakens with distance from the regulatory site. Overall, this work provides new approaches to analyze high-dimensional molecular simulation data and also presents applications that yield new insight into dynamic allostery.
Hierarchical Ensemble Methods for Protein Function Prediction

PubMed Central

2014-01-01

Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954
Dynamic adaptive learning for decision-making supporting systems

NASA Astrophysics Data System (ADS)

He, Haibo; Cao, Yuan; Chen, Sheng; Desai, Sachi; Hohil, Myron E.

2008-03-01

This paper proposes a novel adaptive learning method for data mining in support of decision-making systems. Due to the inherent characteristics of information ambiguity/uncertainty, high dimensionality and noisy in many homeland security and defense applications, such as surveillances, monitoring, net-centric battlefield, and others, it is critical to develop autonomous learning methods to efficiently learn useful information from raw data to help the decision making process. The proposed method is based on a dynamic learning principle in the feature spaces. Generally speaking, conventional approaches of learning from high dimensional data sets include various feature extraction (principal component analysis, wavelet transform, and others) and feature selection (embedded approach, wrapper approach, filter approach, and others) methods. However, very limited understandings of adaptive learning from different feature spaces have been achieved. We propose an integrative approach that takes advantages of feature selection and hypothesis ensemble techniques to achieve our goal. Based on the training data distributions, a feature score function is used to provide a measurement of the importance of different features for learning purpose. Then multiple hypotheses are iteratively developed in different feature spaces according to their learning capabilities. Unlike the pre-set iteration steps in many of the existing ensemble learning approaches, such as adaptive boosting (AdaBoost) method, the iterative learning process will automatically stop when the intelligent system can not provide a better understanding than a random guess in that particular subset of feature spaces. Finally, a voting algorithm is used to combine all the decisions from different hypotheses to provide the final prediction results. Simulation analyses of the proposed method on classification of different US military aircraft databases show the effectiveness of this method.
Using ensemble of classifiers for predicting HIV protease cleavage sites in proteins.

PubMed

Nanni, Loris; Lumini, Alessandra

2009-03-01

The focus of this work is the use of ensembles of classifiers for predicting HIV protease cleavage sites in proteins. Due to the complex relationships in the biological data, several recent works show that often ensembles of learning algorithms outperform stand-alone methods. We show that the fusion of approaches based on different encoding models can be useful for improving the performance of this classification problem. In particular, in this work four different feature encodings for peptides are described and tested. An extensive evaluation on a large dataset according to a blind testing protocol is reported which demonstrates how different feature extraction methods and classifiers can be combined for obtaining a robust and reliable system. The comparison with other stand-alone approaches allows quantifying the performance improvement obtained by the ensembles proposed in this work.

Learning of Rule Ensembles for Multiple Attribute Ranking Problems

NASA Astrophysics Data System (ADS)

Dembczyński, Krzysztof; Kotłowski, Wojciech; Słowiński, Roman; Szeląg, Marcin

In this paper, we consider the multiple attribute ranking problem from a Machine Learning perspective. We propose two approaches to statistical learning of an ensemble of decision rules from decision examples provided by the Decision Maker in terms of pairwise comparisons of some objects. The first approach consists in learning a preference function defining a binary preference relation for a pair of objects. The result of application of this function on all pairs of objects to be ranked is then exploited using the Net Flow Score procedure, giving a linear ranking of objects. The second approach consists in learning a utility function for single objects. The utility function also gives a linear ranking of objects. In both approaches, the learning is based on the boosting technique. The presented approaches to Preference Learning share good properties of the decision rule preference model and have good performance in the massive-data learning problems. As Preference Learning and Multiple Attribute Decision Aiding share many concepts and methodological issues, in the introduction, we review some aspects bridging these two fields. To illustrate the two approaches proposed in this paper, we solve with them a toy example concerning the ranking of a set of cars evaluated by multiple attributes. Then, we perform a large data experiment on real data sets. The first data set concerns credit rating. Since recent research in the field of Preference Learning is motivated by the increasing role of modeling preferences in recommender systems and information retrieval, we chose two other massive data sets from this area - one comes from movie recommender system MovieLens, and the other concerns ranking of text documents from 20 Newsgroups data set.
Time Series Forecasting of Daily Reference Evapotranspiration by Neural Network Ensemble Learning for Irrigation System

NASA Astrophysics Data System (ADS)

Manikumari, N.; Murugappan, A.; Vinodhini, G.

2017-07-01

Time series forecasting has gained remarkable interest of researchers in the last few decades. Neural networks based time series forecasting have been employed in various application areas. Reference Evapotranspiration (ETO) is one of the most important components of the hydrologic cycle and its precise assessment is vital in water balance and crop yield estimation, water resources system design and management. This work aimed at achieving accurate time series forecast of ETO using a combination of neural network approaches. This work was carried out using data collected in the command area of VEERANAM Tank during the period 2004 - 2014 in India. In this work, the Neural Network (NN) models were combined by ensemble learning in order to improve the accuracy for forecasting Daily ETO (for the year 2015). Bagged Neural Network (Bagged-NN) and Boosted Neural Network (Boosted-NN) ensemble learning were employed. It has been proved that Bagged-NN and Boosted-NN ensemble models are better than individual NN models in terms of accuracy. Among the ensemble models, Boosted-NN reduces the forecasting errors compared to Bagged-NN and individual NNs. Regression co-efficient, Mean Absolute Deviation, Mean Absolute Percentage error and Root Mean Square Error also ascertain that Boosted-NN lead to improved ETO forecasting performance.
Predicting protein function and other biomedical characteristics with heterogeneous ensembles

PubMed Central

Whalen, Sean; Pandey, Om Prakash

2015-01-01

Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink. PMID:26342255
Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

PubMed

Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

2016-01-01

Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10 fold-cross validation on training data for evaluating the quality of each candidate ensembles. In order to combine the base classifiers decision into ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
A Symbiotic Framework for coupling Machine Learning and Geosciences in Prediction and Predictability

NASA Astrophysics Data System (ADS)

Ravela, S.

2017-12-01

In this presentation we review the two directions of a symbiotic relationship between machine learning and the geosciences in relation to prediction and predictability. In the first direction, we develop ensemble, information theoretic and manifold learning framework to adaptively improve state and parameter estimates in nonlinear high-dimensional non-Gaussian problems, showing in particular that tractable variational approaches can be produced. We demonstrate these applications in the context of autonomous mapping of environmental coherent structures and other idealized problems. In the reverse direction, we show that data assimilation, particularly probabilistic approaches for filtering and smoothing offer a novel and useful way to train neural networks, and serve as a better basis than gradient based approaches when we must quantify uncertainty in association with nonlinear, chaotic processes. In many inference problems in geosciences we seek to build reduced models to characterize local sensitivies, adjoints or other mechanisms that propagate innovations and errors. Here, the particular use of neural approaches for such propagation trained using ensemble data assimilation provides a novel framework. Through these two examples of inference problems in the earth sciences, we show that not only is learning useful to broaden existing methodology, but in reverse, geophysical methodology can be used to influence paradigms in learning.
Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

PubMed

Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

2017-01-01

Injury narratives are now available real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes, Single word and Bi-gram models, Support Vector Machine and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine whereby the triple ensemble NB SW =NB BI-GRAM =SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporation of methods based on human-machine pairings such as we have done here, utilizing readily-available off-the-shelf machine learning techniques and resulting in only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Can-CSC-GBE: Developing Cost-sensitive Classifier with Gentleboost Ensemble for breast cancer classification using protein amino acids and imbalanced data.

PubMed

Ali, Safdar; Majid, Abdul; Javed, Syed Gibran; Sattar, Mohsin

2016-06-01

Early prediction of breast cancer is important for effective treatment and survival. We developed an effective Cost-Sensitive Classifier with GentleBoost Ensemble (Can-CSC-GBE) for the classification of breast cancer using protein amino acid features. In this work, first, discriminant information of the protein sequences related to breast tissue is extracted. Then, the physicochemical properties hydrophobicity and hydrophilicity of amino acids are employed to generate molecule descriptors in different feature spaces. For comparison, we obtained results by combining Cost-Sensitive learning with conventional ensemble of AdaBoostM1 and Bagging. The proposed Can-CSC-GBE system has effectively reduced the misclassification costs and thereby improved the overall classification performance. Our novel approach has highlighted promising results as compared to the state-of-the-art ensemble approaches. Copyright © 2016 Elsevier Ltd. All rights reserved.
A comparative study of breast cancer diagnosis based on neural network ensemble via improved training algorithms.

PubMed

Azami, Hamed; Escudero, Javier

2015-08-01

Breast cancer is one of the most common types of cancer in women all over the world. Early diagnosis of this kind of cancer can significantly increase the chances of long-term survival. Since diagnosis of breast cancer is a complex problem, neural network (NN) approaches have been used as a promising solution. Considering the low speed of the back-propagation (BP) algorithm to train a feed-forward NN, we consider a number of improved NN trainings for the Wisconsin breast cancer dataset: BP with momentum, BP with adaptive learning rate, BP with adaptive learning rate and momentum, Polak-Ribikre conjugate gradient algorithm (CGA), Fletcher-Reeves CGA, Powell-Beale CGA, scaled CGA, resilient BP (RBP), one-step secant and quasi-Newton methods. An NN ensemble, which is a learning paradigm to combine a number of NN outputs, is used to improve the accuracy of the classification task. Results demonstrate that NN ensemble-based classification methods have better performance than NN-based algorithms. The highest overall average accuracy is 97.68% obtained by NN ensemble trained by RBP for 50%-50% training-test evaluation method.
Distinct contributions of attention and working memory to visual statistical learning and ensemble processing.

PubMed

Hall, Michelle G; Mattingley, Jason B; Dux, Paul E

2015-08-01

The brain exploits redundancies in the environment to efficiently represent the complexity of the visual world. One example of this is ensemble processing, which provides a statistical summary of elements within a set (e.g., mean size). Another is statistical learning, which involves the encoding of stable spatial or temporal relationships between objects. It has been suggested that ensemble processing over arrays of oriented lines disrupts statistical learning of structure within the arrays (Zhao, Ngo, McKendrick, & Turk-Browne, 2011). Here we asked whether ensemble processing and statistical learning are mutually incompatible, or whether this disruption might occur because ensemble processing encourages participants to process the stimulus arrays in a way that impedes statistical learning. In Experiment 1, we replicated Zhao and colleagues' finding that ensemble processing disrupts statistical learning. In Experiments 2 and 3, we found that statistical learning was unimpaired by ensemble processing when task demands necessitated (a) focal attention to individual items within the stimulus arrays and (b) the retention of individual items in working memory. Together, these results are consistent with an account suggesting that ensemble processing and statistical learning can operate over the same stimuli given appropriate stimulus processing demands during exposure to regularities. (c) 2015 APA, all rights reserved).
A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

PubMed Central

Song, Ge

2014-01-01

Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. According to the characters of textual streams, it is technically difficult to deal with the classification of textual stream, especially in imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest, for learning from the textual imbalanced stream with concept drift (CFIM). The CFIM is based on ensemble learning by integrating a set of clustering trees (CTs). An adaptive selection method, which flexibly chooses the useful CTs by the property of the stream, is presented in CFIM. In particular, to deal with the problem of class imbalance, we collect and reuse both rare-class instances and misclassified instances from the historical chunks. Compared to most existing approaches, it is worth pointing out that our approach assumes that both majority class and rareclass may suffer from concept drift. Thus the distribution of resampled instances is similar to the current concept. The effectiveness of CFIM is examined in five real-world textual streams under an imbalanced nonstationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models. PMID:24982961
A further step toward an optimal ensemble of classifiers for peptide classification, a case study: HIV protease.

PubMed

Nanni, Loris; Lumini, Alessandra

2009-01-01

The focuses of this work are: to propose a novel method for building an ensemble of classifiers for peptide classification based on substitution matrices; to show the importance to select a proper set of the parameters of the classifiers that build the ensemble of learning systems. The HIV-1 protease cleavage site prediction problem is here studied. The results obtained by a blind testing protocol are reported, the comparison with other state-of-the-art approaches, based on ensemble of classifiers, allows to quantify the performance improvement obtained by the systems proposed in this paper. The simulation based on experimentally determined protease cleavage data has demonstrated the success of these new ensemble algorithms. Particularly interesting it is to note that also if the HIV-1 protease cleavage site prediction problem is considered linearly separable we obtain the best performance using an ensemble of non-linear classifiers.
Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification

PubMed Central

Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

2016-01-01

Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10 fold-cross validation on training data for evaluating the quality of each candidate ensembles. In order to combine the base classifiers decision into ensemble’s output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
Using a Guided Machine Learning Ensemble Model to Predict Discharge Disposition following Meningioma Resection.

PubMed

Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B

2018-04-01

Objective Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
Prognostics of Proton Exchange Membrane Fuel Cells stack using an ensemble of constraints based connectionist networks

NASA Astrophysics Data System (ADS)

Javed, Kamran; Gouriveau, Rafael; Zerhouni, Noureddine; Hissel, Daniel

2016-08-01

Proton Exchange Membrane Fuel Cell (PEMFC) is considered the most versatile among available fuel cell technologies, which qualify for diverse applications. However, the large-scale industrial deployment of PEMFCs is limited due to their short life span and high exploitation costs. Therefore, ensuring fuel cell service for a long duration is of vital importance, which has led to Prognostics and Health Management of fuel cells. More precisely, prognostics of PEMFC is major area of focus nowadays, which aims at identifying degradation of PEMFC stack at early stages and estimating its Remaining Useful Life (RUL) for life cycle management. This paper presents a data-driven approach for prognostics of PEMFC stack using an ensemble of constraint based Summation Wavelet- Extreme Learning Machine (SW-ELM) models. This development aim at improving the robustness and applicability of prognostics of PEMFC for an online application, with limited learning data. The proposed approach is applied to real data from two different PEMFC stacks and compared with ensembles of well known connectionist algorithms. The results comparison on long-term prognostics of both PEMFC stacks validates our proposition.
Instances selection algorithm by ensemble margin

NASA Astrophysics Data System (ADS)

Saidi, Meryem; Bechar, Mohammed El Amine; Settouti, Nesma; Chikh, Mohamed Amine

2018-05-01

The main limit of data mining algorithms is their inability to deal with the huge amount of available data in a reasonable processing time. A solution of producing fast and accurate results is instances and features selection. This process eliminates noisy or redundant data in order to reduce the storage and computational cost without performances degradation. In this paper, a new instance selection approach called Ensemble Margin Instance Selection (EMIS) algorithm is proposed. This approach is based on the ensemble margin. To evaluate our approach, we have conducted several experiments on different real-world classification problems from UCI Machine learning repository. The pixel-based image segmentation is a field where the storage requirement and computational cost of applied model become higher. To solve these limitations we conduct a study based on the application of EMIS and other instance selection techniques for the segmentation and automatic recognition of white blood cells WBC (nucleus and cytoplasm) in cytological images.
Comparative study of feature selection with ensemble learning using SOM variants

NASA Astrophysics Data System (ADS)

Filali, Ameni; Jlassi, Chiraz; Arous, Najet

2017-03-01

Ensemble learning has succeeded in the growth of stability and clustering accuracy, but their runtime prohibits them from scaling up to real-world applications. This study deals the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach using self-organizing maps (SOM) variants to unlabeled data that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a various bootstrap sample and a random subset of the features. Then, we show that the process internal estimates are used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims to the dimensionality reduction, visualization and cluster characterization at the same time. Hence, we provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach proves promise to treat with very broad domains.
Ensemble Methods

NASA Astrophysics Data System (ADS)

Re, Matteo; Valentini, Giorgio

2012-03-01

Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least from the classical age of ancient Greece, and it has been formalized during the Enlightenment with the Condorcet Jury Theorem[45]), which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community on ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [4,177], Kleinberg in the context of stochastic discrimination theory [112], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [21,70]. Empirical studies showed that both in classification and regression problems, ensembles improve on single learning machines, and moreover large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [10,11,49,188]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost that allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms. However, as explained in Section 26.2 there are deeper reasons to use ensembles of learning machines, motivated by the intrinsic characteristics of the ensemble methods. The main aim of this chapter is to introduce ensemble methods and to provide an overview and a bibliography of the main areas of research, without pretending to be exhaustive or to explain the detailed characteristics of each ensemble method. The paper is organized as follows. In the next section, the main theoretical and practical reasons for combining multiple learners are introduced. Section 26.3 depicts the main taxonomies on ensemble methods proposed in the literature. In Section 26.4 and 26.5, we present an overview of the main supervised ensemble methods reported in the literature, adopting a simple taxonomy, originally proposed in Ref. [201]. Applications of ensemble methods are only marginally considered, but a specific section on some relevant applications of ensemble methods in astronomy and astrophysics has been added (Section 26.6). The conclusion (Section 26.7) ends this paper and lists some issues not covered in this work.
A Random Forest-based ensemble method for activity recognition.

PubMed

Feng, Zengtao; Mo, Lingfei; Li, Meng

2015-01-01

This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People), which is a standard, publicly available database, was utilized to train and test. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while the training is faster than others. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
The search for a hippocampal engram.

PubMed

Mayford, Mark

2014-01-05

Understanding the molecular and cellular changes that underlie memory, the engram, requires the identification, isolation and manipulation of the neurons involved. This presents a major difficulty for complex forms of memory, for example hippocampus-dependent declarative memory, where the participating neurons are likely to be sparse, anatomically distributed and unique to each individual brain and learning event. In this paper, I discuss several new approaches to this problem. In vivo calcium imaging techniques provide a means of assessing the activity patterns of large numbers of neurons over long periods of time with precise anatomical identification. This provides important insight into how the brain represents complex information and how this is altered with learning. The development of techniques for the genetic modification of neural ensembles based on their natural, sensory-evoked, activity along with optogenetics allows direct tests of the coding function of these ensembles. These approaches provide a new methodological framework in which to examine the mechanisms of complex forms of learning at the level of the neurons involved in a specific memory.
The search for a hippocampal engram

PubMed Central

Mayford, Mark

2014-01-01

Understanding the molecular and cellular changes that underlie memory, the engram, requires the identification, isolation and manipulation of the neurons involved. This presents a major difficulty for complex forms of memory, for example hippocampus-dependent declarative memory, where the participating neurons are likely to be sparse, anatomically distributed and unique to each individual brain and learning event. In this paper, I discuss several new approaches to this problem. In vivo calcium imaging techniques provide a means of assessing the activity patterns of large numbers of neurons over long periods of time with precise anatomical identification. This provides important insight into how the brain represents complex information and how this is altered with learning. The development of techniques for the genetic modification of neural ensembles based on their natural, sensory-evoked, activity along with optogenetics allows direct tests of the coding function of these ensembles. These approaches provide a new methodological framework in which to examine the mechanisms of complex forms of learning at the level of the neurons involved in a specific memory. PMID:24298162

Prediction of activity type in preschool children using machine learning techniques.

PubMed

Hagenbuchner, Markus; Cliff, Dylan P; Trost, Stewart G; Van Tuc, Nguyen; Peoples, Gregory E

2015-07-01

Recent research has shown that machine learning techniques can accurately predict activity classes from accelerometer data in adolescents and adults. The purpose of this study is to develop and test machine learning models for predicting activity type in preschool-aged children. Participants completed 12 standardised activity trials (TV, reading, tablet game, quiet play, art, treasure hunt, cleaning up, active game, obstacle course, bicycle riding) over two laboratory visits. Eleven children aged 3-6 years (mean age=4.8±0.87; 55% girls) completed the activity trials while wearing an ActiGraph GT3X+ accelerometer on the right hip. Activities were categorised into five activity classes: sedentary activities, light activities, moderate to vigorous activities, walking, and running. A standard feed-forward Artificial Neural Network and a Deep Learning Ensemble Network were trained on features in the accelerometer data used in previous investigations (10th, 25th, 50th, 75th and 90th percentiles and the lag-one autocorrelation). Overall recognition accuracy for the standard feed forward Artificial Neural Network was 69.7%. Recognition accuracy for sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running was 82%, 79%, 64%, 36% and 46%, respectively. In comparison, overall recognition accuracy for the Deep Learning Ensemble Network was 82.6%. For sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running recognition accuracy was 84%, 91%, 79%, 73% and 73%, respectively. Ensemble machine learning approaches such as Deep Learning Ensemble Network can accurately predict activity type from accelerometer data in preschool children. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
A Novel Data-Driven Learning Method for Radar Target Detection in Nonstationary Environments

DTIC Science & Technology

2016-05-01

Classifier ensembles for changing environments,” in Multiple Classifier Systems, vol. 3077, F. Roli, J. Kittler and T. Windeatt, Eds. New York, NY...Dec. 2006, pp. 1113–1118. [21] J. Z. Kolter and M. A. Maloof, “Dynamic weighted majority: An ensemble method for drifting concepts,” J. Mach. Learn...Trans. Neural Netw., vol. 22, no. 10, pp. 1517–1531, Oct. 2011. [23] R. Polikar, “ Ensemble learning,” in Ensemble Machine Learning: Methods and
Extensions and applications of ensemble-of-trees methods in machine learning

NASA Astrophysics Data System (ADS)

Bleich, Justin

Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

PubMed

Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn

2018-04-11

The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed as an alternative of deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been proved that the deep forest model has competitive or even better performance than deep neural networks in some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data. In this paper, we propose a deep learning model, so-called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. The BCDForest distinguishes from the standard deep forest model with the following two main contributions: First, a named multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.
Multi-complexity ensemble measures for gait time series analysis: application to diagnostics, monitoring and biometrics.

PubMed

Gavrishchaka, Valeriy; Senyukova, Olga; Davis, Kristina

2015-01-01

Previously, we have proposed to use complementary complexity measures discovered by boosting-like ensemble learning for the enhancement of quantitative indicators dealing with necessarily short physiological time series. We have confirmed robustness of such multi-complexity measures for heart rate variability analysis with the emphasis on detection of emerging and intermittent cardiac abnormalities. Recently, we presented preliminary results suggesting that such ensemble-based approach could be also effective in discovering universal meta-indicators for early detection and convenient monitoring of neurological abnormalities using gait time series. Here, we argue and demonstrate that these multi-complexity ensemble measures for gait time series analysis could have significantly wider application scope ranging from diagnostics and early detection of physiological regime change to gait-based biometrics applications.
Ensemble approach for differentiation of malignant melanoma

NASA Astrophysics Data System (ADS)

Rastgoo, Mojdeh; Morel, Olivier; Marzani, Franck; Garcia, Rafael

2015-04-01

Melanoma is the deadliest type of skin cancer, yet it is the most treatable kind depending on its early diagnosis. The early prognosis of melanoma is a challenging task for both clinicians and dermatologists. Due to the importance of early diagnosis and in order to assist the dermatologists, we propose an automated framework based on ensemble learning methods and dermoscopy images to differentiate melanoma from dysplastic and benign lesions. The evaluation of our framework on the recent and public dermoscopy benchmark (PH2 dataset) indicates the potential of proposed method. Our evaluation, using only global features, revealed that ensembles such as random forest perform better than single learner. Using random forest ensemble and combination of color and texture features, our framework achieved the highest sensitivity of 94% and specificity of 92%.
Image Change Detection via Ensemble Learning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martin, Benjamin W; Vatsavai, Raju

2013-01-01

The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, often the application of these techniques is very limited. Recently, use of machine learning methods such as neural networks for change detection has been explored with great success. In this work,more » we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier which has higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers in the ensemble create a mixture of experts in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision tree based classifiers to identify regions of change in SAR images collected of a region in the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.« less
Approximating prediction uncertainty for random forest regression models

Treesearch

John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

2016-01-01

Machine learning approaches such as random forest haveÂ increased for the spatial modeling and mapping of continuousÂ variables. Random forest is a non-parametric ensembleÂ approach, and unlike traditional regression approaches thereÂ is no direct quantification of prediction error. UnderstandingÂ prediction uncertainty is important when using model-basedÂ continuous maps as...
Synaptic Ensemble Underlying the Selection and Consolidation of Neuronal Circuits during Learning.

PubMed

Hoshiba, Yoshio; Wada, Takeyoshi; Hayashi-Takagi, Akiko

2017-01-01

Memories are crucial to the cognitive essence of who we are as human beings. Accumulating evidence has suggested that memories are stored as a subset of neurons that probably fire together in the same ensemble. Such formation of cell ensembles must meet contradictory requirements of being plastic and responsive during learning, but also stable in order to maintain the memory. Although synaptic potentiation is presumed to be the cellular substrate for this process, the link between the two remains correlational. With the application of the latest optogenetic tools, it has been possible to collect direct evidence of the contributions of synaptic potentiation in the formation and consolidation of cell ensemble in a learning task specific manner. In this review, we summarize the current view of the causative role of synaptic plasticity as the cellular mechanism underlying the encoding of memory and recalling of learned memories. In particular, we will be focusing on the latest optoprobe developed for the visualization of such "synaptic ensembles." We further discuss how a new synaptic ensemble could contribute to the formation of cell ensembles during learning and memory. With the development and application of novel research tools in the future, studies on synaptic ensembles will pioneer new discoveries, eventually leading to a comprehensive understanding of how the brain works.
Quantum ensembles of quantum classifiers.

PubMed

Schuld, Maria; Petruccione, Francesco

2018-02-09

Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
Probabilistic Modeling of Aircraft Trajectories for Dynamic Separation Volumes

NASA Technical Reports Server (NTRS)

Lewis, Timothy A.

2016-01-01

With a proliferation of new and unconventional vehicles and operations expected in the future, the ab initio airspace design will require new approaches to trajectory prediction for separation assurance and other air traffic management functions. This paper presents an approach to probabilistic modeling of the trajectory of an aircraft when its intent is unknown. The approach uses a set of feature functions to constrain a maximum entropy probability distribution based on a set of observed aircraft trajectories. This model can be used to sample new aircraft trajectories to form an ensemble reflecting the variability in an aircraft's intent. The model learning process ensures that the variability in this ensemble reflects the behavior observed in the original data set. Computational examples are presented.
Making Music or Gaining Grades? Assessment Practices in Tertiary Music Ensembles

ERIC Educational Resources Information Center

Harrison, Scott D.; Lebler, Don; Carey, Gemma; Hitchcock, Matt; O'Bryan, Jessica

2013-01-01

Participation in an ensemble is a significant aspect of tertiary music experience. Learning and assessment practices within ensembles have rarely been investigated in Australia and the perceptions of staff and students as to how they learn and are assessed within ensembles remain largely unexplored. This paper reports on part of a larger project…
A Deep Learning Approach to Neuroanatomical Characterisation of Alzheimer's Disease.

PubMed

Ambastha, Abhinit Kumar; Leong, Tze-Yun

2017-01-01

Alzheimer's disease (AD) is a neurological degenerative disorder that leads to progressive mental deterioration. This work introduces a computational approach to improve our understanding of the progression of AD. We use ensemble learning methods and deep neural networks to identify salient structural correlations among brain regions that degenerate together in AD; this provides an understanding of how AD progresses in the brain. The proposed technique has a classification accuracy of 81.79% for AD against healthy subjects using a single modality imaging dataset.
The "Tse Tsa Watle" Speaker Series: An Example of Ensemble Leadership and Generative Adult Learning

ERIC Educational Resources Information Center

McKendry, Virginia

2017-01-01

This chapter examines an Indigenous speaker series formed to foster intercultural partnerships at a Canadian university. Using ensemble leadership and generative learning theories to make sense of the project, the author argues that ensemble leadership is key to designing the generative learning adult learners need in an era of ambiguity.
Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models.

PubMed

Papanikolaou, Yannis; Tsoumakas, Grigorios; Laliotis, Manos; Markantonatos, Nikos; Vlahavas, Ioannis

2017-09-22

In this paper we present the approach that we employed to deal with large scale multi-label semantic indexing of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge (2013-2017), a challenge concerned with biomedical semantic indexing and question answering. Our main contribution is a MUlti-Label Ensemble method (MULE) that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms. Some secondary contributions include a study on the temporal aspects of the BioASQ corpus (observations apply also to the BioASQ's super-set, the PubMed articles collection) and the proper parametrization of the algorithms used to deal with this challenging classification task. The ensemble method that we developed is compared to other approaches in experimental scenarios with subsets of the BioASQ corpus giving positive results. In our participation in the BioASQ challenge we obtained the first place in 2013 and the second place in the four following years, steadily outperforming MTI, the indexing system of the National Library of Medicine (NLM). The results of our experimental comparisons, suggest that employing a statistical significance test to validate the ensemble method's choices, is the optimal approach for ensembling multi-label classifiers, especially in contexts with many rare labels.
A Machine Learning Framework for Plan Payment Risk Adjustment.

PubMed

Rose, Sherri

2016-12-01

To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.
Deep learning ensemble with asymptotic techniques for oscillometric blood pressure estimation.

PubMed

Lee, Soojeong; Chang, Joon-Hyuk

2017-11-01

This paper proposes a deep learning based ensemble regression estimator with asymptotic techniques, and offers a method that can decrease uncertainty for oscillometric blood pressure (BP) measurements using the bootstrap and Monte-Carlo approach. While the former is used to estimate SBP and DBP, the latter attempts to determine confidence intervals (CIs) for SBP and DBP based on oscillometric BP measurements. This work originally employs deep belief networks (DBN)-deep neural networks (DNN) to effectively estimate BPs based on oscillometric measurements. However, there are some inherent problems with these methods. First, it is not easy to determine the best DBN-DNN estimator, and worthy information might be omitted when selecting one DBN-DNN estimator and discarding the others. Additionally, our input feature vectors, obtained from only five measurements per subject, represent a very small sample size; this is a critical weakness when using the DBN-DNN technique and can cause overfitting or underfitting, depending on the structure of the algorithm. To address these problems, an ensemble with an asymptotic approach (based on combining the bootstrap with the DBN-DNN technique) is utilized to generate the pseudo features needed to estimate the SBP and DBP. In the first stage, the bootstrap-aggregation technique is used to create ensemble parameters. Afterward, the AdaBoost approach is employed for the second-stage SBP and DBP estimation. We then use the bootstrap and Monte-Carlo techniques in order to determine the CIs based on the target BP estimated using the DBN-DNN ensemble regression estimator with the asymptotic technique in the third stage. The proposed method can mitigate the estimation uncertainty such as large the standard deviation of error (SDE) on comparing the proposed DBN-DNN ensemble regression estimator with the DBN-DNN single regression estimator, we identify that the SDEs of the SBP and DBP are reduced by 0.58 and 0.57  mmHg, respectively. These indicate that the proposed method actually enhances the performance by 9.18% and 10.88% compared with the DBN-DNN single estimator. The proposed methodology improves the accuracy of BP estimation and reduces the uncertainty for BP estimation. Copyright © 2017 Elsevier B.V. All rights reserved.
A fuzzy integral method based on the ensemble of neural networks to analyze fMRI data for cognitive state classification across multiple subjects.

PubMed

Cacha, L A; Parida, S; Dehuri, S; Cho, S-B; Poznanski, R R

2016-12-01

The huge number of voxels in fMRI over time poses a major challenge to for effective analysis. Fast, accurate, and reliable classifiers are required for estimating the decoding accuracy of brain activities. Although machine-learning classifiers seem promising, individual classifiers have their own limitations. To address this limitation, the present paper proposes a method based on the ensemble of neural networks to analyze fMRI data for cognitive state classification for application across multiple subjects. Similarly, the fuzzy integral (FI) approach has been employed as an efficient tool for combining different classifiers. The FI approach led to the development of a classifiers ensemble technique that performs better than any of the single classifier by reducing the misclassification, the bias, and the variance. The proposed method successfully classified the different cognitive states for multiple subjects with high accuracy of classification. Comparison of the performance improvement, while applying ensemble neural networks method, vs. that of the individual neural network strongly points toward the usefulness of the proposed method.
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes

PubMed Central

Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli

2014-01-01

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
Clustering-based ensemble learning for activity recognition in smart homes.

PubMed

Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli

2014-07-10

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.

Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

PubMed

Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

2017-09-01

To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data.

PubMed

Yu, Hualong; Ni, Jun

2014-01-01

Training classifiers on skewed data can be technically challenging tasks, especially if the data is high-dimensional simultaneously, the tasks can become more difficult. In biomedicine field, skewed data type often appears. In this study, we try to deal with this problem by combining asymmetric bagging ensemble classifier (asBagging) that has been presented in previous work and an improved random subspace (RS) generation strategy that is called feature subspace (FSS). Specifically, FSS is a novel method to promote the balance level between accuracy and diversity of base classifiers in asBagging. In view of the strong generalization capability of support vector machine (SVM), we adopt it to be base classifier. Extensive experiments on four benchmark biomedicine data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of Accuracy, F-measure, G-mean and AUC evaluation criterions, thus it can be regarded as an effective and efficient tool to deal with high-dimensional and imbalanced biomedical data.
Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

PubMed Central

2014-01-01

Precisely classifying a protein sequence from a large biological protein sequences database plays an important role for developing competitive pharmacological products. Comparing the unseen sequence with all the identified protein sequences and returning the category index with the highest similarity scored protein, conventional methods are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using SLFNs. The recent efficient extreme learning machine (ELM) and its invariants are utilized as the training algorithms. The optimal pruned ELM is first employed for protein sequence classification in this paper. To further enhance the performance, the ensemble based SLFNs structure is constructed where multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the priority of the proposed algorithms. PMID:24795876
Spam comments prediction using stacking with ensemble learning

NASA Astrophysics Data System (ADS)

Mehmood, Arif; On, Byung-Won; Lee, Ingyu; Ashraf, Imran; Choi, Gyu Sang

2018-01-01

Illusive comments of product or services are misleading for people in decision making. The current methodologies to predict deceptive comments are concerned for feature designing with single training model. Indigenous features have ability to show some linguistic phenomena but are hard to reveal the latent semantic meaning of the comments. We propose a prediction model on general features of documents using stacking with ensemble learning. Term Frequency/Inverse Document Frequency (TF/IDF) features are inputs to stacking of Random Forest and Gradient Boosted Trees and the outputs of the base learners are encapsulated with decision tree to make final training of the model. The results exhibits that our approach gives the accuracy of 92.19% which outperform the state-of-the-art method.
EnsembleGASVR: a novel ensemble method for classifying missense single nucleotide polymorphisms.

PubMed

Rapakoulia, Trisevgeni; Theofilatos, Konstantinos; Kleftogiannis, Dimitrios; Likothanasis, Spiros; Tsakalidis, Athanasios; Mavroudi, Seferina

2014-08-15

Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for the classification of missense SNPs to neutral and disease associated. However, existing computational approaches fail to select relevant features by choosing them arbitrarily without sufficient documentation. Moreover, they are limited to the problem of missing values, imbalance between the learning datasets and most of them do not support their predictions with confidence scores. To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR facilitates a two-step algorithm, which in its first step applies a novel evolutionary embedded algorithm to locate close to optimal Support Vector Regression models. In its second step, these models are combined to extract a universal predictor, which is less prone to overfitting issues, systematizes the rebalancing of the learning sets and uses an internal approach for solving the missing values problem without loss of information. Confidence scores support all the predictions and the model becomes tunable by modifying the classification thresholds. An extensive study was performed for collecting the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features such as the solvent accessibility feature, and the top-scored predictions were further validated by linking them with disease phenotypes. Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http://prlab.ceid.upatras.gr/EnsembleGASVR/site.html. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Universal LD50 predictions using deep learning

EPA Science Inventory

NICEATM Predictive Models for Acute Oral Systemic Toxicity LD50 entry Risa R. Sayre (sayre.risa@epa.gov) & Christopher M. Grulke Our approach uses an ensemble of multilayer perceptron regressions to predict rat acute oral LD50 values from chemical features. Features were genera...
Multiple-Swarm Ensembles: Improving the Predictive Power and Robustness of Predictive Models and Its Use in Computational Biology.

PubMed

Alves, Pedro; Liu, Shuang; Wang, Daifeng; Gerstein, Mark

2018-01-01

Machine learning is an integral part of computational biology, and has already shown its use in various applications, such as prognostic tests. In the last few years in the non-biological machine learning community, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work, we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness. Furthermore, we develop a methodology of ensembling, Multi-Swarm Ensemble (MSWE) by using multiple particle swarm optimizations and demonstrate its ability to further enhance the performance of ensembles.
Can we use Earth Observations to improve monthly water level forecasts?

NASA Astrophysics Data System (ADS)

Slater, L. J.; Villarini, G.

2017-12-01

Dynamical-statistical hydrologic forecasting approaches benefit from different strengths in comparison with traditional hydrologic forecasting systems: they are computationally efficient, can integrate and `learn' from a broad selection of input data (e.g., General Circulation Model (GCM) forecasts, Earth Observation time series, teleconnection patterns), and can take advantage of recent progress in machine learning (e.g. multi-model blending, post-processing and ensembling techniques). Recent efforts to develop a dynamical-statistical ensemble approach for forecasting seasonal streamflow using both GCM forecasts and changing land cover have shown promising results over the U.S. Midwest. Here, we use climate forecasts from several GCMs of the North American Multi Model Ensemble (NMME) alongside 15-minute stage time series from the National River Flow Archive (NRFA) and land cover classes extracted from the European Space Agency's Climate Change Initiative 300 m annual Global Land Cover time series. With these data, we conduct systematic long-range probabilistic forecasting of monthly water levels in UK catchments over timescales ranging from one to twelve months ahead. We evaluate the improvement in model fit and model forecasting skill that comes from using land cover classes as predictors in the models. This work opens up new possibilities for combining Earth Observation time series with GCM forecasts to predict a variety of hazards from space using data science techniques.
A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

DTIC Science & Technology

2017-09-01

efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods , Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
20180411 - Universal LD50 predictions using deep learning (ICCVAM)

EPA Science Inventory

NICEATM Predictive Models for Acute Oral Systemic Toxicity LD50 entry Risa R. Sayre (sayre.risa@epa.gov) & Christopher M. Grulke Our approach uses an ensemble of multilayer perceptron regressions to predict rat acute oral LD50 values from chemical features. Features were gene...
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model

PubMed Central

Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

2014-01-01

Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.

PubMed

Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

2014-11-14

Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.
Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles.

PubMed

Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho

2017-01-01

Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs; known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, the feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database.

PubMed

Choi, Joon Yul; Yoo, Tae Keun; Seo, Jeong Gi; Kwak, Jiyong; Um, Terry Taewoong; Rim, Tyler Hyungtaek

2017-01-01

Deep learning emerges as a powerful tool for analyzing medical images. Retinal disease detection by using computer-aided diagnosis from fundus image has emerged as a new method. We applied deep learning convolutional neural network by using MatConvNet for an automated detection of multiple retinal diseases with fundus photographs involved in STructured Analysis of the REtina (STARE) database. Dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using a random forest transfer learning based on VGG-19 architecture. The classification results depended greatly on the number of categories. As the number of categories increased, the performance of deep learning models was diminished. When all 10 categories were included, we obtained results with an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. Considering three integrated normal, background diabetic retinopathy, and dry age-related macular degeneration, the multi-categorical classifier showed accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. The transfer learning incorporated with ensemble classifier of clustering and voting approach presented the best performance with accuracy of 36.7%, 0.053 RCI, and 0.225 kappa in the 10 retinal diseases classification problem. First, due to the small size of datasets, the deep learning techniques in this study were ineffective to be applied in clinics where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that the transfer learning incorporated with ensemble classifiers can improve the classification performance in order to detect multi-categorical retinal diseases. Further studies should confirm the effectiveness of algorithms with large datasets obtained from hospitals.
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

PubMed

Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

2013-01-01

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
Ensemble Clustering using Semidefinite Programming with Applications

PubMed Central

Singh, Vikas; Mukherjee, Lopamudra; Peng, Jiming; Xu, Jinhui

2011-01-01

In this paper, we study the ensemble clustering problem, where the input is in the form of multiple clustering solutions. The goal of ensemble clustering algorithms is to aggregate the solutions into one solution that maximizes the agreement in the input ensemble. We obtain several new results for this problem. Specifically, we show that the notion of agreement under such circumstances can be better captured using a 2D string encoding rather than a voting strategy, which is common among existing approaches. Our optimization proceeds by first constructing a non-linear objective function which is then transformed into a 0–1 Semidefinite program (SDP) using novel convexification techniques. This model can be subsequently relaxed to a polynomial time solvable SDP. In addition to the theoretical contributions, our experimental results on standard machine learning and synthetic datasets show that this approach leads to improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. In addition, we identify several new application scenarios for this problem. These include combining multiple image segmentations and generating tissue maps from multiple-channel Diffusion Tensor brain images to identify the underlying structure of the brain. PMID:21927539
Ensemble Clustering using Semidefinite Programming with Applications.

PubMed

Singh, Vikas; Mukherjee, Lopamudra; Peng, Jiming; Xu, Jinhui

2010-05-01

In this paper, we study the ensemble clustering problem, where the input is in the form of multiple clustering solutions. The goal of ensemble clustering algorithms is to aggregate the solutions into one solution that maximizes the agreement in the input ensemble. We obtain several new results for this problem. Specifically, we show that the notion of agreement under such circumstances can be better captured using a 2D string encoding rather than a voting strategy, which is common among existing approaches. Our optimization proceeds by first constructing a non-linear objective function which is then transformed into a 0-1 Semidefinite program (SDP) using novel convexification techniques. This model can be subsequently relaxed to a polynomial time solvable SDP. In addition to the theoretical contributions, our experimental results on standard machine learning and synthetic datasets show that this approach leads to improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. In addition, we identify several new application scenarios for this problem. These include combining multiple image segmentations and generating tissue maps from multiple-channel Diffusion Tensor brain images to identify the underlying structure of the brain.
A Comparison of the Efficacy of Individual and Collaborative Music Learning in Ensemble Rehearsals

ERIC Educational Resources Information Center

Brandler, Brian J.; Peynircioglu, Zehra F.

2015-01-01

Collaboration is essential in learning ensemble music. It is unclear, however, whether an individual benefits more from collaborative or individual rehearsal in the initial stages of such learning. In nonmusical domains, the effect of collaboration has been mixed, sometimes enhancing and sometimes inhibiting an individual's learning process. In…
Cortical ensemble activity increasingly predicts behaviour outcomes during learning of a motor task

NASA Astrophysics Data System (ADS)

Laubach, Mark; Wessberg, Johan; Nicolelis, Miguel A. L.

2000-06-01

When an animal learns to make movements in response to different stimuli, changes in activity in the motor cortex seem to accompany and underlie this learning. The precise nature of modifications in cortical motor areas during the initial stages of motor learning, however, is largely unknown. Here we address this issue by chronically recording from neuronal ensembles located in the rat motor cortex, throughout the period required for rats to learn a reaction-time task. Motor learning was demonstrated by a decrease in the variance of the rats' reaction times and an increase in the time the animals were able to wait for a trigger stimulus. These behavioural changes were correlated with a significant increase in our ability to predict the correct or incorrect outcome of single trials based on three measures of neuronal ensemble activity: average firing rate, temporal patterns of firing, and correlated firing. This increase in prediction indicates that an association between sensory cues and movement emerged in the motor cortex as the task was learned. Such modifications in cortical ensemble activity may be critical for the initial learning of motor tasks.
An efficient ensemble learning method for gene microarray classification.

PubMed

Osareh, Alireza; Shadgar, Bita

2013-01-01

The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

Proposed hybrid-classifier ensemble algorithm to map snow cover area

NASA Astrophysics Data System (ADS)

Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir

2018-01-01

Metaclassification ensemble approach is known to improve the prediction performance of snow-covered area. The methodology adopted in this case is based on neural network along with four state-of-art machine learning algorithms: support vector machine, artificial neural networks, spectral angle mapper, K-mean clustering, and a snow index: normalized difference snow index. An AdaBoost ensemble algorithm related to decision tree for snow-cover mapping is also proposed. According to available literature, these methods have been rarely used for snow-cover mapping. Employing the above techniques, a study was conducted for Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya using multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and accuracies of individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistic measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 combinations of parameters for individual classifiers were trained and tested on the dataset and results were compared with the proposed approach. It was observed that the proposed methodology produced the highest classification accuracy (95.21%), close to (94.01%) that was produced by the proposed AdaBoost ensemble algorithm. From the sets of observations, it was concluded that the ensemble of classifiers produced better results compared to individual classifiers.
Changes in Appetitive Associative Strength Modulates Nucleus Accumbens, But Not Orbitofrontal Cortex Neuronal Ensemble Excitability.

PubMed

Ziminski, Joseph J; Hessler, Sabine; Margetts-Smith, Gabriella; Sieburg, Meike C; Crombag, Hans S; Koya, Eisuke

2017-03-22

Cues that predict the availability of food rewards influence motivational states and elicit food-seeking behaviors. If a cue no longer predicts food availability, then animals may adapt accordingly by inhibiting food-seeking responses. Sparsely activated sets of neurons, coined "neuronal ensembles," have been shown to encode the strength of reward-cue associations. Although alterations in intrinsic excitability have been shown to underlie many learning and memory processes, little is known about these properties specifically on cue-activated neuronal ensembles. We examined the activation patterns of cue-activated orbitofrontal cortex (OFC) and nucleus accumbens (NAc) shell ensembles using wild-type and Fos-GFP mice, which express green fluorescent protein (GFP) in activated neurons, after appetitive conditioning with sucrose and extinction learning. We also investigated the neuronal excitability of recently activated, GFP+ neurons in these brain areas using whole-cell electrophysiology in brain slices. Exposure to a sucrose cue elicited activation of neurons in both the NAc shell and OFC. In the NAc shell, but not the OFC, these activated GFP+ neurons were more excitable than surrounding GFP- neurons. After extinction, the number of neurons activated in both areas was reduced and activated ensembles in neither area exhibited altered excitability. These data suggest that learning-induced alterations in the intrinsic excitability of neuronal ensembles is regulated dynamically across different brain areas. Furthermore, we show that changes in associative strength modulate the excitability profile of activated ensembles in the NAc shell. SIGNIFICANCE STATEMENT Sparsely distributed sets of neurons called "neuronal ensembles" encode learned associations about food and cues predictive of its availability. Widespread changes in neuronal excitability have been observed in limbic brain areas after associative learning, but little is known about the excitability changes that occur specifically on neuronal ensembles that encode appetitive associations. Here, we reveal that sucrose cue exposure recruited a more excitable ensemble in the nucleus accumbens, but not orbitofrontal cortex, compared with their surrounding neurons. This excitability difference was not observed when the cue's salience was diminished after extinction learning. These novel data provide evidence that the intrinsic excitability of appetitive memory-encoding ensembles is regulated differentially across brain areas and adapts dynamically to changes in associative strength. Copyright © 2017 the authors 0270-6474/17/373160-11$15.00/0.
Exploring diversity in ensemble classification: Applications in large area land cover mapping

NASA Astrophysics Data System (ADS)

Mellor, Andrew; Boukir, Samia

2017-07-01

Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
Improving precision of glomerular filtration rate estimating model by ensemble learning.

PubMed

Liu, Xun; Li, Ningshan; Lv, Linsheng; Fu, Yongmei; Cheng, Cailian; Wang, Caixia; Ye, Yuqiu; Li, Shaomin; Lou, Tanqi

2017-11-09

Accurate assessment of kidney function is clinically important, but estimates of glomerular filtration rate (GFR) by regression are imprecise. We hypothesized that ensemble learning could improve precision. A total of 1419 participants were enrolled, with 1002 in the development dataset and 417 in the external validation dataset. GFR was independently estimated from age, sex and serum creatinine using an artificial neural network (ANN), support vector machine (SVM), regression, and ensemble learning. GFR was measured by 99mTc-DTPA renal dynamic imaging calibrated with dual plasma sample 99mTc-DTPA GFR. Mean measured GFRs were 70.0 ml/min/1.73 m 2 in the developmental and 53.4 ml/min/1.73 m 2 in the external validation cohorts. In the external validation cohort, precision was better in the ensemble model of the ANN, SVM and regression equation (IQR = 13.5 ml/min/1.73 m 2 ) than in the new regression model (IQR = 14.0 ml/min/1.73 m 2 , P < 0.001). The precision of ensemble learning was the best of the three models, but the models had similar bias and accuracy. The median difference ranged from 2.3 to 3.7 ml/min/1.73 m 2 , 30% accuracy ranged from 73.1 to 76.0%, and P was > 0.05 for all comparisons of the new regression equation and the other new models. An ensemble learning model including three variables, the average ANN, SVM, and regression equation values, was more precise than the new regression model. A more complex ensemble learning strategy may further improve GFR estimates.
Ensemble positive unlabeled learning for disease gene identification.

PubMed

Yang, Peng; Li, Xiaoli; Chua, Hon-Nian; Kwoh, Chee-Keong; Ng, See-Kiong

2014-01-01

An increasing number of genes have been experimentally confirmed in recent years as causative genes to various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data and a single machine learning predictor prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions.
Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms.

PubMed

Ozcift, Akin; Gulten, Arif

2011-12-01

Improving accuracies of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Researches have shown that a base classifier performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart diseases from literature. While making experiments, first the feature dimension of three datasets is reduced using correlation based feature selection (CFS) algorithm. Second, classification performances of 30 machine learning algorithms are calculated for three datasets. Third, 30 classifier ensembles are constructed based on RF algorithm to assess performances of respective classifiers with the same disease data. All the experiments are carried out with leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics; classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers succeeded 72.15%, 77.52% and 84.43% average accuracies for diabetes, heart and Parkinson's datasets, respectively. As for RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve accuracy of miscellaneous machine learning algorithms to design advanced CADx systems. Copyright Â© 2011 Elsevier Ireland Ltd. All rights reserved.
New technologies for examining neuronal ensembles in drug addiction and fear

PubMed Central

Cruz, Fabio C.; Koya, Eisuke; Guez-Barber, Danielle H.; Bossert, Jennifer M.; Lupica, Carl R.; Shaham, Yavin; Hope, Bruce T.

2015-01-01

Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. Additionally, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches—Daun02 inactivation, FACS sorting of activated neurons and c-fos-GFP transgenic rats — that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools — c-fos-tTA mice and inactivation of CREB-overexpressing neurons — that have been used to study the role of neuronal ensembles in conditioned fear. PMID:24088811
Improved discriminability of spatiotemporal neural patterns in rat motor cortical areas as directional choice learning progresses

PubMed Central

Mao, Hongwei; Yuan, Yuan; Si, Jennie

2015-01-01

Animals learn to choose a proper action among alternatives to improve their odds of success in food foraging and other activities critical for survival. Through trial-and-error, they learn correct associations between their choices and external stimuli. While a neural network that underlies such learning process has been identified at a high level, it is still unclear how individual neurons and a neural ensemble adapt as learning progresses. In this study, we monitored the activity of single units in the rat medial and lateral agranular (AGm and AGl, respectively) areas as rats learned to make a left or right side lever press in response to a left or right side light cue. We noticed that rat movement parameters during the performance of the directional choice task quickly became stereotyped during the first 2–3 days or sessions. But learning the directional choice problem took weeks to occur. Accompanying rats' behavioral performance adaptation, we observed neural modulation by directional choice in recorded single units. Our analysis shows that ensemble mean firing rates in the cue-on period did not change significantly as learning progressed, and the ensemble mean rate difference between left and right side choices did not show a clear trend of change either. However, the spatiotemporal firing patterns of the neural ensemble exhibited improved discriminability between the two directional choices through learning. These results suggest a spatiotemporal neural coding scheme in a motor cortical neural ensemble that may be responsible for and contributing to learning the directional choice task. PMID:25798093
Brainstorming: weighted voting prediction of inhibitors for protein targets.

PubMed

Plewczynski, Dariusz

2011-09-01

The "Brainstorming" approach presented in this paper is a weighted voting method that can improve the quality of predictions generated by several machine learning (ML) methods. First, an ensemble of heterogeneous ML algorithms is trained on available experimental data, then all solutions are gathered and a consensus is built between them. The final prediction is performed using a voting procedure, whereby the vote of each method is weighted according to a quality coefficient calculated using multivariable linear regression (MLR). The MLR optimization procedure is very fast, therefore no additional computational cost is introduced by using this jury approach. Here, brainstorming is applied to selecting actives from large collections of compounds relating to five diverse biological targets of medicinal interest, namely HIV-reverse transcriptase, cyclooxygenase-2, dihydrofolate reductase, estrogen receptor, and thrombin. The MDL Drug Data Report (MDDR) database was used for selecting known inhibitors for these protein targets, and experimental data was then used to train a set of machine learning methods. The benchmark dataset (available at http://bio.icm.edu.pl/∼darman/chemoinfo/benchmark.tar.gz ) can be used for further testing of various clustering and machine learning methods when predicting the biological activity of compounds. Depending on the protein target, the overall recall value is raised by at least 20% in comparison to any single machine learning method (including ensemble methods like random forest) and unweighted simple majority voting procedures.
Bidirectional Modulation of Intrinsic Excitability in Rat Prelimbic Cortex Neuronal Ensembles and Non-Ensembles after Operant Learning.

PubMed

Whitaker, Leslie R; Warren, Brandon L; Venniro, Marco; Harte, Tyler C; McPherson, Kylie B; Beidel, Jennifer; Bossert, Jennifer M; Shaham, Yavin; Bonci, Antonello; Hope, Bruce T

2017-09-06

Learned associations between environmental stimuli and rewards drive goal-directed learning and motivated behavior. These memories are thought to be encoded by alterations within specific patterns of sparsely distributed neurons called neuronal ensembles that are activated selectively by reward-predictive stimuli. Here, we use the Fos promoter to identify strongly activated neuronal ensembles in rat prelimbic cortex (PLC) and assess altered intrinsic excitability after 10 d of operant food self-administration training (1 h/d). First, we used the Daun02 inactivation procedure in male FosLacZ-transgenic rats to ablate selectively Fos-expressing PLC neurons that were active during operant food self-administration. Selective ablation of these neurons decreased food seeking. We then used male FosGFP-transgenic rats to assess selective alterations of intrinsic excitability in Fos-expressing neuronal ensembles (FosGFP + ) that were activated during food self-administration and compared these with alterations in less activated non-ensemble neurons (FosGFP - ). Using whole-cell recordings of layer V pyramidal neurons in an ex vivo brain slice preparation, we found that operant self-administration increased excitability of FosGFP + neurons and decreased excitability of FosGFP - neurons. Increased excitability of FosGFP + neurons was driven by increased steady-state input resistance. Decreased excitability of FosGFP - neurons was driven by increased contribution of small-conductance calcium-activated potassium (SK) channels. Injections of the specific SK channel antagonist apamin into PLC increased Fos expression but had no effect on food seeking. Overall, operant learning increased intrinsic excitability of PLC Fos-expressing neuronal ensembles that play a role in food seeking but decreased intrinsic excitability of Fos - non-ensembles. SIGNIFICANCE STATEMENT Prefrontal cortex activity plays a critical role in operant learning, but the underlying cellular mechanisms are unknown. Using the chemogenetic Daun02 inactivation procedure, we found that a small number of strongly activated Fos-expressing neuronal ensembles in rat PLC play an important role in learned operant food seeking. Using GFP expression to identify Fos-expressing layer V pyramidal neurons in prelimbic cortex (PLC) of FosGFP-transgenic rats, we found that operant food self-administration led to increased intrinsic excitability in the behaviorally relevant Fos-expressing neuronal ensembles, but decreased intrinsic excitability in Fos - neurons using distinct cellular mechanisms. Copyright © 2017 the authors 0270-6474/17/378845-12$15.00/0.
Impact of ensemble learning in the assessment of skeletal maturity.

PubMed

Cunha, Pedro; Moura, Daniel C; Guevara López, Miguel Angel; Guerra, Conceição; Pinto, Daniela; Ramos, Isabel

2014-09-01

The assessment of the bone age, or skeletal maturity, is an important task in pediatrics that measures the degree of maturation of children's bones. Nowadays, there is no standard clinical procedure for assessing bone age and the most widely used approaches are the Greulich and Pyle and the Tanner and Whitehouse methods. Computer methods have been proposed to automatize the process; however, there is a lack of exploration about how to combine the features of the different parts of the hand, and how to take advantage of ensemble techniques for this purpose. This paper presents a study where the use of ensemble techniques for improving bone age assessment is evaluated. A new computer method was developed that extracts descriptors for each joint of each finger, which are then combined using different ensemble schemes for obtaining a final bone age value. Three popular ensemble schemes are explored in this study: bagging, stacking and voting. Best results were achieved by bagging with a rule-based regression (M5P), scoring a mean absolute error of 10.16 months. Results show that ensemble techniques improve the prediction performance of most of the evaluated regression algorithms, always achieving best or comparable to best results. Therefore, the success of the ensemble methods allow us to conclude that their use may improve computer-based bone age assessment, offering a scalable option for utilizing multiple regions of interest and combining their output.
A comparison of the stochastic and machine learning approaches in hydrologic time series forecasting

NASA Astrophysics Data System (ADS)

Kim, T.; Joo, K.; Seo, J.; Heo, J. H.

2016-12-01

Hydrologic time series forecasting is an essential task in water resources management and it becomes more difficult due to the complexity of runoff process. Traditional stochastic models such as ARIMA family has been used as a standard approach in time series modeling and forecasting of hydrological variables. Due to the nonlinearity in hydrologic time series data, machine learning approaches has been studied with the advantage of discovering relevant features in a nonlinear relation among variables. This study aims to compare the predictability between the traditional stochastic model and the machine learning approach. Seasonal ARIMA model was used as the traditional time series model, and Random Forest model which consists of decision tree and ensemble method using multiple predictor approach was applied as the machine learning approach. In the application, monthly inflow data from 1986 to 2015 of Chungju dam in South Korea were used for modeling and forecasting. In order to evaluate the performances of the used models, one step ahead and multi-step ahead forecasting was applied. Root mean squared error and mean absolute error of two models were compared.
No-Reference Image Quality Assessment by Wide-Perceptual-Domain Scorer Ensemble Method.

PubMed

Liu, Tsung-Jung; Liu, Kuan-Hsien

2018-03-01

A no-reference (NR) learning-based approach to assess image quality is presented in this paper. The devised features are extracted from wide perceptual domains, including brightness, contrast, color, distortion, and texture. These features are used to train a model (scorer) which can predict scores. The scorer selection algorithms are utilized to help simplify the proposed system. In the final stage, the ensemble method is used to combine the prediction results from selected scorers. Two multiple-scale versions of the proposed approach are also presented along with the single-scale one. They turn out to have better performances than the original single-scale method. Because of having features from five different domains at multiple image scales and using the outputs (scores) from selected score prediction models as features for multi-scale or cross-scale fusion (i.e., ensemble), the proposed NR image quality assessment models are robust with respect to more than 24 image distortion types. They also can be used on the evaluation of images with authentic distortions. The extensive experiments on three well-known and representative databases confirm the performance robustness of our proposed model.
Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation

NASA Astrophysics Data System (ADS)

Vašát, Radim; Kodešová, Radka; Borůvka, Luboš

2017-07-01

A myriad of signal pre-processing strategies and multivariate calibration techniques has been explored in attempt to improve the spectroscopic prediction of soil organic carbon (SOC) over the last few decades. Therefore, to come up with a novel, more powerful, and accurate predictive approach to beat the rank becomes a challenging task. However, there may be a way, so that combine several individual predictions into a single final one (according to ensemble learning theory). As this approach performs best when combining in nature different predictive algorithms that are calibrated with structurally different predictor variables, we tested predictors of two different kinds: 1) reflectance values (or transforms) at each wavelength and 2) absorption feature parameters. Consequently we applied four different calibration techniques, two per each type of predictors: a) partial least squares regression and support vector machines for type 1, and b) multiple linear regression and random forest for type 2. The weights to be assigned to individual predictions within the ensemble model (constructed as a weighted average) were determined by an automated procedure that ensured the best solution among all possible was selected. The approach was tested at soil samples taken from surface horizon of four sites differing in the prevailing soil units. By employing the ensemble predictive model the prediction accuracy of SOC improved at all four sites. The coefficient of determination in cross-validation (R2cv) increased from 0.849, 0.611, 0.811 and 0.644 (the best individual predictions) to 0.864, 0.650, 0.824 and 0.698 for Site 1, 2, 3 and 4, respectively. Generally, the ensemble model affected the final prediction so that the maximal deviations of predicted vs. observed values of the individual predictions were reduced, and thus the correlation cloud became thinner as desired.
Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database

PubMed Central

Seo, Jeong Gi; Kwak, Jiyong; Um, Terry Taewoong; Rim, Tyler Hyungtaek

2017-01-01

Deep learning emerges as a powerful tool for analyzing medical images. Retinal disease detection by using computer-aided diagnosis from fundus image has emerged as a new method. We applied deep learning convolutional neural network by using MatConvNet for an automated detection of multiple retinal diseases with fundus photographs involved in STructured Analysis of the REtina (STARE) database. Dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using a random forest transfer learning based on VGG-19 architecture. The classification results depended greatly on the number of categories. As the number of categories increased, the performance of deep learning models was diminished. When all 10 categories were included, we obtained results with an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen’s kappa of 0.224. Considering three integrated normal, background diabetic retinopathy, and dry age-related macular degeneration, the multi-categorical classifier showed accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. The transfer learning incorporated with ensemble classifier of clustering and voting approach presented the best performance with accuracy of 36.7%, 0.053 RCI, and 0.225 kappa in the 10 retinal diseases classification problem. First, due to the small size of datasets, the deep learning techniques in this study were ineffective to be applied in clinics where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that the transfer learning incorporated with ensemble classifiers can improve the classification performance in order to detect multi-categorical retinal diseases. Further studies should confirm the effectiveness of algorithms with large datasets obtained from hospitals. PMID:29095872
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

PubMed

Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil

2016-10-01

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP

PubMed Central

Staras, Kevin

2016-01-01

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture. PMID:27760125
Applications of Machine Learning to Downscaling and Verification

NASA Astrophysics Data System (ADS)

Prudden, R.

2017-12-01

Downscaling, sometimes known as super-resolution, means converting model data into a more detailed local forecast. It is a problem which could be highly amenable to machine learning approaches, provided that sufficient historical forecast data and observations are available. It is also closely linked to the subject of verification, since improving a forecast requires a way to measure that improvement. This talk will describe some early work towards downscaling Met Office ensemble forecasts, and discuss how the output may be usefully evaluated.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.

PubMed

Wang, Hong; Xu, Qingsong; Zhou, Lifeng

2015-01-01

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
Machine Learning Approaches for Detecting Diabetic Retinopathy from Clinical and Public Health Records.

PubMed

Ogunyemi, Omolola; Kermah, Dulcie

2015-01-01

Annual eye examinations are recommended for diabetic patients in order to detect diabetic retinopathy and other eye conditions that arise from diabetes. Medically underserved urban communities in the US have annual screening rates that are much lower than the national average and could benefit from informatics approaches to identify unscreened patients most at risk of developing retinopathy. Using clinical data from urban safety net clinics as well as public health data from the CDC's National Health and Nutrition Examination Survey, we examined different machine learning approaches for predicting retinopathy from clinical or public health data. All datasets utilized exhibited a class imbalance. Classifiers learned on the clinical data were modestly predictive of retinopathy with the best model having an AUC of 0.72, sensitivity of 69.2% and specificity of 55.9%. Classifiers learned on public health data were not predictive of retinopathy. Successful approaches to detecting latent retinopathy using machine learning could help safety net and other clinics identify unscreened patients who are most at risk of developing retinopathy and the use of ensemble classifiers on clinical data shows promise for this purpose.

Recognition of emotions using multimodal physiological signals and an ensemble deep learning model.

PubMed

Yin, Zhong; Zhao, Mengyuan; Wang, Yongxiong; Yang, Jingdong; Zhang, Jianhua

2017-03-01

Using deep-learning methodologies to analyze multimodal physiological signals becomes increasingly attractive for recognizing human emotions. However, the conventional deep emotion classifiers may suffer from the drawback of the lack of the expertise for determining model structure and the oversimplification of combining multimodal feature abstractions. In this study, a multiple-fusion-layer based ensemble classifier of stacked autoencoder (MESAE) is proposed for recognizing emotions, in which the deep structure is identified based on a physiological-data-driven approach. Each SAE consists of three hidden layers to filter the unwanted noise in the physiological features and derives the stable feature representations. An additional deep model is used to achieve the SAE ensembles. The physiological features are split into several subsets according to different feature extraction approaches with each subset separately encoded by a SAE. The derived SAE abstractions are combined according to the physiological modality to create six sets of encodings, which are then fed to a three-layer, adjacent-graph-based network for feature fusion. The fused features are used to recognize binary arousal or valence states. DEAP multimodal database was employed to validate the performance of the MESAE. By comparing with the best existing emotion classifier, the mean of classification rate and F-score improves by 5.26%. The superiority of the MESAE against the state-of-the-art shallow and deep emotion classifiers has been demonstrated under different sizes of the available physiological instances. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
"Living on the Edge": A Case of School Reform Working for Disadvantaged Young Adolescents

ERIC Educational Resources Information Center

Smyth, John; McInerney, Peter

2007-01-01

This paper describes an instance of a disadvantaged (urban) Australian government school that realized it had little alternative but to try new approaches; "old ways" were not working. The paper describes an ensemble of school reform practices, philosophies and strategies that give young adolescents genuine ownership of their learning.…
Building Curriculum-Based Concerts: Tired of the Same Old Approach to Your Ensemble's Concert and Festival Schedule?

ERIC Educational Resources Information Center

Russell, Joshua A.

2006-01-01

Since--and even before--the National Standards for Music Education were published, music educators have tried to balance the expectations associated with the traditional performance curriculum and contemporary models of music education. The Standards-based curriculum challenges directors to consider how student experiences and learning can be…
An ensemble deep learning based approach for red lesion detection in fundus images.

PubMed

Orlando, José Ignacio; Prokofyeva, Elena; Del Fresno, Mariana; Blaschko, Matthew B

2018-01-01

Diabetic retinopathy (DR) is one of the leading causes of preventable blindness in the world. Its earliest sign are red lesions, a general term that groups both microaneurysms (MAs) and hemorrhages (HEs). In daily clinical practice, these lesions are manually detected by physicians using fundus photographs. However, this task is tedious and time consuming, and requires an intensive effort due to the small size of the lesions and their lack of contrast. Computer-assisted diagnosis of DR based on red lesion detection is being actively explored due to its improvement effects both in clinicians consistency and accuracy. Moreover, it provides comprehensive feedback that is easy to assess by the physicians. Several methods for detecting red lesions have been proposed in the literature, most of them based on characterizing lesion candidates using hand crafted features, and classifying them into true or false positive detections. Deep learning based approaches, by contrast, are scarce in this domain due to the high expense of annotating the lesions manually. In this paper we propose a novel method for red lesion detection based on combining both deep learned and domain knowledge. Features learned by a convolutional neural network (CNN) are augmented by incorporating hand crafted features. Such ensemble vector of descriptors is used afterwards to identify true lesion candidates using a Random Forest classifier. We empirically observed that combining both sources of information significantly improve results with respect to using each approach separately. Furthermore, our method reported the highest performance on a per-lesion basis on DIARETDB1 and e-ophtha, and for screening and need for referral on MESSIDOR compared to a second human expert. Results highlight the fact that integrating manually engineered approaches with deep learned features is relevant to improve results when the networks are trained from lesion-level annotated data. An open source implementation of our system is publicly available at https://github.com/ignaciorlando/red-lesion-detection. Copyright © 2017 Elsevier B.V. All rights reserved.
Learning to predict chemical reactions.

PubMed

Kayala, Matthew A; Azencott, Chloé-Agathe; Chen, Jonathan H; Baldi, Pierre

2011-09-26

Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Approaches to the reaction prediction problems can be organized around three poles corresponding to: (1) physical laws; (2) rule-based expert systems; and (3) inductive machine learning. Previous approaches at these poles, respectively, are not high throughput, are not generalizable or scalable, and lack sufficient data and structure to be implemented. We propose a new approach to reaction prediction utilizing elements from each pole. Using a physically inspired conceptualization, we describe single mechanistic reactions as interactions between coarse approximations of molecular orbitals (MOs) and use topological and physicochemical attributes as descriptors. Using an existing rule-based system (Reaction Explorer), we derive a restricted chemistry data set consisting of 1630 full multistep reactions with 2358 distinct starting materials and intermediates, associated with 2989 productive mechanistic steps and 6.14 million unproductive mechanistic steps. And from machine learning, we pose identifying productive mechanistic steps as a statistical ranking, information retrieval problem: given a set of reactants and a description of conditions, learn a ranking model over potential filled-to-unfilled MO interactions such that the top-ranked mechanistic steps yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.00% of nonproductive reactions with a 0.01% error rate. Then, we train an ensemble of ranking models on pairs of interacting MOs to learn a relative productivity function over mechanistic steps in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanism at the top 89.05% of the time, rising to 99.86% of the time when the top four are considered. Furthermore, the system is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert does not handle. A web interface to the machine learning based mechanistic reaction predictor is accessible through our chemoinformatics portal ( http://cdb.ics.uci.edu) under the Toolkits section.
Learning to Predict Chemical Reactions

PubMed Central

Kayala, Matthew A.; Azencott, Chloé-Agathe; Chen, Jonathan H.

2011-01-01

Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Approaches to the reaction prediction problems can be organized around three poles corresponding to: (1) physical laws; (2) rule-based expert systems; and (3) inductive machine learning. Previous approaches at these poles respectively are not high-throughput, are not generalizable or scalable, or lack sufficient data and structure to be implemented. We propose a new approach to reaction prediction utilizing elements from each pole. Using a physically inspired conceptualization, we describe single mechanistic reactions as interactions between coarse approximations of molecular orbitals (MOs) and use topological and physicochemical attributes as descriptors. Using an existing rule-based system (Reaction Explorer), we derive a restricted chemistry dataset consisting of 1630 full multi-step reactions with 2358 distinct starting materials and intermediates, associated with 2989 productive mechanistic steps and 6.14 million unproductive mechanistic steps. And from machine learning, we pose identifying productive mechanistic steps as a statistical ranking, information retrieval, problem: given a set of reactants and a description of conditions, learn a ranking model over potential filled-to-unfilled MO interactions such that the top ranked mechanistic steps yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.00% of non-productive reactions with a 0.01% error rate. Then, we train an ensemble of ranking models on pairs of interacting MOs to learn a relative productivity function over mechanistic steps in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanism at the top 89.05% of the time, rising to 99.86% of the time when the top four are considered. Furthermore, the system is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert does not handle. A web interface to the machine learning based mechanistic reaction predictor is accessible through our chemoinformatics portal (http://cdb.ics.uci.edu) under the Toolkits section. PMID:21819139
Learning the Relationship between Galaxy Spectra and Star Formation Histories

NASA Astrophysics Data System (ADS)

Lovell, Christopher; Acquaviva, Viviana; Iyer, Kartheik; Gawiser, Eric

2018-01-01

We explore novel approaches to the problem of predicting a galaxy’s star formation history (SFH) from its Spectral Energy Distribution (SED). Traditional approaches to SED template fitting use constant or exponentially declining SFHs, and are known to incur significant bias in the inferred SFHs, which are typically skewed toward younger stellar populations. Machine learning approaches, including tree ensemble methods and convolutional neural networks, would not be affected by the same bias, and may work well in recovering unbiased and multi-episodic star formation histories. We use a supervised approach whereby models are trained using synthetic spectra, generated from three state of the art hydrodynamical simulations, including nebular emission. We explore how SED feature maps can be used to highlight areas of the spectrum with the highest predictive power and discuss the limitations of the approach when applied to real data.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

PubMed Central

Wang, Hong; Xu, Qingsong; Zhou, Lifeng

2015-01-01

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Human resource recommendation algorithm based on ensemble learning and Spark

NASA Astrophysics Data System (ADS)

Cong, Zihan; Zhang, Xingming; Wang, Haoxiang; Xu, Hongjie

2017-08-01

Aiming at the problem of “information overload” in the human resources industry, this paper proposes a human resource recommendation algorithm based on Ensemble Learning. The algorithm considers the characteristics and behaviours of both job seeker and job features in the real business circumstance. Firstly, the algorithm uses two ensemble learning methods-Bagging and Boosting. The outputs from both learning methods are then merged to form user interest model. Based on user interest model, job recommendation can be extracted for users. The algorithm is implemented as a parallelized recommendation system on Spark. A set of experiments have been done and analysed. The proposed algorithm achieves significant improvement in accuracy, recall rate and coverage, compared with recommendation algorithms such as UserCF and ItemCF.
A preclustering-based ensemble learning technique for acute appendicitis diagnoses.

PubMed

Lee, Yen-Hsien; Hu, Paul Jen-Hwa; Cheng, Tsang-Hsiang; Huang, Te-Chia; Chuang, Wei-Yao

2013-06-01

Acute appendicitis is a common medical condition, whose effective, timely diagnosis can be difficult. A missed diagnosis not only puts the patient in danger but also requires additional resources for corrective treatments. An acute appendicitis diagnosis constitutes a classification problem, for which a further fundamental challenge pertains to the skewed outcome class distribution of instances in the training sample. A preclustering-based ensemble learning (PEL) technique aims to address the associated imbalanced sample learning problems and thereby support the timely, accurate diagnosis of acute appendicitis. The proposed PEL technique employs undersampling to reduce the number of majority-class instances in a training sample, uses preclustering to group similar majority-class instances into multiple groups, and selects from each group representative instances to create more balanced samples. The PEL technique thereby reduces potential information loss from random undersampling. It also takes advantage of ensemble learning to improve performance. We empirically evaluate this proposed technique with 574 clinical cases obtained from a comprehensive tertiary hospital in southern Taiwan, using several prevalent techniques and a salient scoring system as benchmarks. The comparative results show that PEL is more effective and less biased than any benchmarks. The proposed PEL technique seems more sensitive to identifying positive acute appendicitis than the commonly used Alvarado scoring system and exhibits higher specificity in identifying negative acute appendicitis. In addition, the sensitivity and specificity values of PEL appear higher than those of the investigated benchmarks that follow the resampling approach. Our analysis suggests PEL benefits from the more representative majority-class instances in the training sample. According to our overall evaluation results, PEL records the best overall performance, and its area under the curve measure reaches 0.619. The PEL technique is capable of addressing imbalanced sample learning associated with acute appendicitis diagnosis. Our evaluation results suggest PEL is less biased toward a positive or negative class than the investigated benchmark techniques. In addition, our results indicate the overall effectiveness of the proposed technique, compared with prevalent scoring systems or salient classification techniques that follow the resampling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
Ensemble Deep Learning for Biomedical Time Series Classification

PubMed Central

2016-01-01

Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost. PMID:27725828
Vegetation Mapping in a Dryland Ecosystem Using Multi-temporal Sentinel-2 Imagery and Ensemble Learning

NASA Astrophysics Data System (ADS)

Enterkine, J.; Spaete, L.; Glenn, N. F.; Gallagher, M.

2017-12-01

Remote sensing and mapping of dryland ecosystem vegetation is notably problematic due to the low canopy cover and fugacious growing seasons. Recent improvements in available satellite imagery and machine learning techniques have enabled enhanced approaches to mapping and monitoring vegetation across dryland ecosystems. The Sentinel-2 satellites (launched June 2015 and March 2017) of ESA's Copernicus Programme offer promising developments from existing multispectral satellite systems such as Landsat. Freely-available, Sentinel-2 imagery offers a five-day revisit frequency, thirteen spectral bands (in the visible, near infrared, and shortwave infrared), and high spatial resolution (from 10m to 60m). Three narrow spectral bands located between the visible and the near infrared are designed to observe changes in photosynthesis. The high temporal, spatial, and spectral resolution of this imagery makes it ideal for monitoring vegetation in dryland ecosystems. In this study, we calculated a large number of vegetation and spectral indices from Sentinel-2 imagery spanning a growing season. This data was leveraged with robust field data of canopy cover at precise geolocations. We then used a Random Forests ensemble learning model to identify the most predictive variables for each landcover class, which were then used to impute landcover over the study area. The resulting vegetation map product will be used by land managers, and the mapping approaches will serve as a basis for future remote sensing projects using Sentinel-2 imagery and machine learning.
Using Support Vector Machine Ensembles for Target Audience Classification on Twitter

PubMed Central

Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

2015-01-01

The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space. PMID:25874768
Using support vector machine ensembles for target audience classification on Twitter.

PubMed

Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

2015-01-01

The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
Ensemble Learning of QTL Models Improves Prediction of Complex Traits

PubMed Central

Bian, Yang; Holland, James B.

2015-01-01

Quantitative trait locus (QTL) models can provide useful insights into trait genetic architecture because of their straightforward interpretability but are less useful for genetic prediction because of the difficulty in including the effects of numerous small effect loci without overfitting. Tight linkage between markers introduces near collinearity among marker genotypes, complicating the detection of QTL and estimation of QTL effects in linkage mapping, and this problem is exacerbated by very high density linkage maps. Here we developed a thinning and aggregating (TAGGING) method as a new ensemble learning approach to QTL mapping. TAGGING reduces collinearity problems by thinning dense linkage maps, maintains aspects of marker selection that characterize standard QTL mapping, and by ensembling, incorporates information from many more markers-trait associations than traditional QTL mapping. The objective of TAGGING was to improve prediction power compared with QTL mapping while also providing more specific insights into genetic architecture than genome-wide prediction models. TAGGING was compared with standard QTL mapping using cross validation of empirical data from the maize (Zea mays L.) nested association mapping population. TAGGING-assisted QTL mapping substantially improved prediction ability for both biparental and multifamily populations by reducing both the variance and bias in prediction. Furthermore, an ensemble model combining predictions from TAGGING-assisted QTL and infinitesimal models improved prediction abilities over the component models, indicating some complementarity between model assumptions and suggesting that some trait genetic architectures involve a mixture of a few major QTL and polygenic effects. PMID:26276383
Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data

NASA Technical Reports Server (NTRS)

Rebbapragada, Umaa; Wagstaff, Kiri L.

2011-01-01

This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multiview learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.
DrugE-Rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank

PubMed Central

Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng

2016-01-01

Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307615
DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank.

PubMed

Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng

2016-06-15

Identifying drug-target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets. Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. http://datamining-iip.fudan.edu.cn/service/DrugE-Rank zhusf@fudan.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Orchestrating Literacies: Print Literacy Learning Opportunities within Multimodal Intergenerational Ensembles

ERIC Educational Resources Information Center

McKee, Lori L.; Heydon, Rachel M.

2015-01-01

This exploratory case study considered the opportunities for print literacy learning within multimodal ensembles that featured art, singing and digital media within the context of an intergenerational programme that brought together 13 kindergarten children (4 and 5 years) with seven elder companions. Study questions concerned how reading and…
Designing boosting ensemble of relational fuzzy systems.

PubMed

Scherer, Rafał

2010-10-01

A method frequently used in classification systems for improving classification accuracy is to combine outputs of several classifiers. Among various types of classifiers, fuzzy ones are tempting because of using intelligible fuzzy if-then rules. In the paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bond input and output fuzzy linguistic values by a binary relation; thus, fuzzy rules have additional, comparing to traditional fuzzy systems, weights - elements of a fuzzy relation matrix. Thanks to this the system is better adjustable to data during learning. In the paper an ensemble of relational fuzzy systems is proposed. The problem is that such an ensemble contains separate rule bases which cannot be directly merged. As systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. In the paper, the problem is addressed by a novel design of fuzzy systems constituting the ensemble, resulting in normalization of individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.

A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data.

PubMed

Collell, Guillem; Prelec, Drazen; Patil, Kaustubh R

2018-01-31

Class imbalance presents a major hurdle in the application of classification methods. A commonly taken approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap averaging (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold moving technique, which applies a threshold to the continuous output of a model, offering the possibility to adapt to a performance measure a posteriori , i.e., a plug-in method. Surprisingly, little attention has been paid to this combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. Contrary to the other resampling methods, we preserve the natural class distribution of the data resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validated our method on binary and multiclass benchmark data sets by using both, decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
Extending Correlation Filter-Based Visual Tracking by Tree-Structured Ensemble and Spatial Windowing.

PubMed

Gundogdu, Erhan; Ozkan, Huseyin; Alatan, A Aydin

2017-11-01

Correlation filters have been successfully used in visual tracking due to their modeling power and computational efficiency. However, the state-of-the-art correlation filter-based (CFB) tracking algorithms tend to quickly discard the previous poses of the target, since they consider only a single filter in their models. On the contrary, our approach is to register multiple CFB trackers for previous poses and exploit the registered knowledge when an appearance change occurs. To this end, we propose a novel tracking algorithm [of complexity O(D) ] based on a large ensemble of CFB trackers. The ensemble [of size O(2 D ) ] is organized over a binary tree (depth D ), and learns the target appearance subspaces such that each constituent tracker becomes an expert of a certain appearance. During tracking, the proposed algorithm combines only the appearance-aware relevant experts to produce boosted tracking decisions. Additionally, we propose a versatile spatial windowing technique to enhance the individual expert trackers. For this purpose, spatial windows are learned for target objects as well as the correlation filters and then the windowed regions are processed for more robust correlations. In our extensive experiments on benchmark datasets, we achieve a substantial performance increase by using the proposed tracking algorithm together with the spatial windowing.
Thorough statistical comparison of machine learning regression models and their ensembles for sub-pixel imperviousness and imperviousness change mapping

NASA Astrophysics Data System (ADS)

Drzewiecki, Wojciech

2017-12-01

We evaluated the performance of nine machine learning regression algorithms and their ensembles for sub-pixel estimation of impervious areas coverages from Landsat imagery. The accuracy of imperviousness mapping in individual time points was assessed based on RMSE, MAE and R2. These measures were also used for the assessment of imperviousness change intensity estimations. The applicability for detection of relevant changes in impervious areas coverages at sub-pixel level was evaluated using overall accuracy, F-measure and ROC Area Under Curve. The results proved that Cubist algorithm may be advised for Landsat-based mapping of imperviousness for single dates. Stochastic gradient boosting of regression trees (GBM) may be also considered for this purpose. However, Random Forest algorithm is endorsed for both imperviousness change detection and mapping of its intensity. In all applications the heterogeneous model ensembles performed at least as well as the best individual models or better. They may be recommended for improving the quality of sub-pixel imperviousness and imperviousness change mapping. The study revealed also limitations of the investigated methodology for detection of subtle changes of imperviousness inside the pixel. None of the tested approaches was able to reliably classify changed and non-changed pixels if the relevant change threshold was set as one or three percent. Also for fi ve percent change threshold most of algorithms did not ensure that the accuracy of change map is higher than the accuracy of random classifi er. For the threshold of relevant change set as ten percent all approaches performed satisfactory.
The effects of shared information on semantic calculations in the gene ontology.

PubMed

Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai

2017-01-01

The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
Learning disordered topological phases by statistical recovery of symmetry

NASA Astrophysics Data System (ADS)

Yoshioka, Nobuyuki; Akagi, Yutaka; Katsura, Hosho

2018-05-01

We apply the artificial neural network in a supervised manner to map out the quantum phase diagram of disordered topological superconductors in class DIII. Given the disorder that keeps the discrete symmetries of the ensemble as a whole, translational symmetry which is broken in the quasiparticle distribution individually is recovered statistically by taking an ensemble average. By using this, we classify the phases by the artificial neural network that learned the quasiparticle distribution in the clean limit and show that the result is totally consistent with the calculation by the transfer matrix method or noncommutative geometry approach. If all three phases, namely the Z2, trivial, and thermal metal phases, appear in the clean limit, the machine can classify them with high confidence over the entire phase diagram. If only the former two phases are present, we find that the machine remains confused in a certain region, leading us to conclude the detection of the unknown phase which is eventually identified as the thermal metal phase.
Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.

PubMed

You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing

2013-01-01

Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time.
Machine learning in the string landscape

NASA Astrophysics Data System (ADS)

Carifio, Jonathan; Halverson, James; Krioukov, Dmitri; Nelson, Brent D.

2017-09-01

We utilize machine learning to study the string landscape. Deep data dives and conjecture generation are proposed as useful frameworks for utilizing machine learning in the landscape, and examples of each are presented. A decision tree accurately predicts the number of weak Fano toric threefolds arising from reflexive polytopes, each of which determines a smooth F-theory compactification, and linear regression generates a previously proven conjecture for the gauge group rank in an ensemble of 4/3× 2.96× {10}^{755} F-theory compactifications. Logistic regression generates a new conjecture for when E 6 arises in the large ensemble of F-theory compactifications, which is then rigorously proven. This result may be relevant for the appearance of visible sectors in the ensemble. Through conjecture generation, machine learning is useful not only for numerics, but also for rigorous results.
Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

PubMed

Hussain, Lal

2018-06-01

Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.
Non-Metric Similarity Measures

DTIC Science & Technology

2015-03-26

Sunil Aryal and Kai Ming Ting. (2015) A generic ensemble approach to estimate multi-dimensional likelihood in Bayesian classifier learning...Computational Intelligence. http://onlinelibrary.wiley.com/doi/10.1111/coin.12063/abstract 5.2 List of peer-reviewed conference publications [3] Sunil Aryal...International Conference on Data Mining. 707-711. [4] Sunil Aryal, Kai Ming Ting, Jonathan R. Wells and Takashi Washio. (2014) Improv- ing iForest with
Prediction of Weather Impacted Airport Capacity using Ensemble Learning

NASA Technical Reports Server (NTRS)

Wang, Yao Xun

2011-01-01

Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecast at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rates (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecast, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management to meet the weather impacted airport capacity in order to reduce costs and increase safety.
Negative correlation learning for customer churn prediction: a comparison study.

PubMed

Rodan, Ali; Fayyoumi, Ayham; Faris, Hossam; Alsakran, Jamal; Al-Kadi, Omar

2015-01-01

Recently, telecommunication companies have been paying more attention toward the problem of identification of customer churn behavior. In business, it is well known for service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that are able to predict customer churn can effectively help in customer retention campaigns and maximizing the profit. In this paper we will utilize an ensemble of Multilayer perceptrons (MLP) whose training is obtained using negative correlation learning (NCL) for predicting customer churn in a telecommunication company. Experiments results confirm that NCL based MLP ensemble can achieve better generalization performance (high churn rate) compared with ensemble of MLP without NCL (flat ensemble) and other common data mining techniques used for churn analysis.
Time-course, negative-stain electron microscopy–based analysis for investigating protein–protein interactions at the single-molecule level

PubMed Central

Nogal, Bartek; Bowman, Charles A.; Ward, Andrew B.

2017-01-01

Several biophysical approaches are available to study protein–protein interactions. Most approaches are conducted in bulk solution, and are therefore limited to an average measurement of the ensemble of molecular interactions. Here, we show how single-particle EM can enrich our understanding of protein–protein interactions at the single-molecule level and potentially capture states that are unobservable with ensemble methods because they are below the limit of detection or not conducted on an appropriate time scale. Using the HIV-1 envelope glycoprotein (Env) and its interaction with receptor CD4-binding site neutralizing antibodies as a model system, we both corroborate ensemble kinetics-derived parameters and demonstrate how time-course EM can further dissect stoichiometric states of complexes that are not readily observable with other methods. Visualization of the kinetics and stoichiometry of Env–antibody complexes demonstrated the applicability of our approach to qualitatively and semi-quantitatively differentiate two highly similar neutralizing antibodies. Furthermore, implementation of machine-learning techniques for sorting class averages of these complexes into discrete subclasses of particles helped reduce human bias. Our data provide proof of concept that single-particle EM can be used to generate a “visual” kinetic profile that should be amenable to studying many other protein–protein interactions, is relatively simple and complementary to well-established biophysical approaches. Moreover, our method provides critical insights into broadly neutralizing antibody recognition of Env, which may inform vaccine immunogen design and immunotherapeutic development. PMID:28972148
Monte Carlo replica-exchange based ensemble docking of protein conformations.

PubMed

Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin

2017-05-01

A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in translation and orientation of binding partners, possible conformational changes upon binding are included based on Monte Carlo selection of protein conformations stored as ordered pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure with a range spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling is performed to select appropriate partner conformations on the fly the approach is not limited by the number of conformations in the ensemble compared to ensemble docking of each conformer pair in ensemble cross docking. Although only a fraction of generated conformers was in closer agreement with the bound structure the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine

PubMed Central

Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

2016-01-01

The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
Attracting Dynamics of Frontal Cortex Ensembles during Memory-Guided Decision-Making

PubMed Central

Seamans, Jeremy K.; Durstewitz, Daniel

2011-01-01

A common theoretical view is that attractor-like properties of neuronal dynamics underlie cognitive processing. However, although often proposed theoretically, direct experimental support for the convergence of neural activity to stable population patterns as a signature of attracting states has been sparse so far, especially in higher cortical areas. Combining state space reconstruction theorems and statistical learning techniques, we were able to resolve details of anterior cingulate cortex (ACC) multiple single-unit activity (MSUA) ensemble dynamics during a higher cognitive task which were not accessible previously. The approach worked by constructing high-dimensional state spaces from delays of the original single-unit firing rate variables and the interactions among them, which were then statistically analyzed using kernel methods. We observed cognitive-epoch-specific neural ensemble states in ACC which were stable across many trials (in the sense of being predictive) and depended on behavioral performance. More interestingly, attracting properties of these cognitively defined ensemble states became apparent in high-dimensional expansions of the MSUA spaces due to a proper unfolding of the neural activity flow, with properties common across different animals. These results therefore suggest that ACC networks may process different subcomponents of higher cognitive tasks by transiting among different attracting states. PMID:21625577
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning.

PubMed

Honkela, Antti; Valpola, Harri

2004-07-01

The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights to the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views to many parts of the problem such as model comparison and pruning and helps explain many phenomena occurring in learning.
Less is more: Sampling chemical space with active learning

NASA Astrophysics Data System (ADS)

Smith, Justin S.; Nebgen, Ben; Lubbers, Nicholas; Isayev, Olexandr; Roitberg, Adrian E.

2018-06-01

The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of organic molecules. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements CHNO.
Ensemble based adaptive over-sampling method for imbalanced data learning in computer aided detection of microaneurysm.

PubMed

Ren, Fulong; Cao, Peng; Li, Wei; Zhao, Dazhe; Zaiane, Osmar

2017-01-01

Diabetic retinopathy (DR) is a progressive disease, and its detection at an early stage is crucial for saving a patient's vision. An automated screening system for DR can help in reduce the chances of complete blindness due to DR along with lowering the work load on ophthalmologists. Among the earliest signs of DR are microaneurysms (MAs). However, current schemes for MA detection appear to report many false positives because detection algorithms have high sensitivity. Inevitably some non-MAs structures are labeled as MAs in the initial MAs identification step. This is a typical "class imbalance problem". Class imbalanced data has detrimental effects on the performance of conventional classifiers. In this work, we propose an ensemble based adaptive over-sampling algorithm for overcoming the class imbalance problem in the false positive reduction, and we use Boosting, Bagging, Random subspace as the ensemble framework to improve microaneurysm detection. The ensemble based over-sampling methods we proposed combine the strength of adaptive over-sampling and ensemble. The objective of the amalgamation of ensemble and adaptive over-sampling is to reduce the induction biases introduced from imbalanced data and to enhance the generalization classification performance of extreme learning machines (ELM). Experimental results show that our ASOBoost method has higher area under the ROC curve (AUC) and G-mean values than many existing class imbalance learning methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
Design of an Evolutionary Approach for Intrusion Detection

PubMed Central

2013-01-01

A novel evolutionary approach is proposed for effective intrusion detection based on benchmark datasets. The proposed approach can generate a pool of noninferior individual solutions and ensemble solutions thereof. The generated ensembles can be used to detect the intrusions accurately. For intrusion detection problem, the proposed approach could consider conflicting objectives simultaneously like detection rate of each attack class, error rate, accuracy, diversity, and so forth. The proposed approach can generate a pool of noninferior solutions and ensembles thereof having optimized trade-offs values of multiple conflicting objectives. In this paper, a three-phase, approach is proposed to generate solutions to a simple chromosome design in the first phase. In the first phase, a Pareto front of noninferior individual solutions is approximated. In the second phase of the proposed approach, the entire solution set is further refined to determine effective ensemble solutions considering solution interaction. In this phase, another improved Pareto front of ensemble solutions over that of individual solutions is approximated. The ensemble solutions in improved Pareto front reported improved detection results based on benchmark datasets for intrusion detection. In the third phase, a combination method like majority voting method is used to fuse the predictions of individual solutions for determining prediction of ensemble solution. Benchmark datasets, namely, KDD cup 1999 and ISCX 2012 dataset, are used to demonstrate and validate the performance of the proposed approach for intrusion detection. The proposed approach can discover individual solutions and ensemble solutions thereof with a good support and a detection rate from benchmark datasets (in comparison with well-known ensemble methods like bagging and boosting). In addition, the proposed approach is a generalized classification approach that is applicable to the problem of any field having multiple conflicting objectives, and a dataset can be represented in the form of labelled instances in terms of its features. PMID:24376390
Machine Learning Based Multi-Physical-Model Blending for Enhancing Renewable Energy Forecast -- Improvement via Situation Dependent Error Correction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lu, Siyuan; Hwang, Youngdeok; Khabibrakhmanov, Ildar

With increasing penetration of solar and wind energy to the total energy supply mix, the pressing need for accurate energy forecasting has become well-recognized. Here we report the development of a machine-learning based model blending approach for statistically combining multiple meteorological models for improving the accuracy of solar/wind power forecast. Importantly, we demonstrate that in addition to parameters to be predicted (such as solar irradiance and power), including additional atmospheric state parameters which collectively define weather situations as machine learning input provides further enhanced accuracy for the blended result. Functional analysis of variance shows that the error of individual modelmore » has substantial dependence on the weather situation. The machine-learning approach effectively reduces such situation dependent error thus produces more accurate results compared to conventional multi-model ensemble approaches based on simplistic equally or unequally weighted model averaging. Validation over an extended period of time results show over 30% improvement in solar irradiance/power forecast accuracy compared to forecasts based on the best individual model.« less

An Improved Ensemble of Random Vector Functional Link Networks Based on Particle Swarm Optimization with Double Optimization Strategy

PubMed Central

Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang

2016-01-01

For ensemble learning, how to select and combine the candidate classifiers are two key issues which influence the performance of the ensemble system dramatically. Random vector functional link networks (RVFL) without direct input-to-output links is one of suitable base-classifiers for ensemble systems because of its fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. As for using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum norm least-square method and then further optimized by ARPSO. Finally, a few redundant RVFL is pruned, and thus the more compact ensemble of RVFL is obtained. Moreover, in this paper, theoretical analysis and justification on how to prune the base classifiers on classification problem is presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single optimization methods. Experiment results on function approximation and classification problems verify that the proposed method could improve its convergence accuracy as well as reduce the complexity of the ensemble system. PMID:27835638
An Improved Ensemble of Random Vector Functional Link Networks Based on Particle Swarm Optimization with Double Optimization Strategy.

PubMed

Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang

2016-01-01

For ensemble learning, how to select and combine the candidate classifiers are two key issues which influence the performance of the ensemble system dramatically. Random vector functional link networks (RVFL) without direct input-to-output links is one of suitable base-classifiers for ensemble systems because of its fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. As for using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum norm least-square method and then further optimized by ARPSO. Finally, a few redundant RVFL is pruned, and thus the more compact ensemble of RVFL is obtained. Moreover, in this paper, theoretical analysis and justification on how to prune the base classifiers on classification problem is presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single optimization methods. Experiment results on function approximation and classification problems verify that the proposed method could improve its convergence accuracy as well as reduce the complexity of the ensemble system.
Combining Structural Modeling with Ensemble Machine Learning to Accurately Predict Protein Fold Stability and Binding Affinity Effects upon Mutation

PubMed Central

Garcia Lopez, Sebastian; Kim, Philip M.

2014-01-01

Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both, domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing the protein instability, and help us better understand the molecular causes of diseases. PMID:25243403
Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence.

PubMed

Tseng, Chih-Jen; Lu, Chi-Jie; Chang, Chi-Chang; Chen, Gin-Den; Cheewakriangkrai, Chalong

2017-05-01

Ovarian cancer is the second leading cause of deaths among gynecologic cancers in the world. Approximately 90% of women with ovarian cancer reported having symptoms long before a diagnosis was made. Literature shows that recurrence should be predicted with regard to their personal risk factors and the clinical symptoms of this devastating cancer. In this study, ensemble learning and five data mining approaches, including support vector machine (SVM), C5.0, extreme learning machine (ELM), multivariate adaptive regression splines (MARS), and random forest (RF), were integrated to rank the importance of risk factors and diagnose the recurrence of ovarian cancer. The medical records and pathologic status were extracted from the Chung Shan Medical University Hospital Tumor Registry. Experimental results illustrated that the integrated C5.0 model is a superior approach in predicting the recurrence of ovarian cancer. Moreover, the classification accuracies of C5.0, ELM, MARS, RF, and SVM indeed increased after using the selected important risk factors as predictors. Our findings suggest that The International Federation of Gynecology and Obstetrics (FIGO), Pathologic M, Age, and Pathologic T were the four most critical risk factors for ovarian cancer recurrence. In summary, the above information can support the important influence of personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with ovarian cancer in all phases of the recurrent trajectory. Copyright © 2017 Elsevier B.V. All rights reserved.
Negative Correlation Learning for Customer Churn Prediction: A Comparison Study

PubMed Central

Faris, Hossam

2015-01-01

Recently, telecommunication companies have been paying more attention toward the problem of identification of customer churn behavior. In business, it is well known for service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that are able to predict customer churn can effectively help in customer retention campaigns and maximizing the profit. In this paper we will utilize an ensemble of Multilayer perceptrons (MLP) whose training is obtained using negative correlation learning (NCL) for predicting customer churn in a telecommunication company. Experiments results confirm that NCL based MLP ensemble can achieve better generalization performance (high churn rate) compared with ensemble of MLP without NCL (flat ensemble) and other common data mining techniques used for churn analysis. PMID:25879060
Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach.

PubMed

Awad, Aya; Bader-El-Den, Mohamed; McNicholas, James; Briggs, Jim

2017-12-01

Mortality prediction of hospitalized patients is an important problem. Over the past few decades, several severity scoring systems and machine learning mortality prediction models have been developed for predicting hospital mortality. By contrast, early mortality prediction for intensive care unit patients remains an open challenge. Most research has focused on severity of illness scoring systems or data mining (DM) models designed for risk estimation at least 24 or 48h after ICU admission. This study highlights the main data challenges in early mortality prediction in ICU patients and introduces a new machine learning based framework for Early Mortality Prediction for Intensive Care Unit patients (EMPICU). The proposed method is evaluated on the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database. Mortality prediction models are developed for patients at the age of 16 or above in Medical ICU (MICU), Surgical ICU (SICU) or Cardiac Surgery Recovery Unit (CSRU). We employ the ensemble learning Random Forest (RF), the predictive Decision Trees (DT), the probabilistic Naive Bayes (NB) and the rule-based Projective Adaptive Resonance Theory (PART) models. The primary outcome was hospital mortality. The explanatory variables included demographic, physiological, vital signs and laboratory test variables. Performance measures were calculated using cross-validated area under the receiver operating characteristic curve (AUROC) to minimize bias. 11,722 patients with single ICU stays are considered. Only patients at the age of 16 years old and above in Medical ICU (MICU), Surgical ICU (SICU) or Cardiac Surgery Recovery Unit (CSRU) are considered in this study. The proposed EMPICU framework outperformed standard scoring systems (SOFA, SAPS-I, APACHE-II, NEWS and qSOFA) in terms of AUROC and time (i.e. at 6h compared to 48h or more after admission). The results show that although there are many values missing in the first few hour of ICU admission, there is enough signal to effectively predict mortality during the first 6h of admission. The proposed framework, in particular the one that uses the ensemble learning approach - EMPICU Random Forest (EMPICU-RF) offers a base to construct an effective and novel mortality prediction model in the early hours of an ICU patient admission, with an improved performance profile. Copyright © 2017 Elsevier B.V. All rights reserved.
Cost-sensitive AdaBoost algorithm for ordinal regression based on extreme learning machine.

PubMed

Riccardi, Annalisa; Fernández-Navarro, Francisco; Carloni, Sante

2014-10-01

In this paper, the well known stagewise additive modeling using a multiclass exponential (SAMME) boosting algorithm is extended to address problems where there exists a natural order in the targets using a cost-sensitive approach. The proposed ensemble model uses an extreme learning machine (ELM) model as a base classifier (with the Gaussian kernel and the additional regularization parameter). The closed form of the derived weighted least squares problem is provided, and it is employed to estimate analytically the parameters connecting the hidden layer to the output layer at each iteration of the boosting algorithm. Compared to the state-of-the-art boosting algorithms, in particular those using ELM as base classifier, the suggested technique does not require the generation of a new training dataset at each iteration. The adoption of the weighted least squares formulation of the problem has been presented as an unbiased and alternative approach to the already existing ELM boosting techniques. Moreover, the addition of a cost model for weighting the patterns, according to the order of the targets, enables the classifier to tackle ordinal regression problems further. The proposed method has been validated by an experimental study by comparing it with already existing ensemble methods and ELM techniques for ordinal regression, showing competitive results.
Synaptic plasticity associated with a memory engram in the basolateral amygdala.

PubMed

Nonaka, Ayako; Toyoda, Takeshi; Miura, Yuki; Hitora-Imamura, Natsuko; Naka, Masamitsu; Eguchi, Megumi; Yamaguchi, Shun; Ikegaya, Yuji; Matsuki, Norio; Nomura, Hiroshi

2014-07-09

Synaptic plasticity is a cellular mechanism putatively underlying learning and memory. However, it is unclear whether learning induces synaptic modification globally or only in a subset of neurons in associated brain regions. In this study, we genetically identified neurons activated during contextual fear learning and separately recorded synaptic efficacy from recruited and nonrecruited neurons in the mouse basolateral amygdala (BLA). We found that the fear learning induces presynaptic potentiation, which was reflected by an increase in the miniature EPSC frequency and by a decrease in the paired-pulse ratio. Changes occurred only in the cortical synapses targeting the BLA neurons that were recruited into the fear memory trace. Furthermore, we found that fear learning reorganizes the neuronal ensemble responsive to the conditioning context in conjunction with the synaptic plasticity. In particular, the neuronal activity during learning was associated with the neuronal recruitment into the context-responsive ensemble. These findings suggest that synaptic plasticity in a subset of BLA neurons contributes to fear memory expression through ensemble reorganization. Copyright © 2014 the authors 0270-6474/14/349305-05$15.00/0.
A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

PubMed

S K, Somasundaram; P, Alli

2017-11-09

The main complication of diabetes is Diabetic retinopathy (DR), retinal vascular disease and it leads to the blindness. Regular screening for early DR disease detection is considered as an intensive labor and resource oriented task. Therefore, automatic detection of DR diseases is performed only by using the computational technique is the great solution. An automatic method is more reliable to determine the presence of an abnormality in Fundus images (FI) but, the classification process is poorly performed. Recently, few research works have been designed for analyzing texture discrimination capacity in FI to distinguish the healthy images. However, the feature extraction (FE) process was not performed well, due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection using Machine Learning and Ensemble Classification method, called, Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed. The ML-BEC method comprises of two stages. The first stage in ML-BEC method comprises extraction of the candidate objects from Retinal Images (RI). The candidate objects or the features for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying Machine Learning technique called, t-distributed Stochastic Neighbor Embedding (t-SNE). Besides, t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence among two distributions regarding the locations of the points on the map. The second stage comprises of application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection of DR screening system using Bagging Ensemble Classifier (BEC) is investigated. With the help of voting the process in ML-BEC, bagging minimizes the error due to variance of the base classifier. With the publicly available retinal image databases, our classifier is trained with 25% of RI. Results show that the ensemble classifier can achieve better classification accuracy (CA) than single classification models. Empirical experiments suggest that the machine learning-based ensemble classifier is efficient for further reducing DR classification time (CT).
IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids.

PubMed

Ali, Safdar; Majid, Abdul; Khan, Asifullah

2014-04-01

Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
Incremental learning of concept drift in nonstationary environments.

PubMed

Elwell, Ryan; Polikar, Robi

2011-10-01

We introduce an ensemble of classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn(++). NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from such environments that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, as other members of the Learn(++) family of algorithms, that is, without requiring access to previously seen data. Learn(++). NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority voting. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly, to the changes in underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn(++). NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release our data used in this paper. © 2011 IEEE
Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Pin-Yu; Choudhury, Sutanay; Hero, Alfred

Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles ofmore » graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity pattern and MC-GDL can provide discriminative basis for attack classification.« less
Drug-target interaction prediction using ensemble learning and dimensionality reduction.

PubMed

Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong

2017-10-01

Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
Convolutional Neural Network Based on Extreme Learning Machine for Maritime Ships Recognition in Infrared Images.

PubMed

Khellal, Atmane; Ma, Hongbin; Fei, Qing

2018-05-09

The success of Deep Learning models, notably convolutional neural networks (CNNs), makes them the favorable solution for object recognition systems in both visible and infrared domains. However, the lack of training data in the case of maritime ships research leads to poor performance due to the problem of overfitting. In addition, the back-propagation algorithm used to train CNN is very slow and requires tuning many hyperparameters. To overcome these weaknesses, we introduce a new approach fully based on Extreme Learning Machine (ELM) to learn useful CNN features and perform a fast and accurate classification, which is suitable for infrared-based recognition systems. The proposed approach combines an ELM based learning algorithm to train CNN for discriminative features extraction and an ELM based ensemble for classification. The experimental results on VAIS dataset, which is the largest dataset of maritime ships, confirm that the proposed approach outperforms the state-of-the-art models in term of generalization performance and training speed. For instance, the proposed model is up to 950 times faster than the traditional back-propagation based training of convolutional neural networks, primarily for low-level features extraction.
Learning and Information Approaches for Inference in Dynamic Data-Driven Geophysical Applications

NASA Astrophysics Data System (ADS)

Ravela, S.

2015-12-01

Many Geophysical inference problems are characterized by non-linear processes, high-dimensional models and complex uncertainties. A dynamic coupling between models, estimation, and sampling is typically sought to efficiently characterize and reduce uncertainty. This process is however fraught with several difficulties. Among them, the key difficulties are the ability to deal with model errors, efficacy of uncertainty quantification and data assimilation. In this presentation, we present three key ideas from learning and intelligent systems theory and apply them to two geophysical applications. The first idea is the use of Ensemble Learning to compensate for model error, the second is to develop tractable Information Theoretic Learning to deal with non-Gaussianity in inference, and the third is a Manifold Resampling technique for effective uncertainty quantification. We apply these methods, first to the development of a cooperative autonomous observing system using sUAS for studying coherent structures. We apply this to Second, we apply this to the problem of quantifying risk from hurricanes and storm surges in a changing climate. Results indicate that learning approaches can enable new effectiveness in cases where standard approaches to model reduction, uncertainty quantification and data assimilation fail.
A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

NASA Astrophysics Data System (ADS)

Techo, Jakkrit; Nattee, Cholwich; Theeramunkong, Thanaruk

While classification techniques can be applied for automatic unknown word recognition in a language without word boundary, it faces with the problem of unbalanced datasets where the number of positive unknown word candidates is dominantly smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighing each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naïve Bayes classifier and our vanilla version without ensemble learning. As the result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected while it gains 97.26±0.26% when the top-ten candidates are considered, that is 8.45% and 6.79% improvement over the conventional record-based naïve Bayes classifier and the vanilla version. Another result on applying only best features show 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively. They are 3.97% and 9.78% improvement over naive Bayes and the vanilla version. Finally, an error analysis is given.
Improving wave forecasting by integrating ensemble modelling and machine learning

NASA Astrophysics Data System (ADS)

O'Donncha, F.; Zhang, Y.; James, S. C.

2017-12-01

Modern smart-grid networks use technologies to instantly relay information on supply and demand to support effective decision making. Integration of renewable-energy resources with these systems demands accurate forecasting of energy production (and demand) capacities. For wave-energy converters, this requires wave-condition forecasting to enable estimates of energy production. Current operational wave forecasting systems exhibit substantial errors with wave-height RMSEs of 40 to 60 cm being typical, which limits the reliability of energy-generation predictions thereby impeding integration with the distribution grid. In this study, we integrate physics-based models with statistical learning aggregation techniques that combine forecasts from multiple, independent models into a single "best-estimate" prediction of the true state. The Simulating Waves Nearshore physics-based model is used to compute wind- and currents-augmented waves in the Monterey Bay area. Ensembles are developed based on multiple simulations perturbing input data (wave characteristics supplied at the model boundaries and winds) to the model. A learning-aggregation technique uses past observations and past model forecasts to calculate a weight for each model. The aggregated forecasts are compared to observation data to quantify the performance of the model ensemble and aggregation techniques. The appropriately weighted ensemble model outperforms an individual ensemble member with regard to forecasting wave conditions.
Deep ensemble learning of virtual endoluminal views for polyp detection in CT colonography

NASA Astrophysics Data System (ADS)

Umehara, Kensuke; Näppi, Janne J.; Hironaka, Toru; Regge, Daniele; Ishida, Takayuki; Yoshida, Hiroyuki

2017-03-01

Robust training of a deep convolutional neural network (DCNN) requires a very large number of annotated datasets that are currently not available in CT colonography (CTC). We previously demonstrated that deep transfer learning provides an effective approach for robust application of a DCNN in CTC. However, at high detection accuracy, the differentiation of small polyps from non-polyps was still challenging. In this study, we developed and evaluated a deep ensemble learning (DEL) scheme for reviewing of virtual endoluminal images to improve the performance of computer-aided detection (CADe) of polyps in CTC. Nine different types of image renderings were generated from virtual endoluminal images of polyp candidates detected by a conventional CADe system. Eleven DCNNs that represented three types of publically available pre-trained DCNN models were re-trained by transfer learning to identify polyps from the virtual endoluminal images. A DEL scheme that determines the final detected polyps by a review of the nine types of VE images was developed by combining the DCNNs using a random forest classifier as a meta-classifier. For evaluation, we sampled 154 CTC cases from a large CTC screening trial and divided the cases randomly into a training dataset and a test dataset. At 3.9 falsepositive (FP) detections per patient on average, the detection sensitivities of the conventional CADe system, the highestperforming single DCNN, and the DEL scheme were 81.3%, 90.7%, and 93.5%, respectively, for polyps ≥6 mm in size. For small polyps, the DEL scheme reduced the number of false positives by up to 83% over that of using a single DCNN alone. These preliminary results indicate that the DEL scheme provides an effective approach for improving the polyp detection performance of CADe in CTC, especially for small polyps.
Recent developments in machine learning applications in landslide susceptibility mapping

NASA Astrophysics Data System (ADS)

Lun, Na Kai; Liew, Mohd Shahir; Matori, Abdul Nasir; Zawawi, Noor Amila Wan Abdullah

2017-11-01

While the prediction of spatial distribution of potential landslide occurrences is a primary interest in landslide hazard mitigation, it remains a challenging task. To overcome the scarceness of complete, sufficiently detailed geomorphological attributes and environmental conditions, various machine-learning techniques are increasingly applied to effectively map landslide susceptibility for large regions. Nevertheless, limited review papers are devoted to this field, particularly on the various domain specific applications of machine learning techniques. Available literature often report relatively good predictive performance, however, papers discussing the limitations of each approaches are quite uncommon. The foremost aim of this paper is to narrow these gaps in literature and to review up-to-date machine learning and ensemble learning techniques applied in landslide susceptibility mapping. It provides new readers an introductory understanding on the subject matter and researchers a contemporary review of machine learning advancements alongside the future direction of these techniques in the landslide mitigation field.
Time-course, negative-stain electron microscopy-based analysis for investigating protein-protein interactions at the single-molecule level.

PubMed

Nogal, Bartek; Bowman, Charles A; Ward, Andrew B

2017-11-24

Several biophysical approaches are available to study protein-protein interactions. Most approaches are conducted in bulk solution, and are therefore limited to an average measurement of the ensemble of molecular interactions. Here, we show how single-particle EM can enrich our understanding of protein-protein interactions at the single-molecule level and potentially capture states that are unobservable with ensemble methods because they are below the limit of detection or not conducted on an appropriate time scale. Using the HIV-1 envelope glycoprotein (Env) and its interaction with receptor CD4-binding site neutralizing antibodies as a model system, we both corroborate ensemble kinetics-derived parameters and demonstrate how time-course EM can further dissect stoichiometric states of complexes that are not readily observable with other methods. Visualization of the kinetics and stoichiometry of Env-antibody complexes demonstrated the applicability of our approach to qualitatively and semi-quantitatively differentiate two highly similar neutralizing antibodies. Furthermore, implementation of machine-learning techniques for sorting class averages of these complexes into discrete subclasses of particles helped reduce human bias. Our data provide proof of concept that single-particle EM can be used to generate a "visual" kinetic profile that should be amenable to studying many other protein-protein interactions, is relatively simple and complementary to well-established biophysical approaches. Moreover, our method provides critical insights into broadly neutralizing antibody recognition of Env, which may inform vaccine immunogen design and immunotherapeutic development. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

A Label Propagation Approach for Detecting Buried Objects in Handheld GPR Data

DTIC Science & Technology

2016-04-17

regions of interest that correspond to locations with anomalous signatures. Second, a classifier (or an ensemble of classifiers ) is used to assign a...investigated for almost two decades and several classifiers have been developed. Most of these methods are based on the supervised learning paradigm where...labeled target and clutter signatures are needed to train a classifier to discriminate between the two classes. Typically, large and diverse labeled
Online Heterogeneous Transfer by Hedge Ensemble of Offline and Online Decisions.

PubMed

Yan, Yuguang; Wu, Qingyao; Tan, Mingkui; Ng, Michael K; Min, Huaqing; Tsang, Ivor W

2017-10-10

In this paper, we study the online heterogeneous transfer (OHT) learning problem, where the target data of interest arrive in an online manner, while the source data and auxiliary co-occurrence data are from offline sources and can be easily annotated. OHT is very challenging, since the feature spaces of the source and target domains are different. To address this, we propose a novel technique called OHT by hedge ensemble by exploiting both offline knowledge and online knowledge of different domains. To this end, we build an offline decision function based on a heterogeneous similarity that is constructed using labeled source data and unlabeled auxiliary co-occurrence data. After that, an online decision function is learned from the target data. Last, we employ a hedge weighting strategy to combine the offline and online decision functions to exploit knowledge from the source and target domains of different feature spaces. We also provide a theoretical analysis regarding the mistake bounds of the proposed approach. Comprehensive experiments on three real-world data sets demonstrate the effectiveness of the proposed technique.
Distributed deep learning networks among institutions for medical imaging.

PubMed

Chang, Ken; Balachandar, Niranjan; Lam, Carson; Yi, Darvin; Brown, James; Beers, Andrew; Rosen, Bruce; Rubin, Daniel L; Kalpathy-Cramer, Jayashree

2018-03-29

Deep learning has become a promising approach for automated support for clinical diagnosis. When medical data samples are limited, collaboration among multiple institutions is necessary to achieve high algorithm performance. However, sharing patient data often has limitations due to technical, legal, or ethical concerns. In this study, we propose methods of distributing deep learning models as an attractive alternative to sharing patient data. We simulate the distribution of deep learning models across 4 institutions using various training heuristics and compare the results with a deep learning model trained on centrally hosted patient data. The training heuristics investigated include ensembling single institution models, single weight transfer, and cyclical weight transfer. We evaluated these approaches for image classification in 3 independent image collections (retinal fundus photos, mammography, and ImageNet). We find that cyclical weight transfer resulted in a performance that was comparable to that of centrally hosted patient data. We also found that there is an improvement in the performance of cyclical weight transfer heuristic with a high frequency of weight transfer. We show that distributing deep learning models is an effective alternative to sharing patient data. This finding has implications for any collaborative deep learning study.
Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

PubMed

Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

2007-12-01

We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

PubMed

Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

2007-09-01

We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules

NASA Astrophysics Data System (ADS)

Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

2007-12-01

We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules

NASA Astrophysics Data System (ADS)

Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

2007-09-01

We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation and Completion of Episodic Information.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aimone, James Bradley; Bernard, Michael Lewis; Vineyard, Craig Michael

2014-10-01

Adult neurogenesis in the hippocampus region of the brain is a neurobiological process that is believed to contribute to the brain's advanced abilities in complex pattern recognition and cognition. Here, we describe how realistic scale simulations of the neurogenesis process can offer both a unique perspective on the biological relevance of this process and confer computational insights that are suggestive of novel machine learning techniques. First, supercomputer based scaling studies of the neurogenesis process demonstrate how a small fraction of adult-born neurons have a uniquely larger impact in biologically realistic scaled networks. Second, we describe a novel technical approach bymore » which the information content of ensembles of neurons can be estimated. Finally, we illustrate several examples of broader algorithmic impact of neurogenesis, including both extending existing machine learning approaches and novel approaches for intelligent sensing.« less
The Effects of Theta Precession on Spatial Learning and Simplicial Complex Dynamics in a Topological Model of the Hippocampal Spatial Map

PubMed Central

Arai, Mamiko; Brandt, Vicky; Dabaghian, Yuri

2014-01-01

Learning arises through the activity of large ensembles of cells, yet most of the data neuroscientists accumulate is at the level of individual neurons; we need models that can bridge this gap. We have taken spatial learning as our starting point, computationally modeling the activity of place cells using methods derived from algebraic topology, especially persistent homology. We previously showed that ensembles of hundreds of place cells could accurately encode topological information about different environments (“learn” the space) within certain values of place cell firing rate, place field size, and cell population; we called this parameter space the learning region. Here we advance the model both technically and conceptually. To make the model more physiological, we explored the effects of theta precession on spatial learning in our virtual ensembles. Theta precession, which is believed to influence learning and memory, did in fact enhance learning in our model, increasing both speed and the size of the learning region. Interestingly, theta precession also increased the number of spurious loops during simplicial complex formation. We next explored how downstream readout neurons might define co-firing by grouping together cells within different windows of time and thereby capturing different degrees of temporal overlap between spike trains. Our model's optimum coactivity window correlates well with experimental data, ranging from ∼150–200 msec. We further studied the relationship between learning time, window width, and theta precession. Our results validate our topological model for spatial learning and open new avenues for connecting data at the level of individual neurons to behavioral outcomes at the neuronal ensemble level. Finally, we analyzed the dynamics of simplicial complex formation and loop transience to propose that the simplicial complex provides a useful working description of the spatial learning process. PMID:24945927
Enhancing the Popular Music Ensemble Workshop and Maximising Student Potential through the Integration of Creativity

ERIC Educational Resources Information Center

Hall, Richard

2015-01-01

Ensemble work is a key part of any performance-based popular music course and involves students replicating existing music or playing "covers". The creative process in popular music is a collaborative one and the ensemble workshop can be utilised to facilitate active learning and develop musical creativity within a group setting. This is…
Research and application of a novel hybrid decomposition-ensemble learning paradigm with error correction for daily PM10 forecasting

NASA Astrophysics Data System (ADS)

Luo, Hongyuan; Wang, Deyun; Yue, Chenqiang; Liu, Yanling; Guo, Haixiang

2018-03-01

In this paper, a hybrid decomposition-ensemble learning paradigm combining error correction is proposed for improving the forecast accuracy of daily PM10 concentration. The proposed learning paradigm is consisted of the following two sub-models: (1) PM10 concentration forecasting model; (2) error correction model. In the proposed model, fast ensemble empirical mode decomposition (FEEMD) and variational mode decomposition (VMD) are applied to disassemble original PM10 concentration series and error sequence, respectively. The extreme learning machine (ELM) model optimized by cuckoo search (CS) algorithm is utilized to forecast the components generated by FEEMD and VMD. In order to prove the effectiveness and accuracy of the proposed model, two real-world PM10 concentration series respectively collected from Beijing and Harbin located in China are adopted to conduct the empirical study. The results show that the proposed model performs remarkably better than all other considered models without error correction, which indicates the superior performance of the proposed model.
Prediction of plant lncRNA by ensemble machine learning classifiers.

PubMed

Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

2018-05-02

In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
The development of ensemble theory. A new glimpse at the history of statistical mechanics

NASA Astrophysics Data System (ADS)

Inaba, Hajime

2015-12-01

This paper investigates the history of statistical mechanics from the viewpoint of the development of the ensemble theory from 1871 to 1902. In 1871, Ludwig Boltzmann introduced a prototype model of an ensemble that represents a polyatomic gas. In 1879, James Clerk Maxwell defined an ensemble as copies of systems of the same energy. Inspired by H.W. Watson, he called his approach "statistical". Boltzmann and Maxwell regarded the ensemble theory as a much more general approach than the kinetic theory. In the 1880s, influenced by Hermann von Helmholtz, Boltzmann made use of ensembles to establish thermodynamic relations. In Elementary Principles in Statistical Mechanics of 1902, Josiah Willard Gibbs tried to get his ensemble theory to mirror thermodynamics, including thermodynamic operations in its scope. Thermodynamics played the role of a "blind guide". His theory of ensembles can be characterized as more mathematically oriented than Einstein's theory proposed in the same year. Mechanical, empirical, and statistical approaches to foundations of statistical mechanics are presented. Although it was formulated in classical terms, the ensemble theory provided an infrastructure still valuable in quantum statistics because of its generality.
Online Cross-Validation-Based Ensemble Learning

PubMed Central

Benkeser, David; Ju, Cheng; Lendle, Sam; van der Laan, Mark

2017-01-01

Online estimators update a current estimate with a new incoming batch of data without having to revisit past data thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and as special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example where we make streaming predictions of infectious disease incidence using data from a large database. PMID:28474419
Development of full-field optical spatial coherence tomography system for automated identification of malaria using the multilevel ensemble classifier.

PubMed

Singla, Neeru; Srivastava, Vishal; Mehta, Dalip Singh

2018-05-01

Malaria is a life-threatening infectious blood disease affecting humans and other animals caused by parasitic protozoans belonging to the Plasmodium type especially in developing countries. The gold standard method for the detection of malaria is through the microscopic method of chemically treated blood smears. We developed an automated optical spatial coherence tomographic system using a machine learning approach for a fast identification of malaria cells. In this study, 28 samples (15 healthy, 13 malaria infected stages of red blood cells) were imaged by the developed system and 13 features were extracted. We designed a multilevel ensemble-based classifier for the quantitative prediction of different stages of the malaria cells. The proposed classifier was used by repeating k-fold cross validation dataset and achieve a high-average accuracy of 97.9% for identifying malaria infected late trophozoite stage of cells. Overall, our proposed system and multilevel ensemble model has a substantial quantifiable potential to detect the different stages of malaria infection without staining or expert. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.

PubMed

Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi

2014-01-01

In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.
Machine Learning Predictions of a Multiresolution Climate Model Ensemble

NASA Astrophysics Data System (ADS)

Anderson, Gemma J.; Lucas, Donald D.

2018-05-01

Statistical models of high-resolution climate models are useful for many purposes, including sensitivity and uncertainty analyses, but building them can be computationally prohibitive. We generated a unique multiresolution perturbed parameter ensemble of a global climate model. We use a novel application of a machine learning technique known as random forests to train a statistical model on the ensemble to make high-resolution model predictions of two important quantities: global mean top-of-atmosphere energy flux and precipitation. The random forests leverage cheaper low-resolution simulations, greatly reducing the number of high-resolution simulations required to train the statistical model. We demonstrate that high-resolution predictions of these quantities can be obtained by training on an ensemble that includes only a small number of high-resolution simulations. We also find that global annually averaged precipitation is more sensitive to resolution changes than to any of the model parameters considered.
Bioactive focus in conformational ensembles: a pluralistic approach

NASA Astrophysics Data System (ADS)

Habgood, Matthew

2017-12-01

Computational generation of conformational ensembles is key to contemporary drug design. Selecting the members of the ensemble that will approximate the conformation most likely to bind to a desired target (the bioactive conformation) is difficult, given that the potential energy usually used to generate and rank the ensemble is a notoriously poor discriminator between bioactive and non-bioactive conformations. In this study an approach to generating a focused ensemble is proposed in which each conformation is assigned multiple rankings based not just on potential energy but also on solvation energy, hydrophobic or hydrophilic interaction energy, radius of gyration, and on a statistical potential derived from Cambridge Structural Database data. The best ranked structures derived from each system are then assembled into a new ensemble that is shown to be better focused on bioactive conformations. This pluralistic approach is tested on ensembles generated by the Molecular Operating Environment's Low Mode Molecular Dynamics module, and by the Cambridge Crystallographic Data Centre's conformation generator software.
Sensory Cortical Population Dynamics Uniquely Track Behavior across Learning and Extinction

PubMed Central

Katz, Donald B.

2014-01-01

Neural responses in many cortical regions encode information relevant to behavior: information that necessarily changes as that behavior changes with learning. Although such responses are reasonably theorized to be related to behavior causation, the true nature of that relationship cannot be clarified by simple learning studies, which show primarily that responses change with experience. Neural activity that truly tracks behavior (as opposed to simply changing with experience) will not only change with learning but also change back when that learning is extinguished. Here, we directly probed for this pattern, recording the activity of ensembles of gustatory cortical single neurons as rats that normally consumed sucrose avidly were trained first to reject it (i.e., conditioned taste aversion learning) and then to enjoy it again (i.e., extinction), all within 49 h. Both learning and extinction altered cortical responses, consistent with the suggestion (based on indirect evidence) that extinction is a novel form of learning. But despite the fact that, as expected, postextinction single-neuron responses did not resemble “naive responses,” ensemble response dynamics changed with learning and reverted with extinction: both the speed of stimulus processing and the relationships among ensemble responses to the different stimuli tracked behavioral relevance. These data suggest that population coding is linked to behavior with a fidelity that single-neuron coding is not. PMID:24453316
Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

PubMed Central

2013-01-01

Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time. PMID:23815620

Online probabilistic learning with an ensemble of forecasts

NASA Astrophysics Data System (ADS)

Thorey, Jean; Mallet, Vivien; Chaussin, Christophe

2016-04-01

Our objective is to produce a calibrated weighted ensemble to forecast a univariate time series. In addition to a meteorological ensemble of forecasts, we rely on observations or analyses of the target variable. The celebrated Continuous Ranked Probability Score (CRPS) is used to evaluate the probabilistic forecasts. However applying the CRPS on weighted empirical distribution functions (deriving from the weighted ensemble) may introduce a bias because of which minimizing the CRPS does not produce the optimal weights. Thus we propose an unbiased version of the CRPS which relies on clusters of members and is strictly proper. We adapt online learning methods for the minimization of the CRPS. These methods generate the weights associated to the members in the forecasted empirical distribution function. The weights are updated before each forecast step using only past observations and forecasts. Our learning algorithms provide the theoretical guarantee that, in the long run, the CRPS of the weighted forecasts is at least as good as the CRPS of any weighted ensemble with weights constant in time. In particular, the performance of our forecast is better than that of any subset ensemble with uniform weights. A noteworthy advantage of our algorithm is that it does not require any assumption on the distributions of the observations and forecasts, both for the application and for the theoretical guarantee to hold. As application example on meteorological forecasts for photovoltaic production integration, we show that our algorithm generates a calibrated probabilistic forecast, with significant performance improvements on probabilistic diagnostic tools (the CRPS, the reliability diagram and the rank histogram).
A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification

NASA Astrophysics Data System (ADS)

Zhang, Ce; Pan, Xin; Li, Huapeng; Gardiner, Andy; Sargent, Isabel; Hare, Jonathon; Atkinson, Peter M.

2018-06-01

The contextual-based convolutional neural network (CNN) with deep architecture and pixel-based multilayer perceptron (MLP) with shallow structure are well-recognized neural network algorithms, representing the state-of-the-art deep learning method and the classical non-parametric machine learning approach, respectively. The two algorithms, which have very different behaviours, were integrated in a concise and effective way using a rule-based decision fusion approach for the classification of very fine spatial resolution (VFSR) remotely sensed imagery. The decision fusion rules, designed primarily based on the classification confidence of the CNN, reflect the generally complementary patterns of the individual classifiers. In consequence, the proposed ensemble classifier MLP-CNN harvests the complementary results acquired from the CNN based on deep spatial feature representation and from the MLP based on spectral discrimination. Meanwhile, limitations of the CNN due to the adoption of convolutional filters such as the uncertainty in object boundary partition and loss of useful fine spatial resolution detail were compensated. The effectiveness of the ensemble MLP-CNN classifier was tested in both urban and rural areas using aerial photography together with an additional satellite sensor dataset. The MLP-CNN classifier achieved promising performance, consistently outperforming the pixel-based MLP, spectral and textural-based MLP, and the contextual-based CNN in terms of classification accuracy. This research paves the way to effectively address the complicated problem of VFSR image classification.
Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation.

PubMed

Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira

2013-04-01

Brain magnetic resonance images (MRIs) tissue segmentation is one of the most important parts of the clinical diagnostic tools. Pixel classification methods have been frequently used in the image segmentation with two supervised and unsupervised approaches up to now. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to low level of performance. However, semi-supervised learning which uses a few labeled data together with a large amount of unlabeled data causes higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting of brain magnetic resonance imaging (MRI) tissues that it has been used results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms expectation filtering maximization and MCo_Training that are improved versions of semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers.
Visualization and classification of physiological failure modes in ensemble hemorrhage simulation

NASA Astrophysics Data System (ADS)

Zhang, Song; Pruett, William Andrew; Hester, Robert

2015-01-01

In an emergency situation such as hemorrhage, doctors need to predict which patients need immediate treatment and care. This task is difficult because of the diverse response to hemorrhage in human population. Ensemble physiological simulations provide a means to sample a diverse range of subjects and may have a better chance of containing the correct solution. However, to reveal the patterns and trends from the ensemble simulation is a challenging task. We have developed a visualization framework for ensemble physiological simulations. The visualization helps users identify trends among ensemble members, classify ensemble member into subpopulations for analysis, and provide prediction to future events by matching a new patient's data to existing ensembles. We demonstrated the effectiveness of the visualization on simulated physiological data. The lessons learned here can be applied to clinically-collected physiological data in the future.
Lessons Learned from Assimilating Altimeter Data into a Coupled General Circulation Model with the GMAO Augmented Ensemble Kalman Filter

NASA Technical Reports Server (NTRS)

Keppenne, Christian; Vernieres, Guillaume; Rienecker, Michele; Jacob, Jossy; Kovach, Robin

2011-01-01

Satellite altimetry measurements have provided global, evenly distributed observations of the ocean surface since 1993. However, the difficulties introduced by the presence of model biases and the requirement that data assimilation systems extrapolate the sea surface height (SSH) information to the subsurface in order to estimate the temperature, salinity and currents make it difficult to optimally exploit these measurements. This talk investigates the potential of the altimetry data assimilation once the biases are accounted for with an ad hoc bias estimation scheme. Either steady-state or state-dependent multivariate background-error covariances from an ensemble of model integrations are used to address the problem of extrapolating the information to the sub-surface. The GMAO ocean data assimilation system applied to an ensemble of coupled model instances using the GEOS-5 AGCM coupled to MOM4 is used in the investigation. To model the background error covariances, the system relies on a hybrid ensemble approach in which a small number of dynamically evolved model trajectories is augmented on the one hand with past instances of the state vector along each trajectory and, on the other, with a steady state ensemble of error estimates from a time series of short-term model forecasts. A state-dependent adaptive error-covariance localization and inflation algorithm controls how the SSH information is extrapolated to the sub-surface. A two-step predictor corrector approach is used to assimilate future information. Independent (not-assimilated) temperature and salinity observations from Argo floats are used to validate the assimilation. A two-step projection method in which the system first calculates a SSH increment and then projects this increment vertically onto the temperature, salt and current fields is found to be most effective in reconstructing the sub-surface information. The performance of the system in reconstructing the sub-surface fields is particularly impressive for temperature, but not as satisfactory for salt.
Algorithms that Defy the Gravity of Learning Curve

DTIC Science & Technology

2017-04-28

three nearest neighbour-based anomaly detectors, i.e., an ensemble of nearest neigh- bours, a recent nearest neighbour-based ensemble method called iNNE...streams. Note that the change in sample size does not alter the geometrical data characteristics discussed here. 3.1 Experimental Methodology ...need to be answered. 3.6 Comparison with conventional ensemble methods Given the theoretical results, the third aim of this project (i.e., identify the
Can-Evo-Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences.

PubMed

Ali, Safdar; Majid, Abdul

2015-04-01

The diagnostic of human breast cancer is an intricate process and specific indicators may produce negative results. In order to avoid misleading results, accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches are proposed for prediction of breast cancer. To this end, we developed a novel classifier stacking based evolutionary ensemble system "Can-Evo-Ens" for predicting amino acid sequences associated with breast cancer. In this paper, first, we selected four diverse-type of ML algorithms of Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. In order to exploit the decision spaces, the preliminary predictions of base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimal combine the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using Particle Swarm Optimization technique. Our experiments have demonstrated the robustness of Can-Evo-Ens system for independent validation dataset. The proposed system has achieved the highest value of Area Under Curve (AUC) of ROC Curve of 99.95% for cancer prediction. The comparative results revealed that proposed approach is better than individual ML approaches and conventional ensemble approaches of AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed novel system would have a major impact on the fields of Biomedical, Genomics, Proteomics, Bioinformatics, and Drug Development. Copyright © 2015 Elsevier Inc. All rights reserved.
A hybrid deep learning approach to predict malignancy of breast lesions using mammograms

NASA Astrophysics Data System (ADS)

Wang, Yunzhi; Heidari, Morteza; Mirniaharikandehei, Seyedehnafiseh; Gong, Jing; Qian, Wei; Qiu, Yuchen; Zheng, Bin

2018-03-01

Applying deep learning technology to medical imaging informatics field has been recently attracting extensive research interest. However, the limited medical image dataset size often reduces performance and robustness of the deep learning based computer-aided detection and/or diagnosis (CAD) schemes. In attempt to address this technical challenge, this study aims to develop and evaluate a new hybrid deep learning based CAD approach to predict likelihood of a breast lesion detected on mammogram being malignant. In this approach, a deep Convolutional Neural Network (CNN) was firstly pre-trained using the ImageNet dataset and serve as a feature extractor. A pseudo-color Region of Interest (ROI) method was used to generate ROIs with RGB channels from the mammographic images as the input to the pre-trained deep network. The transferred CNN features from different layers of the CNN were then obtained and a linear support vector machine (SVM) was trained for the prediction task. By applying to a dataset involving 301 suspicious breast lesions and using a leave-one-case-out validation method, the areas under the ROC curves (AUC) = 0.762 and 0.792 using the traditional CAD scheme and the proposed deep learning based CAD scheme, respectively. An ensemble classifier that combines the classification scores generated by the two schemes yielded an improved AUC value of 0.813. The study results demonstrated feasibility and potentially improved performance of applying a new hybrid deep learning approach to develop CAD scheme using a relatively small dataset of medical images.
Developing an approach to effectively use super ensemble experiments for the projection of hydrological extremes under climate change

NASA Astrophysics Data System (ADS)

Watanabe, S.; Kim, H.; Utsumi, N.

2017-12-01

This study aims to develop a new approach which projects hydrology under climate change using super ensemble experiments. The use of multiple ensemble is essential for the estimation of extreme, which is a major issue in the impact assessment of climate change. Hence, the super ensemble experiments are recently conducted by some research programs. While it is necessary to use multiple ensemble, the multiple calculations of hydrological simulation for each output of ensemble simulations needs considerable calculation costs. To effectively use the super ensemble experiments, we adopt a strategy to use runoff projected by climate models directly. The general approach of hydrological projection is to conduct hydrological model simulations which include land-surface and river routing process using atmospheric boundary conditions projected by climate models as inputs. This study, on the other hand, simulates only river routing model using runoff projected by climate models. In general, the climate model output is systematically biased so that a preprocessing which corrects such bias is necessary for impact assessments. Various bias correction methods have been proposed, but, to the best of our knowledge, no method has proposed for variables other than surface meteorology. Here, we newly propose a method for utilizing the projected future runoff directly. The developed method estimates and corrects the bias based on the pseudo-observation which is a result of retrospective offline simulation. We show an application of this approach to the super ensemble experiments conducted under the program of Half a degree Additional warming, Prognosis and Projected Impacts (HAPPI). More than 400 ensemble experiments from multiple climate models are available. The results of the validation using historical simulations by HAPPI indicates that the output of this approach can effectively reproduce retrospective runoff variability. Likewise, the bias of runoff from super ensemble climate projections is corrected, and the impact of climate change on hydrologic extremes is assessed in a cost-efficient way.
Semantic labeling of high-resolution aerial images using an ensemble of fully convolutional networks

NASA Astrophysics Data System (ADS)

Sun, Xiaofeng; Shen, Shuhan; Lin, Xiangguo; Hu, Zhanyi

2017-10-01

High-resolution remote sensing data classification has been a challenging and promising research topic in the community of remote sensing. In recent years, with the rapid advances of deep learning, remarkable progress has been made in this field, which facilitates a transition from hand-crafted features designing to an automatic end-to-end learning. A deep fully convolutional networks (FCNs) based ensemble learning method is proposed to label the high-resolution aerial images. To fully tap the potentials of FCNs, both the Visual Geometry Group network and a deeper residual network, ResNet, are employed. Furthermore, to enlarge training samples with diversity and gain better generalization, in addition to the commonly used data augmentation methods (e.g., rotation, multiscale, and aspect ratio) in the literature, aerial images from other datasets are also collected for cross-scene learning. Finally, we combine these learned models to form an effective FCN ensemble and refine the results using a fully connected conditional random field graph model. Experiments on the ISPRS 2-D Semantic Labeling Contest dataset show that our proposed end-to-end classification method achieves an overall accuracy of 90.7%, a state-of-the-art in the field.
Genetic Feedback Regulation of Frontal Cortical Neuronal Ensembles Through Activity-Dependent Arc Expression and Dopaminergic Input.

PubMed

Mastwal, Surjeet; Cao, Vania; Wang, Kuan Hong

2016-01-01

Mental functions involve coordinated activities of specific neuronal ensembles that are embedded in complex brain circuits. Aberrant neuronal ensemble dynamics is thought to form the neurobiological basis of mental disorders. A major challenge in mental health research is to identify these cellular ensembles and determine what molecular mechanisms constrain their emergence and consolidation during development and learning. Here, we provide a perspective based on recent studies that use activity-dependent gene Arc/Arg3.1 as a cellular marker to identify neuronal ensembles and a molecular probe to modulate circuit functions. These studies have demonstrated that the transcription of Arc is activated in selective groups of frontal cortical neurons in response to specific behavioral tasks. Arc expression regulates the persistent firing of individual neurons and predicts the consolidation of neuronal ensembles during repeated learning. Therefore, the Arc pathway represents a prototypical example of activity-dependent genetic feedback regulation of neuronal ensembles. The activation of this pathway in the frontal cortex starts during early postnatal development and requires dopaminergic (DA) input. Conversely, genetic disruption of Arc leads to a hypoactive mesofrontal dopamine circuit and its related cognitive deficit. This mutual interaction suggests an auto-regulatory mechanism to amplify the impact of neuromodulators and activity-regulated genes during postnatal development. Such a mechanism may contribute to the association of mutations in dopamine and Arc pathways with neurodevelopmental psychiatric disorders. As the mesofrontal dopamine circuit shows extensive activity-dependent developmental plasticity, activity-guided modulation of DA projections or Arc ensembles during development may help to repair circuit deficits related to neuropsychiatric disorders.
Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.

PubMed

Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò

2018-05-15

Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors has been organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified in four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on MLNeCh dataset. Since the automatic classification of AD is based on the use of feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem, then the classification method is obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone method tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be publicly available 1 to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.
Harnessing Disordered-Ensemble Quantum Dynamics for Machine Learning

NASA Astrophysics Data System (ADS)

Fujii, Keisuke; Nakajima, Kohei

2017-08-01

The quantum computer has an amazing potential of fast information processing. However, the realization of a digital quantum computer is still a challenging problem requiring highly accurate controls and key application strategies. Here we propose a platform, quantum reservoir computing, to solve these issues successfully by exploiting the natural quantum dynamics of ensemble systems, which are ubiquitous in laboratories nowadays, for machine learning. This framework enables ensemble quantum systems to universally emulate nonlinear dynamical systems including classical chaos. A number of numerical experiments show that quantum systems consisting of 5-7 qubits possess computational capabilities comparable to conventional recurrent neural networks of 100-500 nodes. This discovery opens up a paradigm for information processing with artificial intelligence powered by quantum physics.
Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

PubMed

Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

2016-11-16

The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
Refining Markov state models for conformational dynamics using ensemble-averaged data and time-series trajectories

NASA Astrophysics Data System (ADS)

Matsunaga, Y.; Sugita, Y.

2018-06-01

A data-driven modeling scheme is proposed for conformational dynamics of biomolecules based on molecular dynamics (MD) simulations and experimental measurements. In this scheme, an initial Markov State Model (MSM) is constructed from MD simulation trajectories, and then, the MSM parameters are refined using experimental measurements through machine learning techniques. The second step can reduce the bias of MD simulation results due to inaccurate force-field parameters. Either time-series trajectories or ensemble-averaged data are available as a training data set in the scheme. Using a coarse-grained model of a dye-labeled polyproline-20, we compare the performance of machine learning estimations from the two types of training data sets. Machine learning from time-series data could provide the equilibrium populations of conformational states as well as their transition probabilities. It estimates hidden conformational states in more robust ways compared to that from ensemble-averaged data although there are limitations in estimating the transition probabilities between minor states. We discuss how to use the machine learning scheme for various experimental measurements including single-molecule time-series trajectories.
Advanced ensemble modelling of flexible macromolecules using X-ray solution scattering.

PubMed

Tria, Giancarlo; Mertens, Haydyn D T; Kachala, Michael; Svergun, Dmitri I

2015-03-01

Dynamic ensembles of macromolecules mediate essential processes in biology. Understanding the mechanisms driving the function and molecular interactions of 'unstructured' and flexible molecules requires alternative approaches to those traditionally employed in structural biology. Small-angle X-ray scattering (SAXS) is an established method for structural characterization of biological macromolecules in solution, and is directly applicable to the study of flexible systems such as intrinsically disordered proteins and multi-domain proteins with unstructured regions. The Ensemble Optimization Method (EOM) [Bernadó et al. (2007 ▶). J. Am. Chem. Soc. 129, 5656-5664] was the first approach introducing the concept of ensemble fitting of the SAXS data from flexible systems. In this approach, a large pool of macromolecules covering the available conformational space is generated and a sub-ensemble of conformers coexisting in solution is selected guided by the fit to the experimental SAXS data. This paper presents a series of new developments and advancements to the method, including significantly enhanced functionality and also quantitative metrics for the characterization of the results. Building on the original concept of ensemble optimization, the algorithms for pool generation have been redesigned to allow for the construction of partially or completely symmetric oligomeric models, and the selection procedure was improved to refine the size of the ensemble. Quantitative measures of the flexibility of the system studied, based on the characteristic integral parameters of the selected ensemble, are introduced. These improvements are implemented in the new EOM version 2.0, and the capabilities as well as inherent limitations of the ensemble approach in SAXS, and of EOM 2.0 in particular, are discussed.
Towards an improved ensemble precipitation forecast: A probabilistic post-processing approach

NASA Astrophysics Data System (ADS)

Khajehei, Sepideh; Moradkhani, Hamid

2017-03-01

Recently, ensemble post-processing (EPP) has become a commonly used approach for reducing the uncertainty in forcing data and hence hydrologic simulation. The procedure was introduced to build ensemble precipitation forecasts based on the statistical relationship between observations and forecasts. More specifically, the approach relies on a transfer function that is developed based on a bivariate joint distribution between the observations and the simulations in the historical period. The transfer function is used to post-process the forecast. In this study, we propose a Bayesian EPP approach based on copula functions (COP-EPP) to improve the reliability of the precipitation ensemble forecast. Evaluation of the copula-based method is carried out by comparing the performance of the generated ensemble precipitation with the outputs from an existing procedure, i.e. mixed type meta-Gaussian distribution. Monthly precipitation from Climate Forecast System Reanalysis (CFS) and gridded observation from Parameter-Elevation Relationships on Independent Slopes Model (PRISM) have been employed to generate the post-processed ensemble precipitation. Deterministic and probabilistic verification frameworks are utilized in order to evaluate the outputs from the proposed technique. Distribution of seasonal precipitation for the generated ensemble from the copula-based technique is compared to the observation and raw forecasts for three sub-basins located in the Western United States. Results show that both techniques are successful in producing reliable and unbiased ensemble forecast, however, the COP-EPP demonstrates considerable improvement in the ensemble forecast in both deterministic and probabilistic verification, in particular in characterizing the extreme events in wet seasons.
HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source.

PubMed

Wan, Shixiang; Duan, Yucong; Zou, Quan

2017-09-01

Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ a series of machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. First, most of the existing techniques are designed to deal with multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, multiple locations of particular proteins imply that there are vital and unique biological significances that deserve special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary, but never employed. For solving these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied for multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
On the generation of climate model ensembles

NASA Astrophysics Data System (ADS)

Haughton, Ned; Abramowitz, Gab; Pitman, Andy; Phipps, Steven J.

2014-10-01

Climate model ensembles are used to estimate uncertainty in future projections, typically by interpreting the ensemble distribution for a particular variable probabilistically. There are, however, different ways to produce climate model ensembles that yield different results, and therefore different probabilities for a future change in a variable. Perhaps equally importantly, there are different approaches to interpreting the ensemble distribution that lead to different conclusions. Here we use a reduced-resolution climate system model to compare three common ways to generate ensembles: initial conditions perturbation, physical parameter perturbation, and structural changes. Despite these three approaches conceptually representing very different categories of uncertainty within a modelling system, when comparing simulations to observations of surface air temperature they can be very difficult to separate. Using the twentieth century CMIP5 ensemble for comparison, we show that initial conditions ensembles, in theory representing internal variability, significantly underestimate observed variance. Structural ensembles, perhaps less surprisingly, exhibit over-dispersion in simulated variance. We argue that future climate model ensembles may need to include parameter or structural perturbation members in addition to perturbed initial conditions members to ensure that they sample uncertainty due to internal variability more completely. We note that where ensembles are over- or under-dispersive, such as for the CMIP5 ensemble, estimates of uncertainty need to be treated with care.
Supermodeling With A Global Atmospheric Model

NASA Astrophysics Data System (ADS)

Wiegerinck, Wim; Burgers, Willem; Selten, Frank

2013-04-01

In weather and climate prediction studies it often turns out to be the case that the multi-model ensemble mean prediction has the best prediction skill scores. One possible explanation is that the major part of the model error is random and is averaged out in the ensemble mean. In the standard multi-model ensemble approach, the models are integrated in time independently and the predicted states are combined a posteriori. Recently an alternative ensemble prediction approach has been proposed in which the models exchange information during the simulation and synchronize on a common solution that is closer to the truth than any of the individual model solutions in the standard multi-model ensemble approach or a weighted average of these. This approach is called the super modeling approach (SUMO). The potential of the SUMO approach has been demonstrated in the context of simple, low-order, chaotic dynamical systems. The information exchange takes the form of linear nudging terms in the dynamical equations that nudge the solution of each model to the solution of all other models in the ensemble. With a suitable choice of the connection strengths the models synchronize on a common solution that is indeed closer to the true system than any of the individual model solutions without nudging. This approach is called connected SUMO. An alternative approach is to integrate a weighted averaged model, weighted SUMO. At each time step all models in the ensemble calculate the tendency, these tendencies are weighted averaged and the state is integrated one time step into the future with this weighted averaged tendency. It was shown that in case the connected SUMO synchronizes perfectly, the connected SUMO follows the weighted averaged trajectory and both approaches yield the same solution. In this study we pioneer both approaches in the context of a global, quasi-geostrophic, three-level atmosphere model that is capable of simulating quite realistically the extra-tropical circulation in the Northern Hemisphere winter.

Comparison of ensemble post-processing approaches, based on empirical and dynamical error modelisation of rainfall-runoff model forecasts

NASA Astrophysics Data System (ADS)

Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.

2012-04-01

In the context of a national energy company (EDF : Electricité de France), hydro-meteorological forecasts are necessary to ensure safety and security of installations, meet environmental standards and improve water ressources management and decision making. Hydrological ensemble forecasts allow a better representation of meteorological and hydrological forecasts uncertainties and improve human expertise of hydrological forecasts, which is essential to synthesize available informations, coming from different meteorological and hydrological models and human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and is being used since 2010 on more than 30 watersheds in France. This ensemble forecasting chain is characterized ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where a large human expertise is solicited. The aim of this paper is to compare 2 hydrological ensemble post-processing methods developed at EDF in order improve ensemble forecasts reliability (similar to Monatanari &Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called empirical approach) is based on a statistical modelisation of empirical error of perfect forecasts, by streamflow sub-samples of quantile class and lead-time. The second method (called dynamical approach) is based on streamflow sub-samples of quantile class and streamflow variation, and lead-time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure a good post-processing of hydrological ensemble, allowing a good improvement of reliability, skill and sharpness of ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach which is not able to take into account hydrological dynamic and processes, i. e. sample heterogeneity. For a same streamflow range corresponds different processes such as rising limbs or recession, where uncertainties are different. The dynamical approach improves reliability, skills and sharpness of forecasts and globally reduces confidence intervals width. When compared in details, the dynamical approach allows a noticeable reduction of confidence intervals during recessions where uncertainty is relatively lower and a slight increase of confidence intervals during rising limbs or snowmelt where uncertainty is greater. The dynamic approach, validated by forecaster's experience that considered the empirical approach not discriminative enough, improved forecaster's confidence and communication of uncertainties. Montanari, A. and Brath, A., (2004). A stochastic approach for assessing the uncertainty of rainfall-runoff simulations. Water Resources Research, 40, W01106, doi:10.1029/2003WR002540. Schaefli, B., Balin Talamba, D. and Musy, A., (2007). Quantifying hydrological modeling errors through a mixture of normal distributions. Journal of Hydrology, 332, 303-315.
Distinct Fos-Expressing Neuronal Ensembles in the Ventromedial Prefrontal Cortex Mediate Food Reward and Extinction Memories

PubMed Central

Warren, Brandon L.; Mendoza, Michael P.; Cruz, Fabio C.; Leao, Rodrigo M.; Caprioli, Daniele; Rubio, F. Javier; Whitaker, Leslie R.; McPherson, Kylie B.; Bossert, Jennifer M.; Shaham, Yavin

2016-01-01

In operant learning, initial reward-associated memories are thought to be distinct from subsequent extinction-associated memories. Memories formed during operant learning are thought to be stored in “neuronal ensembles.” Thus, we hypothesize that different neuronal ensembles encode reward- and extinction-associated memories. Here, we examined prefrontal cortex neuronal ensembles involved in the recall of reward and extinction memories of food self-administration. We first trained rats to lever press for palatable food pellets for 7 d (1 h/d) and then exposed them to 0, 2, or 7 daily extinction sessions in which lever presses were not reinforced. Twenty-four hours after the last training or extinction session, we exposed the rats to either a short 15 min extinction test session or left them in their homecage (a control condition). We found maximal Fos (a neuronal activity marker) immunoreactivity in the ventral medial prefrontal cortex of rats that previously received 2 extinction sessions, suggesting that neuronal ensembles in this area encode extinction memories. We then used the Daun02 inactivation procedure to selectively disrupt ventral medial prefrontal cortex neuronal ensembles that were activated during the 15 min extinction session following 0 (no extinction) or 2 prior extinction sessions to determine the effects of inactivating the putative food reward and extinction ensembles, respectively, on subsequent nonreinforced food seeking 2 d later. Inactivation of the food reward ensembles decreased food seeking, whereas inactivation of the extinction ensembles increased food seeking. Our results indicate that distinct neuronal ensembles encoding operant reward and extinction memories intermingle within the same cortical area. SIGNIFICANCE STATEMENT A current popular hypothesis is that neuronal ensembles in different prefrontal cortex areas control reward-associated versus extinction-associated memories: the dorsal medial prefrontal cortex (mPFC) promotes reward seeking, whereas the ventral mPFC inhibits reward seeking. In this paper, we use the Daun02 chemogenetic inactivation procedure to demonstrate that Fos-expressing neuronal ensembles mediating both food reward and extinction memories intermingle within the same ventral mPFC area. PMID:27335401
Distinct Fos-Expressing Neuronal Ensembles in the Ventromedial Prefrontal Cortex Mediate Food Reward and Extinction Memories.

PubMed

Warren, Brandon L; Mendoza, Michael P; Cruz, Fabio C; Leao, Rodrigo M; Caprioli, Daniele; Rubio, F Javier; Whitaker, Leslie R; McPherson, Kylie B; Bossert, Jennifer M; Shaham, Yavin; Hope, Bruce T

2016-06-22

In operant learning, initial reward-associated memories are thought to be distinct from subsequent extinction-associated memories. Memories formed during operant learning are thought to be stored in "neuronal ensembles." Thus, we hypothesize that different neuronal ensembles encode reward- and extinction-associated memories. Here, we examined prefrontal cortex neuronal ensembles involved in the recall of reward and extinction memories of food self-administration. We first trained rats to lever press for palatable food pellets for 7 d (1 h/d) and then exposed them to 0, 2, or 7 daily extinction sessions in which lever presses were not reinforced. Twenty-four hours after the last training or extinction session, we exposed the rats to either a short 15 min extinction test session or left them in their homecage (a control condition). We found maximal Fos (a neuronal activity marker) immunoreactivity in the ventral medial prefrontal cortex of rats that previously received 2 extinction sessions, suggesting that neuronal ensembles in this area encode extinction memories. We then used the Daun02 inactivation procedure to selectively disrupt ventral medial prefrontal cortex neuronal ensembles that were activated during the 15 min extinction session following 0 (no extinction) or 2 prior extinction sessions to determine the effects of inactivating the putative food reward and extinction ensembles, respectively, on subsequent nonreinforced food seeking 2 d later. Inactivation of the food reward ensembles decreased food seeking, whereas inactivation of the extinction ensembles increased food seeking. Our results indicate that distinct neuronal ensembles encoding operant reward and extinction memories intermingle within the same cortical area. A current popular hypothesis is that neuronal ensembles in different prefrontal cortex areas control reward-associated versus extinction-associated memories: the dorsal medial prefrontal cortex (mPFC) promotes reward seeking, whereas the ventral mPFC inhibits reward seeking. In this paper, we use the Daun02 chemogenetic inactivation procedure to demonstrate that Fos-expressing neuronal ensembles mediating both food reward and extinction memories intermingle within the same ventral mPFC area. Copyright © 2016 the authors 0270-6474/16/366691-13$15.00/0.
New Software for Ensemble Creation in the Spitzer-Space-Telescope Operations Database

NASA Technical Reports Server (NTRS)

Laher, Russ; Rector, John

2004-01-01

Some of the computer pipelines used to process digital astronomical images from NASA's Spitzer Space Telescope require multiple input images, in order to generate high-level science and calibration products. The images are grouped into ensembles according to well documented ensemble-creation rules by making explicit associations in the operations Informix database at the Spitzer Science Center (SSC). The advantage of this approach is that a simple database query can retrieve the required ensemble of pipeline input images. New and improved software for ensemble creation has been developed. The new software is much faster than the existing software because it uses pre-compiled database stored-procedures written in Informix SPL (SQL programming language). The new software is also more flexible because the ensemble creation rules are now stored in and read from newly defined database tables. This table-driven approach was implemented so that ensemble rules can be inserted, updated, or deleted without modifying software.
Deep neural ensemble for retinal vessel segmentation in fundus images towards achieving label-free angiography.

PubMed

Lahiri, A; Roy, Abhijit Guha; Sheet, Debdoot; Biswas, Prabir Kumar

2016-08-01

Automated segmentation of retinal blood vessels in label-free fundus images entails a pivotal role in computed aided diagnosis of ophthalmic pathologies, viz., diabetic retinopathy, hypertensive disorders and cardiovascular diseases. The challenge remains active in medical image analysis research due to varied distribution of blood vessels, which manifest variations in their dimensions of physical appearance against a noisy background. In this paper we formulate the segmentation challenge as a classification task. Specifically, we employ unsupervised hierarchical feature learning using ensemble of two level of sparsely trained denoised stacked autoencoder. First level training with bootstrap samples ensures decoupling and second level ensemble formed by different network architectures ensures architectural revision. We show that ensemble training of auto-encoders fosters diversity in learning dictionary of visual kernels for vessel segmentation. SoftMax classifier is used for fine tuning each member autoencoder and multiple strategies are explored for 2-level fusion of ensemble members. On DRIVE dataset, we achieve maximum average accuracy of 95.33% with an impressively low standard deviation of 0.003 and Kappa agreement coefficient of 0.708. Comparison with other major algorithms substantiates the high efficacy of our model.
Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

PubMed

Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

2013-01-01

DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they merely focus on binary-class problem. In this paper, we dealt with multiclass imbalanced classification problem, as encountered in cancer DNA microarray, by using ensemble learning. We utilized one-against-all coding strategy to transform multiclass to multiple binary classes, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, support vector machine was used as base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.
An Ensemble Deep Convolutional Neural Network Model with Improved D-S Evidence Fusion for Bearing Fault Diagnosis.

PubMed

Li, Shaobo; Liu, Guokai; Tang, Xianghong; Lu, Jianguang; Hu, Jianjun

2017-07-28

Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster-Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions.
An Ensemble Deep Convolutional Neural Network Model with Improved D-S Evidence Fusion for Bearing Fault Diagnosis

PubMed Central

Li, Shaobo; Liu, Guokai; Tang, Xianghong; Lu, Jianguang

2017-01-01

Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster–Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions. PMID:28788099
Managing uncertainty in metabolic network structure and improving predictions using EnsembleFBA

PubMed Central

2017-01-01

Genome-scale metabolic network reconstructions (GENREs) are repositories of knowledge about the metabolic processes that occur in an organism. GENREs have been used to discover and interpret metabolic functions, and to engineer novel network structures. A major barrier preventing more widespread use of GENREs, particularly to study non-model organisms, is the extensive time required to produce a high-quality GENRE. Many automated approaches have been developed which reduce this time requirement, but automatically-reconstructed draft GENREs still require curation before useful predictions can be made. We present a novel approach to the analysis of GENREs which improves the predictive capabilities of draft GENREs by representing many alternative network structures, all equally consistent with available data, and generating predictions from this ensemble. This ensemble approach is compatible with many reconstruction methods. We refer to this new approach as Ensemble Flux Balance Analysis (EnsembleFBA). We validate EnsembleFBA by predicting growth and gene essentiality in the model organism Pseudomonas aeruginosa UCBPP-PA14. We demonstrate how EnsembleFBA can be included in a systems biology workflow by predicting essential genes in six Streptococcus species and mapping the essential genes to small molecule ligands from DrugBank. We found that some metabolic subsystems contributed disproportionately to the set of predicted essential reactions in a way that was unique to each Streptococcus species, leading to species-specific outcomes from small molecule interactions. Through our analyses of P. aeruginosa and six Streptococci, we show that ensembles increase the quality of predictions without drastically increasing reconstruction time, thus making GENRE approaches more practical for applications which require predictions for many non-model organisms. All of our functions and accompanying example code are available in an open online repository. PMID:28263984
Managing uncertainty in metabolic network structure and improving predictions using EnsembleFBA.

PubMed

Biggs, Matthew B; Papin, Jason A

2017-03-01

Genome-scale metabolic network reconstructions (GENREs) are repositories of knowledge about the metabolic processes that occur in an organism. GENREs have been used to discover and interpret metabolic functions, and to engineer novel network structures. A major barrier preventing more widespread use of GENREs, particularly to study non-model organisms, is the extensive time required to produce a high-quality GENRE. Many automated approaches have been developed which reduce this time requirement, but automatically-reconstructed draft GENREs still require curation before useful predictions can be made. We present a novel approach to the analysis of GENREs which improves the predictive capabilities of draft GENREs by representing many alternative network structures, all equally consistent with available data, and generating predictions from this ensemble. This ensemble approach is compatible with many reconstruction methods. We refer to this new approach as Ensemble Flux Balance Analysis (EnsembleFBA). We validate EnsembleFBA by predicting growth and gene essentiality in the model organism Pseudomonas aeruginosa UCBPP-PA14. We demonstrate how EnsembleFBA can be included in a systems biology workflow by predicting essential genes in six Streptococcus species and mapping the essential genes to small molecule ligands from DrugBank. We found that some metabolic subsystems contributed disproportionately to the set of predicted essential reactions in a way that was unique to each Streptococcus species, leading to species-specific outcomes from small molecule interactions. Through our analyses of P. aeruginosa and six Streptococci, we show that ensembles increase the quality of predictions without drastically increasing reconstruction time, thus making GENRE approaches more practical for applications which require predictions for many non-model organisms. All of our functions and accompanying example code are available in an open online repository.
FLUXCOM - Overview and First Synthesis

NASA Astrophysics Data System (ADS)

Jung, M.; Ichii, K.; Tramontana, G.; Camps-Valls, G.; Schwalm, C. R.; Papale, D.; Reichstein, M.; Gans, F.; Weber, U.

2015-12-01

We present a community effort aiming at generating an ensemble of global gridded flux products by upscaling FLUXNET data using an array of different machine learning methods including regression/model tree ensembles, neural networks, and kernel machines. We produced products for gross primary production, terrestrial ecosystem respiration, net ecosystem exchange, latent heat, sensible heat, and net radiation for two experimental protocols: 1) at a high spatial and 8-daily temporal resolution (5 arc-minute) using only remote sensing based inputs for the MODIS era; 2) 30 year records of daily, 0.5 degree spatial resolution by incorporating meteorological driver data. Within each set-up, all machine learning methods were trained with the same input data for carbon and energy fluxes respectively. Sets of input driver variables were derived using an extensive formal variable selection exercise. The performance of the extrapolation capacities of the approaches is assessed with a fully internally consistent cross-validation. We perform cross-consistency checks of the gridded flux products with independent data streams from atmospheric inversions (NEE), sun-induced fluorescence (GPP), catchment water balances (LE, H), satellite products (Rn), and process-models. We analyze the uncertainties of the gridded flux products and for example provide a breakdown of the uncertainty of mean annual GPP originating from different machine learning methods, different climate input data sets, and different flux partitioning methods. The FLUXCOM archive will provide an unprecedented source of information for water, energy, and carbon cycle studies.
HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy.

PubMed

Hu, Huan; Zhang, Li; Ai, Haixin; Zhang, Hui; Fan, Yetian; Zhao, Qi; Liu, Hongsheng

2018-03-27

LncRNA plays an important role in many biological and disease progression by binding to related proteins. However, the experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although there are a few models designed to predict the interactions of ncRNA-protein, they all have some common drawbacks that limit their predictive performance. In this study, we present a model called HLPI-Ensemble designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts the ensemble strategy based on three mainstream machine learning algorithms of Support Vector Machines (SVM), Random Forests (RF) and Extreme Gradient Boosting (XGB) to generate HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble, respectively. The results of 10-fold cross-validation show that HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96 and 0.96, respectively, in the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with the previous models through external validation dataset. The results show that the false positives (FPs) of HLPI-Ensemble models are much lower than that of the previous models, and other evaluation indicators of HLPI-Ensemble models are also higher than those of the previous models. It is further showed that HLPI-Ensemble models are superior in predicting human lncRNA-protein interaction compared with previous models. The HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/ .
Estimating predictive hydrological uncertainty by dressing deterministic and ensemble forecasts; a comparison, with application to Meuse and Rhine

NASA Astrophysics Data System (ADS)

Verkade, J. S.; Brown, J. D.; Davids, F.; Reggiani, P.; Weerts, A. H.

2017-12-01

Two statistical post-processing approaches for estimation of predictive hydrological uncertainty are compared: (i) 'dressing' of a deterministic forecast by adding a single, combined estimate of both hydrological and meteorological uncertainty and (ii) 'dressing' of an ensemble streamflow forecast by adding an estimate of hydrological uncertainty to each individual streamflow ensemble member. Both approaches aim to produce an estimate of the 'total uncertainty' that captures both the meteorological and hydrological uncertainties. They differ in the degree to which they make use of statistical post-processing techniques. In the 'lumped' approach, both sources of uncertainty are lumped by post-processing deterministic forecasts using their verifying observations. In the 'source-specific' approach, the meteorological uncertainties are estimated by an ensemble of weather forecasts. These ensemble members are routed through a hydrological model and a realization of the probability distribution of hydrological uncertainties (only) is then added to each ensemble member to arrive at an estimate of the total uncertainty. The techniques are applied to one location in the Meuse basin and three locations in the Rhine basin. Resulting forecasts are assessed for their reliability and sharpness, as well as compared in terms of multiple verification scores including the relative mean error, Brier Skill Score, Mean Continuous Ranked Probability Skill Score, Relative Operating Characteristic Score and Relative Economic Value. The dressed deterministic forecasts are generally more reliable than the dressed ensemble forecasts, but the latter are sharper. On balance, however, they show similar quality across a range of verification metrics, with the dressed ensembles coming out slightly better. Some additional analyses are suggested. Notably, these include statistical post-processing of the meteorological forecasts in order to increase their reliability, thus increasing the reliability of the streamflow forecasts produced with ensemble meteorological forcings.
Ensemble Classifiers for Predicting HIV-1 Resistance from Three Rule-Based Genotypic Resistance Interpretation Systems.

PubMed

Raposo, Letícia M; Nobre, Flavio F

2017-08-30

Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms were developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, resulting in difficulties for clinical decisions about which treatment to use. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to develop a classifier with a single resistance profile: stacked generalization, a simple plurality vote scheme and the selection of the interpretation system with the best performance. The strategies were compared with the Friedman's test and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performances for the selected antiretrovirals. For some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making since they provide a single resistance profile from the most commonly used resistance interpretation systems.
Online cross-validation-based ensemble learning.

PubMed

Benkeser, David; Ju, Cheng; Lendle, Sam; van der Laan, Mark

2018-01-30

Online estimators update a current estimate with a new incoming batch of data without having to revisit past data thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example where we make streaming predictions of infectious disease incidence using data from a large database. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Generating highly accurate prediction hypotheses through collaborative ensemble learning

NASA Astrophysics Data System (ADS)

Arsov, Nino; Pavlovski, Martin; Basnarkov, Lasko; Kocarev, Ljupco

2017-03-01

Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance reduction, while the latter ameliorates overfitting, the outcome of a multi-model that combines both strives toward a comprehensive net-balancing of the bias-variance trade-off. To further improve this, we alter the bagged-boosting scheme by introducing collaboration between the multi-model’s constituent learners at various levels. This novel stability-guided classification scheme is delivered in two flavours: during or after the boosting process. Applied among a crowd of Gentle Boost ensembles, the ability of the two suggested algorithms to generalize is inspected by comparing them against Subbagging and Gentle Boost on various real-world datasets. In both cases, our models obtained a 40% generalization error decrease. But their true ability to capture details in data was revealed through their application for protein detection in texture analysis of gel electrophoresis images. They achieve improved performance of approximately 0.9773 AUROC when compared to the AUROC of 0.9574 obtained by an SVM based on recursive feature elimination.
Intentions and Perceptions: In Search of Alignment

ERIC Educational Resources Information Center

Sindberg, Laura K.

2009-01-01

Teachers plan for instruction in band, choir, and orchestra; this typically includes selecting repertoire and planning outcomes and strategies for achieving those goals with a vision toward excellent musical performance. Teachers in school music ensembles plan instruction that will lead to student learning. In the ensemble setting, this learning…
Preparing Your Students for Festivals: Reflections of a Solo-Ensemble Judge

ERIC Educational Resources Information Center

Paulk, Jason

2007-01-01

Helping students prepare for solo and ensemble festivals can be an arduous task for music educators. From choosing appropriate repertoire, to learning pitches, to securing accompanists, to ensuring that students understand appropriate dress and conduct, myriad details are involved in successful preparation for educational and enriching…
Discrete post-processing of total cloud cover ensemble forecasts

NASA Astrophysics Data System (ADS)

Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian

2017-04-01

This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
Rapid learning of visual ensembles.

PubMed

Chetverikov, Andrey; Campana, Gianluca; Kristjánsson, Árni

2017-02-01

We recently demonstrated that observers are capable of encoding not only summary statistics, such as mean and variance of stimulus ensembles, but also the shape of the ensembles. Here, for the first time, we show the learning dynamics of this process, investigate the possible priors for the distribution shape, and demonstrate that observers are able to learn more complex distributions, such as bimodal ones. We used speeding and slowing of response times between trials (intertrial priming) in visual search for an oddly oriented line to assess internal models of distractor distributions. Experiment 1 demonstrates that two repetitions are sufficient for enabling learning of the shape of uniform distractor distributions. In Experiment 2, we compared Gaussian and uniform distractor distributions, finding that following only two repetitions Gaussian distributions are represented differently than uniform ones. Experiment 3 further showed that when distractor distributions are bimodal (with a 30° distance between two uniform intervals), observers initially treat them as uniform, and only with further repetitions do they begin to treat the distributions as bimodal. In sum, observers do not have strong initial priors for distribution shapes and quickly learn simple ones but have the ability to adjust their representations to more complex feature distributions as information accumulates with further repetitions of the same distractor distribution.

Simulating adsorptive expansion of zeolites: application to biomass-derived solutions in contact with silicalite.

PubMed

Santander, Julian E; Tsapatsis, Michael; Auerbach, Scott M

2013-04-16

We have constructed and applied an algorithm to simulate the behavior of zeolite frameworks during liquid adsorption. We applied this approach to compute the adsorption isotherms of furfural-water and hydroxymethyl furfural (HMF)-water mixtures adsorbing in silicalite zeolite at 300 K for comparison with experimental data. We modeled these adsorption processes under two different statistical mechanical ensembles: the grand canonical (V-Nz-μg-T or GC) ensemble keeping volume fixed, and the P-Nz-μg-T (osmotic) ensemble allowing volume to fluctuate. To optimize accuracy and efficiency, we compared pure Monte Carlo (MC) sampling to hybrid MC-molecular dynamics (MD) simulations. For the external furfural-water and HMF-water phases, we assumed the ideal solution approximation and employed a combination of tabulated data and extended ensemble simulations for computing solvation free energies. We found that MC sampling in the V-Nz-μg-T ensemble (i.e., standard GCMC) does a poor job of reproducing both the Henry's law regime and the saturation loadings of these systems. Hybrid MC-MD sampling of the V-Nz-μg-T ensemble, which includes framework vibrations at fixed total volume, provides better results in the Henry's law region, but this approach still does not reproduce experimental saturation loadings. Pure MC sampling of the osmotic ensemble was found to approach experimental saturation loadings more closely, whereas hybrid MC-MD sampling of the osmotic ensemble quantitatively reproduces such loadings because the MC-MD approach naturally allows for locally anisotropic volume changes wherein some pores expand whereas others contract.
Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation

PubMed Central

Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira

2013-01-01

Brain magnetic resonance images (MRIs) tissue segmentation is one of the most important parts of the clinical diagnostic tools. Pixel classification methods have been frequently used in the image segmentation with two supervised and unsupervised approaches up to now. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to low level of performance. However, semi-supervised learning which uses a few labeled data together with a large amount of unlabeled data causes higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting of brain magnetic resonance imaging (MRI) tissues that it has been used results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms expectation filtering maximization and MCo_Training that are improved versions of semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers. PMID:24098863
JuPOETs: a constrained multiobjective optimization approach to estimate biochemical model ensembles in the Julia programming language.

PubMed

Bassen, David M; Vilkhovoy, Michael; Minot, Mason; Butcher, Jonathan T; Varner, Jeffrey D

2017-01-25

Ensemble modeling is a promising approach for obtaining robust predictions and coarse grained population behavior in deterministic mathematical models. Ensemble approaches address model uncertainty by using parameter or model families instead of single best-fit parameters or fixed model structures. Parameter ensembles can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables, and robustly constrain model predictions, despite having many poorly constrained parameters. In this software note, we present a multiobjective based technique to estimate parameter or models ensembles, the Pareto Optimal Ensemble Technique in the Julia programming language (JuPOETs). JuPOETs integrates simulated annealing with Pareto optimality to estimate ensembles on or near the optimal tradeoff surface between competing training objectives. We demonstrate JuPOETs on a suite of multiobjective problems, including test functions with parameter bounds and system constraints as well as for the identification of a proof-of-concept biochemical model with four conflicting training objectives. JuPOETs identified optimal or near optimal solutions approximately six-fold faster than a corresponding implementation in Octave for the suite of test functions. For the proof-of-concept biochemical model, JuPOETs produced an ensemble of parameters that gave both the mean of the training data for conflicting data sets, while simultaneously estimating parameter sets that performed well on each of the individual objective functions. JuPOETs is a promising approach for the estimation of parameter and model ensembles using multiobjective optimization. JuPOETs can be adapted to solve many problem types, including mixed binary and continuous variable types, bilevel optimization problems and constrained problems without altering the base algorithm. JuPOETs is open source, available under an MIT license, and can be installed using the Julia package manager from the JuPOETs GitHub repository.
A comparison of ensemble post-processing approaches that preserve correlation structures

NASA Astrophysics Data System (ADS)

Schefzik, Roman; Van Schaeybroeck, Bert; Vannitsem, Stéphane

2016-04-01

Despite the fact that ensemble forecasts address the major sources of uncertainty, they exhibit biases and dispersion errors and therefore are known to improve by calibration or statistical post-processing. For instance the ensemble model output statistics (EMOS) method, also known as non-homogeneous regression approach (Gneiting et al., 2005) is known to strongly improve forecast skill. EMOS is based on fitting and adjusting a parametric probability density function (PDF). However, EMOS and other common post-processing approaches apply to a single weather quantity at a single location for a single look-ahead time. They are therefore unable of taking into account spatial, inter-variable and temporal dependence structures. Recently many research efforts have been invested in designing post-processing methods that resolve this drawback but also in verification methods that enable the detection of dependence structures. New verification methods are applied on two classes of post-processing methods, both generating physically coherent ensembles. A first class uses the ensemble copula coupling (ECC) that starts from EMOS but adjusts the rank structure (Schefzik et al., 2013). The second class is a member-by-member post-processing (MBM) approach that maps each raw ensemble member to a corrected one (Van Schaeybroeck and Vannitsem, 2015). We compare variants of the EMOS-ECC and MBM classes and highlight a specific theoretical connection between them. All post-processing variants are applied in the context of the ensemble system of the European Centre of Weather Forecasts (ECMWF) and compared using multivariate verification tools including the energy score, the variogram score (Scheuerer and Hamill, 2015) and the band depth rank histogram (Thorarinsdottir et al., 2015). Gneiting, Raftery, Westveld, and Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., {133}, 1098-1118. Scheuerer and Hamill, 2015. Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev. {143},1321-1334. Schefzik, Thorarinsdottir, Gneiting. Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science {28},616-640, 2013. Thorarinsdottir, M. Scheuerer, and C. Heinz, 2015. Assessing the calibration of high-dimensional ensemble forecasts using rank histograms, arXiv:1310.0236. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q.J.R. Meteorol. Soc., 141: 807-818.
Generalized multiple kernel learning with data-dependent priors.

PubMed

Mao, Qi; Tsang, Ivor W; Gao, Shenghua; Wang, Li

2015-06-01

Multiple kernel learning (MKL) and classifier ensemble are two mainstream methods for solving learning problems in which some sets of features/views are more informative than others, or the features/views within a given set are inconsistent. In this paper, we first present a novel probabilistic interpretation of MKL such that maximum entropy discrimination with a noninformative prior over multiple views is equivalent to the formulation of MKL. Instead of using the noninformative prior, we introduce a novel data-dependent prior based on an ensemble of kernel predictors, which enhances the prediction performance of MKL by leveraging the merits of the classifier ensemble. With the proposed probabilistic framework of MKL, we propose a hierarchical Bayesian model to learn the proposed data-dependent prior and classification model simultaneously. The resultant problem is convex and other information (e.g., instances with either missing views or missing labels) can be seamlessly incorporated into the data-dependent priors. Furthermore, a variety of existing MKL models can be recovered under the proposed MKL framework and can be readily extended to incorporate these priors. Extensive experiments demonstrate the benefits of our proposed framework in supervised and semisupervised settings, as well as in tasks with partial correspondence among multiple views.
Stress affects the neural ensemble for integrating new information and prior knowledge.

PubMed

Vogel, Susanne; Kluen, Lisa Marieke; Fernández, Guillén; Schwabe, Lars

2018-06-01

Prior knowledge, represented as a schema, facilitates memory encoding. This schema-related learning is assumed to rely on the medial prefrontal cortex (mPFC) that rapidly integrates new information into the schema, whereas schema-incongruent or novel information is encoded by the hippocampus. Stress is a powerful modulator of prefrontal and hippocampal functioning and first studies suggest a stress-induced deficit of schema-related learning. However, the underlying neural mechanism is currently unknown. To investigate the neural basis of a stress-induced schema-related learning impairment, participants first acquired a schema. One day later, they underwent a stress induction or a control procedure before learning schema-related and novel information in the MRI scanner. In line with previous studies, learning schema-related compared to novel information activated the mPFC, angular gyrus, and precuneus. Stress, however, affected the neural ensemble activated during learning. Whereas the control group distinguished between sets of brain regions for related and novel information, stressed individuals engaged the hippocampus even when a relevant schema was present. Additionally, stressed participants displayed aberrant functional connectivity between brain regions involved in schema processing when encoding novel information. The failure to segregate functional connectivity patterns depending on the presence of prior knowledge was linked to impaired performance after stress. Our results show that stress affects the neural ensemble underlying the efficient use of schemas during learning. These findings may have relevant implications for clinical and educational settings. Copyright © 2018 Elsevier Inc. All rights reserved.
Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification.

PubMed

Jowkar, Gholam-Hossein; Mansoori, Eghbal G

2016-10-01

Identification of disease genes, using computational methods, is an important issue in biomedical and bioinformatics research. According to observations that diseases with the same or similar phenotype have the same biological characteristics, researchers have tried to identify genes by using machine learning tools. In recent attempts, some semi-supervised learning methods, called positive-unlabeled learning, is used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes are extracted using co-training schema. Then, the similarity graph of genes is built using metric learning by concentrating on multi-rank-walk method to perform inference from labeled genes. At last, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data through choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness characteristic of PEGPUL via using multilevel schema. In order to assess PEGPUL, we have applied it on 12950 disease genes with 949 positive genes from six class of diseases and 12001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bayesian ensemble refinement by replica simulations and reweighting.

PubMed

Hummer, Gerhard; Köfinger, Jürgen

2015-12-28

We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
Bayesian ensemble refinement by replica simulations and reweighting

NASA Astrophysics Data System (ADS)

Hummer, Gerhard; Köfinger, Jürgen

2015-12-01

We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
Genetic programming based ensemble system for microarray data classification.

PubMed

Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

2015-01-01

Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
Genetic Programming Based Ensemble System for Microarray Data Classification

PubMed Central

Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

2015-01-01

Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

ERIC Educational Resources Information Center

Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

2009-01-01

In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…
Impacts of calibration strategies and ensemble methods on ensemble flood forecasting over Lanjiang basin, Southeast China

NASA Astrophysics Data System (ADS)

Liu, Li; Xu, Yue-Ping

2017-04-01

Ensemble flood forecasting driven by numerical weather prediction products is becoming more commonly used in operational flood forecasting applications.In this study, a hydrological ensemble flood forecasting system based on Variable Infiltration Capacity (VIC) model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated.The hydrological model is optimized by parallel programmed ɛ-NSGAII multi-objective algorithm and two respectively parameterized models are determined to simulate daily flows and peak flows coupled with a modular approach.The results indicatethat the ɛ-NSGAII algorithm permits more efficient optimization and rational determination on parameter setting.It is demonstrated that the multimodel ensemble streamflow mean have better skills than the best singlemodel ensemble mean (ECMWF) and the multimodel ensembles weighted on members and skill scores outperform other multimodel ensembles. For typical flood event, it is proved that the flood can be predicted 3-4 days in advance, but the flows in rising limb can be captured with only 1-2 days ahead due to the flash feature. With respect to peak flows selected by Peaks Over Threshold approach, the ensemble means from either singlemodel or multimodels are generally underestimated as the extreme values are smoothed out by ensemble process.
Benchmarking Deep Learning Models on Large Healthcare Datasets.

PubMed

Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

2018-06-04

Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and is being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
MULTI-SOURCE FEATURE LEARNING FOR JOINT ANALYSIS OF INCOMPLETE MULTIPLE HETEROGENEOUS NEUROIMAGING DATA

PubMed Central

Yuan, Lei; Wang, Yalin; Thompson, Paul M.; Narayan, Vaibhav A.; Ye, Jieping

2012-01-01

Analysis of incomplete data is a big challenge when integrating large-scale brain imaging datasets from different imaging modalities. In the Alzheimer’s Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. In this paper, we address this problem by proposing an incomplete Multi-Source Feature (iMSF) learning method where all the samples (with at least one available data source) can be used. To illustrate the proposed approach, we classify patients from the ADNI study into groups with Alzheimer’s disease (AD), mild cognitive impairment (MCI) and normal controls, based on the multi-modality data. At baseline, ADNI’s 780 participants (172 AD, 397 MCI, 211 NC), have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithm. Depending on the problem being solved, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. To build a practical and robust system, we construct a classifier ensemble by combining our method with four other methods for missing value estimation. Comprehensive experiments with various parameters show that our proposed iMSF method and the ensemble model yield stable and promising results. PMID:22498655
Ensemble Data Mining Methods

NASA Technical Reports Server (NTRS)

Oza, Nikunj C.

2004-01-01

Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, Le., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
Elucidating Ligand-Modulated Conformational Landscape of GPCRs Using Cloud-Computing Approaches.

PubMed

Shukla, Diwakar; Lawrenz, Morgan; Pande, Vijay S

2015-01-01

G-protein-coupled receptors (GPCRs) are a versatile family of membrane-bound signaling proteins. Despite the recent successes in obtaining crystal structures of GPCRs, much needs to be learned about the conformational changes associated with their activation. Furthermore, the mechanism by which ligands modulate the activation of GPCRs has remained elusive. Molecular simulations provide a way of obtaining detailed an atomistic description of GPCR activation dynamics. However, simulating GPCR activation is challenging due to the long timescales involved and the associated challenge of gaining insights from the "Big" simulation datasets. Here, we demonstrate how cloud-computing approaches have been used to tackle these challenges and obtain insights into the activation mechanism of GPCRs. In particular, we review the use of Markov state model (MSM)-based sampling algorithms for sampling milliseconds of dynamics of a major drug target, the G-protein-coupled receptor β2-AR. MSMs of agonist and inverse agonist-bound β2-AR reveal multiple activation pathways and how ligands function via modulation of the ensemble of activation pathways. We target this ensemble of conformations with computer-aided drug design approaches, with the goal of designing drugs that interact more closely with diverse receptor states, for overall increased efficacy and specificity. We conclude by discussing how cloud-based approaches present a powerful and broadly available tool for studying the complex biological systems routinely. © 2015 Elsevier Inc. All rights reserved.
A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles

NASA Astrophysics Data System (ADS)

Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.

2016-12-01

Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean as they subscribe to model democracy ("one model, one vote").Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus, wrong estimation of uncertainty and effective number of independent models.Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly.Using a model-as-truth experiment we demonstrate that those regional biases persist into the future and we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, where all the model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size.Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
Designing Ensemble Based Security Framework for M-Learning System

ERIC Educational Resources Information Center

Mahalingam, Sheila; Abdollah, Mohd Faizal; bin Sahibuddin, Shahrin

2014-01-01

Mobile Learning has a potential to improve efficiency in the education sector and expand educational opportunities to underserved remote area in higher learning institutions. However there are multi challenges in different altitude faced when introducing and implementing m-learning. Despite the evolution of technology changes in education,…
Peer-Teaching in the Secondary Music Ensemble

ERIC Educational Resources Information Center

Johnson, Erik

2015-01-01

Peer-teaching is an instructional technique that has been used by teachers world-wide to successfully engage, exercise and deepen student learning. Yet, in some instances, teachers find the application of peer-teaching in large music ensembles at the secondary level to be daunting. This article is meant to be a practical resource for secondary…

Dimensionality Reduction Through Classifier Ensembles

NASA Technical Reports Server (NTRS)

Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)

1999-01-01

In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints.

PubMed

Ai, Haixin; Chen, Wen; Zhang, Li; Huang, Liangchao; Yin, Zimo; Hu, Huan; Zhao, Qi; Zhao, Jian; Liu, Hongsheng

2018-05-21

Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using three machine learning algorithms and 12 molecular fingerprints from a dataset containing 1,241 diverse compounds. The ensemble model achieved an average accuracy of 71.1±2.6%, sensitivity of 79.9±3.6%, specificity of 60.3±4.8%, and area under the receiver operating characteristic curve (AUC) of 0.764±0.026 in five-fold cross-validation and an accuracy of 84.3%, sensitivity of 86.9%, specificity of 75.4%, and AUC of 0.904 in an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base (LTKB). Compared with previous methods, the ensemble model achieved relatively high accuracy and sensitivity. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
The Development of Target-Specific Pose Filter Ensembles To Boost Ligand Enrichment for Structure-Based Virtual Screening.

PubMed

Xia, Jie; Hsieh, Jui-Hua; Hu, Huabin; Wu, Song; Wang, Xiang Simon

2017-06-26

Structure-based virtual screening (SBVS) has become an indispensable technique for hit identification at the early stage of drug discovery. However, the accuracy of current scoring functions is not high enough to confer success to every target and thus remains to be improved. Previously, we had developed binary pose filters (PFs) using knowledge derived from the protein-ligand interface of a single X-ray structure of a specific target. This novel approach had been validated as an effective way to improve ligand enrichment. Continuing from it, in the present work we attempted to incorporate knowledge collected from diverse protein-ligand interfaces of multiple crystal structures of the same target to build PF ensembles (PFEs). Toward this end, we first constructed a comprehensive data set to meet the requirements of ensemble modeling and validation. This set contains 10 diverse targets, 118 well-prepared X-ray structures of protein-ligand complexes, and large benchmarking actives/decoys sets. Notably, we designed a unique workflow of two-layer classifiers based on the concept of ensemble learning and applied it to the construction of PFEs for all of the targets. Through extensive benchmarking studies, we demonstrated that (1) coupling PFE with Chemgauss4 significantly improves the early enrichment of Chemgauss4 itself and (2) PFEs show greater consistency in boosting early enrichment and larger overall enrichment than our prior PFs. In addition, we analyzed the pairwise topological similarities among cognate ligands used to construct PFEs and found that it is the higher chemical diversity of the cognate ligands that leads to the improved performance of PFEs. Taken together, the results so far prove that the incorporation of knowledge from diverse protein-ligand interfaces by ensemble modeling is able to enhance the screening competence of SBVS scoring functions.
Closed-loop and robust control of quantum systems.

PubMed

Chen, Chunlin; Wang, Lin-Cheng; Wang, Yuanlong

2013-01-01

For most practical quantum control systems, it is important and difficult to attain robustness and reliability due to unavoidable uncertainties in the system dynamics or models. Three kinds of typical approaches (e.g., closed-loop learning control, feedback control, and robust control) have been proved to be effective to solve these problems. This work presents a self-contained survey on the closed-loop and robust control of quantum systems, as well as a brief introduction to a selection of basic theories and methods in this research area, to provide interested readers with a general idea for further studies. In the area of closed-loop learning control of quantum systems, we survey and introduce such learning control methods as gradient-based methods, genetic algorithms (GA), and reinforcement learning (RL) methods from a unified point of view of exploring the quantum control landscapes. For the feedback control approach, the paper surveys three control strategies including Lyapunov control, measurement-based control, and coherent-feedback control. Then such topics in the field of quantum robust control as H(∞) control, sliding mode control, quantum risk-sensitive control, and quantum ensemble control are reviewed. The paper concludes with a perspective of future research directions that are likely to attract more attention.
EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation.

PubMed

Zhou, Jiyun; Lu, Qin; Xu, Ruifeng; He, Yulan; Wang, Hongpeng

2017-08-29

Prediction of DNA-binding residue is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for the prediction, but most of them do not consider the relationships of evolutionary information between residues. In this paper, we first propose a novel residue encoding method, referred to as the Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), to encode residues by utilizing the relationships of evolutionary information between residues. PDNA-62 and PDNA-224 are used to evaluate PSSM-RT and two existing PSSM encoding methods by five-fold cross-validation. Performance evaluations indicate that PSSM-RT is more effective than previous methods. This validates the point that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction. An ensemble learning classifier (EL_PSSM-RT) is also proposed by combining ensemble learning model and PSSM-RT to better handle the imbalance between binding and non-binding residues in datasets. EL_PSSM-RT is evaluated by five-fold cross-validation using PDNA-62 and PDNA-224 as well as two independent datasets TS-72 and TS-61. Performance comparisons with existing predictors on the four datasets demonstrate that EL_PSSM-RT is the best-performing method among all the predicting methods with improvement between 0.02-0.07 for MCC, 4.18-21.47% for ST and 0.013-0.131 for AUC. Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT and the results validates the usefulness of PSSM-RT for encoding DNA-binding residues. We propose a novel prediction method for the prediction of DNA-binding residue with the inclusion of relationship of evolutionary information and ensemble learning. Performance evaluation shows that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction and ensemble learning can be used to address the data imbalance issue between binding and non-binding residues. A web service of EL_PSSM-RT ( http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/ ) is provided for free access to the biological research community.
A New Multivariate Approach in Generating Ensemble Meteorological Forcings for Hydrological Forecasting

NASA Astrophysics Data System (ADS)

Khajehei, Sepideh; Moradkhani, Hamid

2015-04-01

Producing reliable and accurate hydrologic ensemble forecasts are subject to various sources of uncertainty, including meteorological forcing, initial conditions, model structure, and model parameters. Producing reliable and skillful precipitation ensemble forecasts is one approach to reduce the total uncertainty in hydrological applications. Currently, National Weather Prediction (NWP) models are developing ensemble forecasts for various temporal ranges. It is proven that raw products from NWP models are biased in mean and spread. Given the above state, there is a need for methods that are able to generate reliable ensemble forecasts for hydrological applications. One of the common techniques is to apply statistical procedures in order to generate ensemble forecast from NWP-generated single-value forecasts. The procedure is based on the bivariate probability distribution between the observation and single-value precipitation forecast. However, one of the assumptions of the current method is fitting Gaussian distribution to the marginal distributions of observed and modeled climate variable. Here, we have described and evaluated a Bayesian approach based on Copula functions to develop an ensemble precipitation forecast from the conditional distribution of single-value precipitation forecasts. Copula functions are known as the multivariate joint distribution of univariate marginal distributions, which are presented as an alternative procedure in capturing the uncertainties related to meteorological forcing. Copulas are capable of modeling the joint distribution of two variables with any level of correlation and dependency. This study is conducted over a sub-basin in the Columbia River Basin in USA using the monthly precipitation forecasts from Climate Forecast System (CFS) with 0.5x0.5 Deg. spatial resolution to reproduce the observations. The verification is conducted on a different period and the superiority of the procedure is compared with Ensemble Pre-Processor approach currently used by National Weather Service River Forecast Centers in USA.
SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media.

PubMed

Liu, Jing; Zhao, Songzheng; Wang, Gang

2018-01-01

With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which is a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction - distinguishing ADE relationship from other relation types - necessary. However, conducting ADE relation extraction in social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high-dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most of existing ADE relation extraction methods The SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems. Copyright © 2017 Elsevier B.V. All rights reserved.
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles

DTIC Science & Technology

2015-03-19

learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression , higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but
Learners' Ensemble Based Security Conceptual Model for M-Learning System in Malaysian Higher Learning Institution

ERIC Educational Resources Information Center

Mahalingam, Sheila; Abdollah, Faizal Mohd; Sahib, Shahrin

2014-01-01

M-Learning has a potential to improve efficiency in the education sector and has a tendency to grow advance and transform the learning environment in the future. Yet there are challenges in many areas faced when introducing and implementing m-learning. The learner centered attribute in mobile learning implies deployment in untrustworthy learning…
Can machine learning complement traditional medical device surveillance? A case study of dual-chamber implantable cardioverter-defibrillators.

PubMed

Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M

2017-01-01

Machine learning methods may complement traditional analytic methods for medical device surveillance. Using data from the National Cardiovascular Data Registry for implantable cardioverter-defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs that used two propensity score (PS) models: one specified by subject-matter experts (PS-SME), and the other one by machine learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%-20.9%; nonfatal ICD-related adverse events, 19.3%-26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%-37.6%. Agreement among safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, k =0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, k =-0.028), and between the DELTA and embedded feature selection approaches was 88.1% (349 of 396, k =-0.042). Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance.
Modality-Driven Classification and Visualization of Ensemble Variance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bensema, Kevin; Gosink, Luke; Obermaier, Harald

Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no informationmore » about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.« less
Lessons from Climate Modeling on the Design and Use of Ensembles for Crop Modeling

NASA Technical Reports Server (NTRS)

Wallach, Daniel; Mearns, Linda O.; Ruane, Alexander C.; Roetter, Reimund P.; Asseng, Senthold

2016-01-01

Working with ensembles of crop models is a recent but important development in crop modeling which promises to lead to better uncertainty estimates for model projections and predictions, better predictions using the ensemble mean or median, and closer collaboration within the modeling community. There are numerous open questions about the best way to create and analyze such ensembles. Much can be learned from the field of climate modeling, given its much longer experience with ensembles. We draw on that experience to identify questions and make propositions that should help make ensemble modeling with crop models more rigorous and informative. The propositions include defining criteria for acceptance of models in a crop MME, exploring criteria for evaluating the degree of relatedness of models in a MME, studying the effect of number of models in the ensemble, development of a statistical model of model sampling, creation of a repository for MME results, studies of possible differential weighting of models in an ensemble, creation of single model ensembles based on sampling from the uncertainty distribution of parameter values or inputs specifically oriented toward uncertainty estimation, the creation of super ensembles that sample more than one source of uncertainty, the analysis of super ensemble results to obtain information on total uncertainty and the separate contributions of different sources of uncertainty and finally further investigation of the use of the multi-model mean or median as a predictor.
A Comparison of Perturbed Initial Conditions and Multiphysics Ensembles in a Severe Weather Episode in Spain

NASA Technical Reports Server (NTRS)

Tapiador, Francisco; Tao, Wei-Kuo; Angelis, Carlos F.; Martinez, Miguel A.; Cecilia Marcos; Antonio Rodriguez; Hou, Arthur; Jong Shi, Jain

2012-01-01

Ensembles of numerical model forecasts are of interest to operational early warning forecasters as the spread of the ensemble provides an indication of the uncertainty of the alerts, and the mean value is deemed to outperform the forecasts of the individual models. This paper explores two ensembles on a severe weather episode in Spain, aiming to ascertain the relative usefulness of each one. One ensemble uses sensible choices of physical parameterizations (precipitation microphysics, land surface physics, and cumulus physics) while the other follows a perturbed initial conditions approach. The results show that, depending on the parameterizations, large differences can be expected in terms of storm location, spatial structure of the precipitation field, and rain intensity. It is also found that the spread of the perturbed initial conditions ensemble is smaller than the dispersion due to physical parameterizations. This confirms that in severe weather situations operational forecasts should address moist physics deficiencies to realize the full benefits of the ensemble approach, in addition to optimizing initial conditions. The results also provide insights into differences in simulations arising from ensembles of weather models using several combinations of different physical parameterizations.
Distinct learning-induced changes in stimulus selectivity and interactions of GABAergic interneuron classes in visual cortex.

PubMed

Khan, Adil G; Poort, Jasper; Chadwick, Angus; Blot, Antonin; Sahani, Maneesh; Mrsic-Flogel, Thomas D; Hofer, Sonja B

2018-06-01

How learning enhances neural representations for behaviorally relevant stimuli via activity changes of cortical cell types remains unclear. We simultaneously imaged responses of pyramidal cells (PYR) along with parvalbumin (PV), somatostatin (SOM), and vasoactive intestinal peptide (VIP) inhibitory interneurons in primary visual cortex while mice learned to discriminate visual patterns. Learning increased selectivity for task-relevant stimuli of PYR, PV and SOM subsets but not VIP cells. Strikingly, PV neurons became as selective as PYR cells, and their functional interactions reorganized, leading to the emergence of stimulus-selective PYR-PV ensembles. Conversely, SOM activity became strongly decorrelated from the network, and PYR-SOM coupling before learning predicted selectivity increases in individual PYR cells. Thus, learning differentially shapes the activity and interactions of multiple cell classes: while SOM inhibition may gate selectivity changes, PV interneurons become recruited into stimulus-specific ensembles and provide more selective inhibition as the network becomes better at discriminating behaviorally relevant stimuli.
A comparison of breeding and ensemble transform vectors for global ensemble generation

NASA Astrophysics Data System (ADS)

Deng, Guo; Tian, Hua; Li, Xiaoli; Chen, Jing; Gong, Jiandong; Jiao, Meiyan

2012-02-01

To compare the initial perturbation techniques using breeding vectors and ensemble transform vectors, three ensemble prediction systems using both initial perturbation methods but with different ensemble member sizes based on the spectral model T213/L31 are constructed at the National Meteorological Center, China Meteorological Administration (NMC/CMA). A series of ensemble verification scores such as forecast skill of the ensemble mean, ensemble resolution, and ensemble reliability are introduced to identify the most important attributes of ensemble forecast systems. The results indicate that the ensemble transform technique is superior to the breeding vector method in light of the evaluation of anomaly correlation coefficient (ACC), which is a deterministic character of the ensemble mean, the root-mean-square error (RMSE) and spread, which are of probabilistic attributes, and the continuous ranked probability score (CRPS) and its decomposition. The advantage of the ensemble transform approach is attributed to its orthogonality among ensemble perturbations as well as its consistence with the data assimilation system. Therefore, this study may serve as a reference for configuration of the best ensemble prediction system to be used in operation.
Adaptive Fault Detection on Liquid Propulsion Systems with Virtual Sensors: Algorithms and Architectures

NASA Technical Reports Server (NTRS)

Matthews, Bryan L.; Srivastava, Ashok N.

2010-01-01

Prior to the launch of STS-119 NASA had completed a study of an issue in the flow control valve (FCV) in the Main Propulsion System of the Space Shuttle using an adaptive learning method known as Virtual Sensors. Virtual Sensors are a class of algorithms that estimate the value of a time series given other potentially nonlinearly correlated sensor readings. In the case presented here, the Virtual Sensors algorithm is based on an ensemble learning approach and takes sensor readings and control signals as input to estimate the pressure in a subsystem of the Main Propulsion System. Our results indicate that this method can detect faults in the FCV at the time when they occur. We use the standard deviation of the predictions of the ensemble as a measure of uncertainty in the estimate. This uncertainty estimate was crucial to understanding the nature and magnitude of transient characteristics during startup of the engine. This paper overviews the Virtual Sensors algorithm and discusses results on a comprehensive set of Shuttle missions and also discusses the architecture necessary for deploying such algorithms in a real-time, closed-loop system or a human-in-the-loop monitoring system. These results were presented at a Flight Readiness Review of the Space Shuttle in early 2009.
Graph Representations of Flow and Transport in Fracture Networks using Machine Learning

NASA Astrophysics Data System (ADS)

Srinivasan, G.; Viswanathan, H. S.; Karra, S.; O'Malley, D.; Godinez, H. C.; Hagberg, A.; Osthus, D.; Mohd-Yusof, J.

2017-12-01

Flow and transport of fluids through fractured systems is governed by the properties and interactions at the micro-scale. Retaining information about the micro-structure such as fracture length, orientation, aperture and connectivity in mesh-based computational models results in solving for millions to billions of degrees of freedom and quickly renders the problem computationally intractable. Our approach depicts fracture networks graphically, by mapping fractures to nodes and intersections to edges, thereby greatly reducing computational burden. Additionally, we use machine learning techniques to build simulators on the graph representation, trained on data from the mesh-based high fidelity simulations to speed up computation by orders of magnitude. We demonstrate our methodology on ensembles of discrete fracture networks, dividing up the data into training and validation sets. Our machine learned graph-based solvers result in over 3 orders of magnitude speedup without any significant sacrifice in accuracy.
Conducting Expressively: Navigating Seven Misconceptions That Inhibit Meaningful Connection to Ensemble and Sound

ERIC Educational Resources Information Center

Snyder, Courtney

2016-01-01

When expressivity (ignited by imagination) is incorporated into the learning process for both the conductor (teacher) and player (student), the qualities of movement, communication, instruction, and ensemble sound all change for the better, often with less work. Expressive conducting allows the conductor to feel more connected to the music and the…
Selected Influences on Solo and Small-Ensemble Festival Ratings: Replication and Extension

ERIC Educational Resources Information Center

Bergee, Martin J.; McWhirter, Jamila L.

2005-01-01

Festival performance is no trivial endeavor. At one midwestern state festival alone, 10,938 events received a rating over a 3-year period (2001-2003). Such an extensive level of participation justifies sustained study. To learn more about variables that may underlie success at solo and small ensemble evaluative festivals, Bergee and Platt (2003)…
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

PubMed

Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

2014-01-01

One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.

Conformational B-cell epitopes prediction from sequences using cost-sensitive ensemble classifiers and spatial clustering.

PubMed

Zhang, Jian; Zhao, Xiaowei; Sun, Pingping; Gao, Bo; Ma, Zhiqiang

2014-01-01

B-cell epitopes are regions of the antigen surface which can be recognized by certain antibodies and elicit the immune response. Identification of epitopes for a given antigen chain finds vital applications in vaccine and drug research. Experimental prediction of B-cell epitopes is time-consuming and resource intensive, which may benefit from the computational approaches to identify B-cell epitopes. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting the antigenic determinant residues and then a spatial clustering algorithm is adopted to identify the potential epitopes. Firstly, we explore various discriminative features from primary sequences. Secondly, cost-sensitive ensemble scheme is introduced to deal with imbalanced learning problem. Thirdly, we adopt spatial algorithm to tell which residues may potentially form the epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitopes prediction), is proposed in this study. CBEP achieves good prediction performance with the mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using the leave-one-out cross-validation (LOOCV). When compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP which implements the proposed method is available for academic use.
A framework of multitemplate ensemble for fingerprint verification

NASA Astrophysics Data System (ADS)

Yin, Yilong; Ning, Yanbin; Ren, Chunxiao; Liu, Li

2012-12-01

How to improve performance of an automatic fingerprint verification system (AFVS) is always a big challenge in biometric verification field. Recently, it becomes popular to improve the performance of AFVS using ensemble learning approach to fuse related information of fingerprints. In this article, we propose a novel framework of fingerprint verification which is based on the multitemplate ensemble method. This framework is consisted of three stages. In the first stage, enrollment stage, we adopt an effective template selection method to select those fingerprints which best represent a finger, and then, a polyhedron is created by the matching results of multiple template fingerprints and a virtual centroid of the polyhedron is given. In the second stage, verification stage, we measure the distance between the centroid of the polyhedron and a query image. In the final stage, a fusion rule is used to choose a proper distance from a distance set. The experimental results on the FVC2004 database prove the improvement on the effectiveness of the new framework in fingerprint verification. With a minutiae-based matching method, the average EER of four databases in FVC2004 drops from 10.85 to 0.88, and with a ridge-based matching method, the average EER of these four databases also decreases from 14.58 to 2.51.
Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.

PubMed

Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan

2015-03-01

A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an Outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.
Compression of deep convolutional neural network for computer-aided diagnosis of masses in digital breast tomosynthesis

NASA Astrophysics Data System (ADS)

Samala, Ravi K.; Chan, Heang-Ping; Hadjiiski, Lubomir; Helvie, Mark A.; Richter, Caleb; Cha, Kenny

2018-02-01

Deep-learning models are highly parameterized, causing difficulty in inference and transfer learning. We propose a layered pathway evolution method to compress a deep convolutional neural network (DCNN) for classification of masses in DBT while maintaining the classification accuracy. Two-stage transfer learning was used to adapt the ImageNet-trained DCNN to mammography and then to DBT. In the first-stage transfer learning, transfer learning from ImageNet trained DCNN was performed using mammography data. In the second-stage transfer learning, the mammography-trained DCNN was trained on the DBT data using feature extraction from fully connected layer, recursive feature elimination and random forest classification. The layered pathway evolution encapsulates the feature extraction to the classification stages to compress the DCNN. Genetic algorithm was used in an iterative approach with tournament selection driven by count-preserving crossover and mutation to identify the necessary nodes in each convolution layer while eliminating the redundant nodes. The DCNN was reduced by 99% in the number of parameters and 95% in mathematical operations in the convolutional layers. The lesion-based area under the receiver operating characteristic curve on an independent DBT test set from the original and the compressed network resulted in 0.88+/-0.05 and 0.90+/-0.04, respectively. The difference did not reach statistical significance. We demonstrated a DCNN compression approach without additional fine-tuning or loss of performance for classification of masses in DBT. The approach can be extended to other DCNNs and transfer learning tasks. An ensemble of these smaller and focused DCNNs has the potential to be used in multi-target transfer learning.
Avoiding the ensemble decorrelation problem using member-by-member post-processing

NASA Astrophysics Data System (ADS)

Van Schaeybroeck, Bert; Vannitsem, Stéphane

2014-05-01

Forecast calibration or post-processing has become a standard tool in atmospheric and climatological science due to the presence of systematic initial condition and model errors. For ensemble forecasts the most competitive methods derive from the assumption of a fixed ensemble distribution. However, when independently applying such 'statistical' methods at different locations, lead times or for multiple variables the correlation structure for individual ensemble members is destroyed. Instead of reastablishing the correlation structure as in Schefzik et al. (2013) we instead propose a calibration method that avoids such problem by correcting each ensemble member individually. Moreover, we analyse the fundamental mechanisms by which the probabilistic ensemble skill can be enhanced. In terms of continuous ranked probability score, our member-by-member approach amounts to skill gain that extends for lead times far beyond the error doubling time and which is as good as the one of the most competitive statistical approach, non-homogeneous Gaussian regression (Gneiting et al. 2005). Besides the conservation of correlation structure, additional benefits arise including the fact that higher-order ensemble moments like kurtosis and skewness are inherited from the uncorrected forecasts. Our detailed analysis is performed in the context of the Kuramoto-Sivashinsky equation and different simple models but the results extent succesfully to the ensemble forecast of the European Centre for Medium-Range Weather Forecasts (Van Schaeybroeck and Vannitsem, 2013, 2014) . References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Schefzik, R., T.L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. To appear in Statistical Science 28. [3] Van Schaeybroeck, B., and S. Vannitsem, 2013: Reliable probabilities through statistical post-processing of ensemble forecasts. Proceedings of the European Conference on Complex Systems 2012, Springer proceedings on complexity, XVI, p. 347-352. [4] Van Schaeybroeck, B., and S. Vannitsem, 2014: Ensemble post-processing using member-by-member approaches: theoretical aspects, under review.
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.

PubMed

Hart, Emma; Sim, Kevin

2016-01-01

We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics

NASA Astrophysics Data System (ADS)

Lazarus, S. M.; Holman, B. P.; Splitt, M. E.

2017-12-01

A computationally efficient method is developed that performs gridded post processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations are generated to provide physically consistent high resolution winds over a coastal domain characterized by an intricate land / water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withdrawal and 28 east central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both the deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicate the post processed forecasts are calibrated. Two downscaling case studies are presented, a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
Active relearning for robust supervised classification of pulmonary emphysema

NASA Astrophysics Data System (ADS)

Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.

2012-03-01

Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, the inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Towards optimizing Emphysema classification, we introduce a physician-in-the-loop feedback approach in order to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pair-wise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, the ensemble-expert label conflicts were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded 15% increase in classification accuracy and 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics. The resultant average accuracy improved to 21%. The co-operative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.
Pattern Recognition of Momentary Mental Workload Based on Multi-Channel Electrophysiological Data and Ensemble Convolutional Neural Networks.

PubMed

Zhang, Jianhua; Li, Sunan; Wang, Rubin

2017-01-01

In this paper, we deal with the Mental Workload (MWL) classification problem based on the measured physiological data. First we discussed the optimal depth (i.e., the number of hidden layers) and parameter optimization algorithms for the Convolutional Neural Networks (CNN). The base CNNs designed were tested according to five classification performance indices, namely Accuracy, Precision, F-measure, G-mean, and required training time. Then we developed an Ensemble Convolutional Neural Network (ECNN) to enhance the accuracy and robustness of the individual CNN model. For the ECNN design, three model aggregation approaches (weighted averaging, majority voting and stacking) were examined and a resampling strategy was used to enhance the diversity of individual CNN models. The results of MWL classification performance comparison indicated that the proposed ECNN framework can effectively improve MWL classification performance and is featured by entirely automatic feature extraction and MWL classification, when compared with traditional machine learning methods.
Neural signatures of attention: insights from decoding population activity patterns.

PubMed

Sapountzis, Panagiotis; Gregoriou, Georgia G

2018-01-01

Understanding brain function and the computations that individual neurons and neuronal ensembles carry out during cognitive functions is one of the biggest challenges in neuroscientific research. To this end, invasive electrophysiological studies have provided important insights by recording the activity of single neurons in behaving animals. To average out noise, responses are typically averaged across repetitions and across neurons that are usually recorded on different days. However, the brain makes decisions on short time scales based on limited exposure to sensory stimulation by interpreting responses of populations of neurons on a moment to moment basis. Recent studies have employed machine-learning algorithms in attention and other cognitive tasks to decode the information content of distributed activity patterns across neuronal ensembles on a single trial basis. Here, we review results from studies that have used pattern-classification decoding approaches to explore the population representation of cognitive functions. These studies have offered significant insights into population coding mechanisms. Moreover, we discuss how such advances can aid the development of cognitive brain-computer interfaces.
Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks.

PubMed

Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

2014-12-08

Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phrase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still open two questions: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.
Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks

PubMed Central

Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

2014-01-01

Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phrase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the “small sample size” (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still open two questions: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0–1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system. PMID:25494350
Locally Weighted Ensemble Clustering.

PubMed

Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang

2018-05-01

Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite the significant success, one limitation to most of the existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, yet these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially, in the case when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
Learning About Climate and Atmospheric Models Through Machine Learning

NASA Astrophysics Data System (ADS)

Lucas, D. D.

2017-12-01

From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Addressing uncertainty in atomistic machine learning.

PubMed

Peterson, Andrew A; Christensen, Rune; Khorshidi, Alireza

2017-05-10

Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate of the uncertainty when the width is comparable to that in the training data. Intriguingly, we also show that the uncertainty can be localized to specific atoms in the simulation, which may offer hints for the generation of training data to strategically improve the machine-learned representation.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics

NASA Astrophysics Data System (ADS)

Iyer, V.; Shetty, S.; Iyengar, S. S.

2015-07-01

Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
Application of an Ensemble Smoother to Precipitation Assimilation

NASA Technical Reports Server (NTRS)

Zhang, Sara; Zupanski, Dusanka; Hou, Arthur; Zupanski, Milija

2008-01-01

Assimilation of precipitation in a global modeling system poses a special challenge in that the observation operators for precipitation processes are highly nonlinear. In the variational approach, substantial development work and model simplifications are required to include precipitation-related physical processes in the tangent linear model and its adjoint. An ensemble based data assimilation algorithm "Maximum Likelihood Ensemble Smoother (MLES)" has been developed to explore the ensemble representation of the precipitation observation operator with nonlinear convection and large-scale moist physics. An ensemble assimilation system based on the NASA GEOS-5 GCM has been constructed to assimilate satellite precipitation data within the MLES framework. The configuration of the smoother takes the time dimension into account for the relationship between state variables and observable rainfall. The full nonlinear forward model ensembles are used to represent components involving the observation operator and its transpose. Several assimilation experiments using satellite precipitation observations have been carried out to investigate the effectiveness of the ensemble representation of the nonlinear observation operator and the data impact of assimilating rain retrievals from the TMI and SSM/I sensors. Preliminary results show that this ensemble assimilation approach is capable of extracting information from nonlinear observations to improve the analysis and forecast if ensemble size is adequate, and a suitable localization scheme is applied. In addition to a dynamically consistent precipitation analysis, the assimilation system produces a statistical estimate of the analysis uncertainty.
Brain-Wide Maps of "Fos" Expression during Fear Learning and Recall

ERIC Educational Resources Information Center

Cho, Jin-Hyung; Rendall, Sam D.; Gray, Jesse M.

2017-01-01

"Fos" induction during learning labels neuronal ensembles in the hippocampus that encode a specific physical environment, revealing a memory trace. In the cortex and other regions, the extent to which "Fos" induction during learning reveals specific sensory representations is unknown. Here we generate high-quality brain-wide…
Global Optimization Ensemble Model for Classification Methods

PubMed Central

Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab

2014-01-01

Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382
Ensemble learning in fixed expansion layer networks for mitigating catastrophic forgetting.

PubMed

Coop, Robert; Mishtal, Aaron; Arel, Itamar

2013-10-01

Catastrophic forgetting is a well-studied attribute of most parameterized supervised learning systems. A variation of this phenomenon, in the context of feedforward neural networks, arises when nonstationary inputs lead to loss of previously learned mappings. The majority of the schemes proposed in the literature for mitigating catastrophic forgetting were not data driven and did not scale well. We introduce the fixed expansion layer (FEL) feedforward neural network, which embeds a sparsely encoding hidden layer to help mitigate forgetting of prior learned representations. In addition, we investigate a novel framework for training ensembles of FEL networks, based on exploiting an information-theoretic measure of diversity between FEL learners, to further control undesired plasticity. The proposed methodology is demonstrated on a basic classification task, clearly emphasizing its advantages over existing techniques. The architecture proposed can be enhanced to address a range of computational intelligence tasks, such as regression problems and system control.

A target recognition method for maritime surveillance radars based on hybrid ensemble selection

NASA Astrophysics Data System (ADS)

Fan, Xueman; Hu, Shengliang; He, Jingbo

2017-11-01

In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of ensembles of classifiers representing different tradeoffs between the classification error and diversity. During the dynamic selection phase, the meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, feature space, decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built full polarimetric high resolution range profile data-set. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the selection of diversity measures is studied concurrently.
Another Perspective: The iPad Is a REAL Musical Instrument

ERIC Educational Resources Information Center

Williams, David A.

2014-01-01

This article looks at the iPad's role as a musical instrument through the lens of a live performance ensemble that performs primarily on iPads. It also offers an overview of a pedagogical model used by this ensemble, which emphasizes musician autonomy in small groups, where music is learned primarily through aural means and concerts are…
Ensemble transcript interaction networks: a case study on Alzheimer's disease.

PubMed

Armañanzas, Rubén; Larrañaga, Pedro; Bielza, Concha

2012-10-01

Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependences that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampus subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
RNA sequencing from neural ensembles activated during fear conditioning in the mouse temporal association cortex

PubMed Central

Cho, Jin-Hyung; Huang, Ben S.; Gray, Jesse M.

2016-01-01

The stable formation of remote fear memories is thought to require neuronal gene induction in cortical ensembles that are activated during learning. However, the set of genes expressed specifically in these activated ensembles is not known; knowledge of such transcriptional profiles may offer insights into the molecular program underlying stable memory formation. Here we use RNA-Seq to identify genes whose expression is enriched in activated cortical ensembles labeled during associative fear learning. We first establish that mouse temporal association cortex (TeA) is required for remote recall of auditory fear memories. We then perform RNA-Seq in TeA neurons that are labeled by the activity reporter Arc-dVenus during learning. We identify 944 genes with enriched expression in Arc-dVenus+ neurons. These genes include markers of L2/3, L5b, and L6 excitatory neurons but not glial or inhibitory markers, confirming Arc-dVenus to be an excitatory neuron-specific but non-layer-specific activity reporter. Cross comparisons to other transcriptional profiles show that 125 of the enriched genes are also activity-regulated in vitro or induced by visual stimulus in the visual cortex, suggesting that they may be induced generally in the cortex in an experience-dependent fashion. Prominent among the enriched genes are those encoding potassium channels that down-regulate neuronal activity, suggesting the possibility that part of the molecular program induced by fear conditioning may initiate homeostatic plasticity. PMID:27557751
An ensemble predictive modeling framework for breast cancer classification.

PubMed

Nagarajan, Radhakrishnan; Upreti, Meenakshi

2017-12-01

Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process often resulting in sparse high-dimensional projection of the samples often comparable to that of the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
Addressing model uncertainty through stochastic parameter perturbations within the High Resolution Rapid Refresh (HRRR) ensemble

NASA Astrophysics Data System (ADS)

Wolff, J.; Jankov, I.; Beck, J.; Carson, L.; Frimel, J.; Harrold, M.; Jiang, H.

2016-12-01

It is well known that global and regional numerical weather prediction ensemble systems are under-dispersive, producing unreliable and overconfident ensemble forecasts. Typical approaches to alleviate this problem include the use of multiple dynamic cores, multiple physics suite configurations, or a combination of the two. While these approaches may produce desirable results, they have practical and theoretical deficiencies and are more difficult and costly to maintain. An active area of research that promotes a more unified and sustainable system for addressing the deficiencies in ensemble modeling is the use of stochastic physics to represent model-related uncertainty. Stochastic approaches include Stochastic Parameter Perturbations (SPP), Stochastic Kinetic Energy Backscatter (SKEB), Stochastic Perturbation of Physics Tendencies (SPPT), or some combination of all three. The focus of this study is to assess the model performance within a convection-permitting ensemble at 3-km grid spacing across the Contiguous United States (CONUS) when using stochastic approaches. For this purpose, the test utilized a single physics suite configuration based on the operational High-Resolution Rapid Refresh (HRRR) model, with ensemble members produced by employing stochastic methods. Parameter perturbations were employed in the Rapid Update Cycle (RUC) land surface model and Mellor-Yamada-Nakanishi-Niino (MYNN) planetary boundary layer scheme. Results will be presented in terms of bias, error, spread, skill, accuracy, reliability, and sharpness using the Model Evaluation Tools (MET) verification package. Due to the high level of complexity of running a frequently updating (hourly), high spatial resolution (3 km), large domain (CONUS) ensemble system, extensive high performance computing (HPC) resources were needed to meet this objective. Supercomputing resources were provided through the National Center for Atmospheric Research (NCAR) Strategic Capability (NSC) project support, allowing for a more extensive set of tests over multiple seasons, consequently leading to more robust results. Through the use of these stochastic innovations and powerful supercomputing at NCAR, further insights and advancements in ensemble forecasting at convection-permitting scales will be possible.
Reproducing multi-model ensemble average with Ensemble-averaged Reconstructed Forcings (ERF) in regional climate modeling

NASA Astrophysics Data System (ADS)

Erfanian, A.; Fomenko, L.; Wang, G.

2016-12-01

Multi-model ensemble (MME) average is considered the most reliable for simulating both present-day and future climates. It has been a primary reference for making conclusions in major coordinated studies i.e. IPCC Assessment Reports and CORDEX. The biases of individual models cancel out each other in MME average, enabling the ensemble mean to outperform individual members in simulating the mean climate. This enhancement however comes with tremendous computational cost, which is especially inhibiting for regional climate modeling as model uncertainties can originate from both RCMs and the driving GCMs. Here we propose the Ensemble-based Reconstructed Forcings (ERF) approach to regional climate modeling that achieves a similar level of bias reduction at a fraction of cost compared with the conventional MME approach. The new method constructs a single set of initial and boundary conditions (IBCs) by averaging the IBCs of multiple GCMs, and drives the RCM with this ensemble average of IBCs to conduct a single run. Using a regional climate model (RegCM4.3.4-CLM4.5), we tested the method over West Africa for multiple combination of (up to six) GCMs. Our results indicate that the performance of the ERF method is comparable to that of the MME average in simulating the mean climate. The bias reduction seen in ERF simulations is achieved by using more realistic IBCs in solving the system of equations underlying the RCM physics and dynamics. This endows the new method with a theoretical advantage in addition to reducing computational cost. The ERF output is an unaltered solution of the RCM as opposed to a climate state that might not be physically plausible due to the averaging of multiple solutions with the conventional MME approach. The ERF approach should be considered for use in major international efforts such as CORDEX. Key words: Multi-model ensemble, ensemble analysis, ERF, regional climate modeling
Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

NASA Astrophysics Data System (ADS)

Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

2016-07-01

This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
Ensemble Sampling vs. Time Sampling in Molecular Dynamics Simulations of Thermal Conductivity

DOE PAGES

Gordiz, Kiarash; Singh, David J.; Henry, Asegun

2015-01-29

In this report we compare time sampling and ensemble averaging as two different methods available for phase space sampling. For the comparison, we calculate thermal conductivities of solid argon and silicon structures, using equilibrium molecular dynamics. We introduce two different schemes for the ensemble averaging approach, and show that both can reduce the total simulation time as compared to time averaging. It is also found that velocity rescaling is an efficient mechanism for phase space exploration. Although our methodology is tested using classical molecular dynamics, the ensemble generation approaches may find their greatest utility in computationally expensive simulations such asmore » first principles molecular dynamics. For such simulations, where each time step is costly, time sampling can require long simulation times because each time step must be evaluated sequentially and therefore phase space averaging is achieved through sequential operations. On the other hand, with ensemble averaging, phase space sampling can be achieved through parallel operations, since each ensemble is independent. For this reason, particularly when using massively parallel architectures, ensemble sampling can result in much shorter simulation times and exhibits similar overall computational effort.« less
Insight and Evidence Motivating the Simplification of Dual-Analysis Hybrid Systems into Single-Analysis Hybrid Systems

NASA Technical Reports Server (NTRS)

Todling, Ricardo; Diniz, F. L. R.; Takacs, L. L.; Suarez, M. J.

2018-01-01

Many hybrid data assimilation systems currently used for NWP employ some form of dual-analysis system approach. Typically a hybrid variational analysis is responsible for creating initial conditions for high-resolution forecasts, and an ensemble analysis system is responsible for creating sample perturbations used to form the flow-dependent part of the background error covariance required in the hybrid analysis component. In many of these, the two analysis components employ different methodologies, e.g., variational and ensemble Kalman filter. In such cases, it is not uncommon to have observations treated rather differently between the two analyses components; recentering of the ensemble analysis around the hybrid analysis is used to compensated for such differences. Furthermore, in many cases, the hybrid variational high-resolution system implements some type of four-dimensional approach, whereas the underlying ensemble system relies on a three-dimensional approach, which again introduces discrepancies in the overall system. Connected to these is the expectation that one can reliably estimate observation impact on forecasts issued from hybrid analyses by using an ensemble approach based on the underlying ensemble strategy of dual-analysis systems. Just the realization that the ensemble analysis makes substantially different use of observations as compared to their hybrid counterpart should serve as enough evidence of the implausibility of such expectation. This presentation assembles numerous anecdotal evidence to illustrate the fact that hybrid dual-analysis systems must, at the very minimum, strive for consistent use of the observations in both analysis sub-components. Simpler than that, this work suggests that hybrid systems can reliably be constructed without the need to employ a dual-analysis approach. In practice, the idea of relying on a single analysis system is appealing from a cost-maintenance perspective. More generally, single-analysis systems avoid contradictions such as having to choose one sub-component to generate performance diagnostics to another, possibly not fully consistent, component.
Measuring social interaction in music ensembles

PubMed Central

D'Ausilio, Alessandro; Badino, Leonardo; Camurri, Antonio; Fadiga, Luciano

2016-01-01

Music ensembles are an ideal test-bed for quantitative analysis of social interaction. Music is an inherently social activity, and music ensembles offer a broad variety of scenarios which are particularly suitable for investigation. Small ensembles, such as string quartets, are deemed a significant example of self-managed teams, where all musicians contribute equally to a task. In bigger ensembles, such as orchestras, the relationship between a leader (the conductor) and a group of followers (the musicians) clearly emerges. This paper presents an overview of recent research on social interaction in music ensembles with a particular focus on (i) studies from cognitive neuroscience; and (ii) studies adopting a computational approach for carrying out automatic quantitative analysis of ensemble music performances. PMID:27069054
Measuring social interaction in music ensembles.

PubMed

Volpe, Gualtiero; D'Ausilio, Alessandro; Badino, Leonardo; Camurri, Antonio; Fadiga, Luciano

2016-05-05

Music ensembles are an ideal test-bed for quantitative analysis of social interaction. Music is an inherently social activity, and music ensembles offer a broad variety of scenarios which are particularly suitable for investigation. Small ensembles, such as string quartets, are deemed a significant example of self-managed teams, where all musicians contribute equally to a task. In bigger ensembles, such as orchestras, the relationship between a leader (the conductor) and a group of followers (the musicians) clearly emerges. This paper presents an overview of recent research on social interaction in music ensembles with a particular focus on (i) studies from cognitive neuroscience; and (ii) studies adopting a computational approach for carrying out automatic quantitative analysis of ensemble music performances. © 2016 The Author(s).
Method and apparatus for quantum information processing using entangled neutral-atom qubits

DOEpatents

Jau, Yuan Yu; Biedermann, Grant; Deutsch, Ivan

2018-04-03

A method for preparing an entangled quantum state of an atomic ensemble is provided. The method includes loading each atom of the atomic ensemble into a respective optical trap; placing each atom of the atomic ensemble into a same first atomic quantum state by impingement of pump radiation; approaching the atoms of the atomic ensemble to within a dipole-dipole interaction length of each other; Rydberg-dressing the atomic ensemble; during the Rydberg-dressing operation, exciting the atomic ensemble with a Raman pulse tuned to stimulate a ground-state hyperfine transition from the first atomic quantum state to a second atomic quantum state; and separating the atoms of the atomic ensemble by more than a dipole-dipole interaction length.
Multiclass Classification by Adaptive Network of Dendritic Neurons with Binary Synapses Using Structural Plasticity

PubMed Central

Hussain, Shaista; Basu, Arindam

2016-01-01

The development of power-efficient neuromorphic devices presents the challenge of designing spike pattern classification algorithms which can be implemented on low-precision hardware and can also achieve state-of-the-art performance. In our pursuit of meeting this challenge, we present a pattern classification model which uses a sparse connection matrix and exploits the mechanism of nonlinear dendritic processing to achieve high classification accuracy. A rate-based structural learning rule for multiclass classification is proposed which modifies a connectivity matrix of binary synaptic connections by choosing the best “k” out of “d” inputs to make connections on every dendritic branch (k < < d). Because learning only modifies connectivity, the model is well suited for implementation in neuromorphic systems using address-event representation (AER). We develop an ensemble method which combines several dendritic classifiers to achieve enhanced generalization over individual classifiers. We have two major findings: (1) Our results demonstrate that an ensemble created with classifiers comprising moderate number of dendrites performs better than both ensembles of perceptrons and of complex dendritic trees. (2) In order to determine the moderate number of dendrites required for a specific classification problem, a two-step solution is proposed. First, an adaptive approach is proposed which scales the relative size of the dendritic trees of neurons for each class. It works by progressively adding dendrites with fixed number of synapses to the network, thereby allocating synaptic resources as per the complexity of the given problem. As a second step, theoretical capacity calculations are used to convert each neuronal dendritic tree to its optimal topology where dendrites of each class are assigned different number of synapses. The performance of the model is evaluated on classification of handwritten digits from the benchmark MNIST dataset and compared with other spike classifiers. We show that our system can achieve classification accuracy within 1 − 2% of other reported spike-based classifiers while using much less synaptic resources (only 7%) compared to that used by other methods. Further, an ensemble classifier created with adaptively learned sizes can attain accuracy of 96.4% which is at par with the best reported performance of spike-based classifiers. Moreover, the proposed method achieves this by using about 20% of the synapses used by other spike algorithms. We also present results of applying our algorithm to classify the MNIST-DVS dataset collected from a real spike-based image sensor and show results comparable to the best reported ones (88.1% accuracy). For VLSI implementations, we show that the reduced synaptic memory can save upto 4X area compared to conventional crossbar topologies. Finally, we also present a biologically realistic spike-based version for calculating the correlations required by the structural learning rule and demonstrate the correspondence between the rate-based and spike-based methods of learning. PMID:27065782
Bioinformatics in proteomics: application, terminology, and pitfalls.

PubMed

Wiemer, Jan C; Prokudin, Alexander

2004-01-01

Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
The Canadian seasonal forecast and the APCC exchange.

NASA Astrophysics Data System (ADS)

Archambault, B.; Fontecilla, J.; Kharin, V.; Bourgouin, P.; Ashok, K.; Lee, D.

2009-05-01

In this talk, we will first describe the Canadian seasonal forecast system. This system uses a 4 model ensemble approach with each of these models generating a 10 members ensemble. Multi-model issues related to this system will be describes. Secondly, we will describe an international multi-system initiative. The Asia-Pacific Economic Cooperation (APEC) is a forum for 21 Pacific Rim countries or regions including Canada. The APEC Climate Center (APCC) provides seasonal forecasts to their regional climate centers with a Multi Model Ensemble (MME) approach. The APCC MME is based on 13 ensemble prediction systems from different institutions including MSC(Canada), NCEP(USA), COLA(USA), KMA(Korea), JMA(Japan), BOM(Australia) and others. In this presentation, we will describe the basics of this international cooperation.
Closed-Loop and Robust Control of Quantum Systems

PubMed Central

Wang, Lin-Cheng

2013-01-01

For most practical quantum control systems, it is important and difficult to attain robustness and reliability due to unavoidable uncertainties in the system dynamics or models. Three kinds of typical approaches (e.g., closed-loop learning control, feedback control, and robust control) have been proved to be effective to solve these problems. This work presents a self-contained survey on the closed-loop and robust control of quantum systems, as well as a brief introduction to a selection of basic theories and methods in this research area, to provide interested readers with a general idea for further studies. In the area of closed-loop learning control of quantum systems, we survey and introduce such learning control methods as gradient-based methods, genetic algorithms (GA), and reinforcement learning (RL) methods from a unified point of view of exploring the quantum control landscapes. For the feedback control approach, the paper surveys three control strategies including Lyapunov control, measurement-based control, and coherent-feedback control. Then such topics in the field of quantum robust control as H ∞ control, sliding mode control, quantum risk-sensitive control, and quantum ensemble control are reviewed. The paper concludes with a perspective of future research directions that are likely to attract more attention. PMID:23997680
Verification of forecast ensembles in complex terrain including observation uncertainty

NASA Astrophysics Data System (ADS)

Dorninger, Manfred; Kloiber, Simon

2017-04-01

Traditionally, verification means to verify a forecast (ensemble) with the truth represented by observations. The observation errors are quite often neglected arguing that they are small when compared to the forecast error. In this study as part of the MesoVICT (Mesoscale Verification Inter-comparison over Complex Terrain) project it will be shown, that observation errors have to be taken into account for verification purposes. The observation uncertainty is estimated from the VERA (Vienna Enhanced Resolution Analysis) and represented via two analysis ensembles which are compared to the forecast ensemble. For the whole study results from COSMO-LEPS provided by Arpae-SIMC Emilia-Romagna are used as forecast ensemble. The time period covers the MesoVICT core case from 20-22 June 2007. In a first step, all ensembles are investigated concerning their distribution. Several tests have been executed (Kolmogorov-Smirnov-Test, Finkelstein-Schafer Test, Chi-Square Test etc.) showing no exact mathematical distribution. So the main focus is on non-parametric statistics (e.g. Kernel density estimation, Boxplots etc.) and also the deviation between "forced" normal distributed data and the kernel density estimations. In a next step the observational deviations due to the analysis ensembles are analysed. In a first approach scores are multiple times calculated with every single ensemble member from the analysis ensemble regarded as "true" observation. The results are presented as boxplots for the different scores and parameters. Additionally, the bootstrapping method is also applied to the ensembles. These possible approaches to incorporating observational uncertainty into the computation of statistics will be discussed in the talk.
An ensemble of dynamic neural network identifiers for fault detection and isolation of gas turbine engines.

PubMed

Amozegar, M; Khorasani, K

2016-04-01

In this paper, a new approach for Fault Detection and Isolation (FDI) of gas turbine engines is proposed by developing an ensemble of dynamic neural network identifiers. For health monitoring of the gas turbine engine, its dynamics is first identified by constructing three separate or individual dynamic neural network architectures. Specifically, a dynamic multi-layer perceptron (MLP), a dynamic radial-basis function (RBF) neural network, and a dynamic support vector machine (SVM) are trained to individually identify and represent the gas turbine engine dynamics. Next, three ensemble-based techniques are developed to represent the gas turbine engine dynamics, namely, two heterogeneous ensemble models and one homogeneous ensemble model. It is first shown that all ensemble approaches do significantly improve the overall performance and accuracy of the developed system identification scheme when compared to each of the stand-alone solutions. The best selected stand-alone model (i.e., the dynamic RBF network) and the best selected ensemble architecture (i.e., the heterogeneous ensemble) in terms of their performances in achieving an accurate system identification are then selected for solving the FDI task. The required residual signals are generated by using both a single model-based solution and an ensemble-based solution under various gas turbine engine health conditions. Our extensive simulation studies demonstrate that the fault detection and isolation task achieved by using the residuals that are obtained from the dynamic ensemble scheme results in a significantly more accurate and reliable performance as illustrated through detailed quantitative confusion matrix analysis and comparative studies. Copyright © 2016 Elsevier Ltd. All rights reserved.
Individual differences in ensemble perception reveal multiple, independent levels of ensemble representation.

PubMed

Haberman, Jason; Brady, Timothy F; Alvarez, George A

2015-04-01

Ensemble perception, including the ability to "see the average" from a group of items, operates in numerous feature domains (size, orientation, speed, facial expression, etc.). Although the ubiquity of ensemble representations is well established, the large-scale cognitive architecture of this process remains poorly defined. We address this using an individual differences approach. In a series of experiments, observers saw groups of objects and reported either a single item from the group or the average of the entire group. High-level ensemble representations (e.g., average facial expression) showed complete independence from low-level ensemble representations (e.g., average orientation). In contrast, low-level ensemble representations (e.g., orientation and color) were correlated with each other, but not with high-level ensemble representations (e.g., facial expression and person identity). These results suggest that there is not a single domain-general ensemble mechanism, and that the relationship among various ensemble representations depends on how proximal they are in representational space. (c) 2015 APA, all rights reserved).

"An Elusive Bird": Perceptions of Music Learning among Canadian and American Adults

ERIC Educational Resources Information Center

Kruse, Nathan B.

2009-01-01

Discovering the perceptions that non-professional adult musicians hold regarding their participation in community ensembles may help improve instruction as well as the ability to more fully understand the implications of lifelong music making. Andragogy, or the teaching and learning strategies associated with adult learning, provided the impetus…
A Proposed Methodology to Classify Frontier Capital Markets

DTIC Science & Technology

2011-07-31

but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal
Simulation of weak polyelectrolytes: a comparison between the constant pH and the reaction ensemble method

NASA Astrophysics Data System (ADS)

Landsgesell, Jonas; Holm, Christian; Smiatek, Jens

2017-03-01

The reaction ensemble and the constant pH method are well-known chemical equilibrium approaches to simulate protonation and deprotonation reactions in classical molecular dynamics and Monte Carlo simulations. In this article, we demonstrate the similarity between both methods under certain conditions. We perform molecular dynamics simulations of a weak polyelectrolyte in order to compare the titration curves obtained by both approaches. Our findings reveal a good agreement between the methods when the reaction ensemble is used to sweep the reaction constant. Pronounced differences between the reaction ensemble and the constant pH method can be observed for stronger acids and bases in terms of adaptive pH values. These deviations are due to the presence of explicit protons in the reaction ensemble method which induce a screening of electrostatic interactions between the charged titrable groups of the polyelectrolyte. The outcomes of our simulation hint to a better applicability of the reaction ensemble method for systems in confined geometries and titrable groups in polyelectrolytes with different pKa values.
Automated generation and ensemble-learned matching of X-ray absorption spectra

NASA Astrophysics Data System (ADS)

Zheng, Chen; Mathew, Kiran; Chen, Chi; Chen, Yiming; Tang, Hanmei; Dozier, Alan; Kas, Joshua J.; Vila, Fernando D.; Rehr, John J.; Piper, Louis F. J.; Persson, Kristin A.; Ong, Shyue Ping

2018-12-01

X-ray absorption spectroscopy (XAS) is a widely used materials characterization technique to determine oxidation states, coordination environment, and other local atomic structure information. Analysis of XAS relies on comparison of measured spectra to reliable reference spectra. However, existing databases of XAS spectra are highly limited both in terms of the number of reference spectra available as well as the breadth of chemistry coverage. In this work, we report the development of XASdb, a large database of computed reference XAS, and an Ensemble-Learned Spectra IdEntification (ELSIE) algorithm for the matching of spectra. XASdb currently hosts more than 800,000 K-edge X-ray absorption near-edge spectra (XANES) for over 40,000 materials from the open-science Materials Project database. We discuss a high-throughput automation framework for FEFF calculations, built on robust, rigorously benchmarked parameters. FEFF is a computer program uses a real-space Green's function approach to calculate X-ray absorption spectra. We will demonstrate that the ELSIE algorithm, which combines 33 weak "learners" comprising a set of preprocessing steps and a similarity metric, can achieve up to 84.2% accuracy in identifying the correct oxidation state and coordination environment of a test set of 19 K-edge XANES spectra encompassing a diverse range of chemistries and crystal structures. The XASdb with the ELSIE algorithm has been integrated into a web application in the Materials Project, providing an important new public resource for the analysis of XAS to all materials researchers. Finally, the ELSIE algorithm itself has been made available as part of veidt, an open source machine-learning library for materials science.
NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

PubMed Central

Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

2014-01-01

One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available. PMID:24667482
A renaissance of neural networks in drug discovery.

PubMed

Baskin, Igor I; Winkler, David; Tetko, Igor V

2016-08-01

Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggests further improvements may be gained in the analysis of large chemical data sets. It's anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Parametric dictionary learning for modeling EAP and ODF in diffusion MRI.

PubMed

Merlet, Sylvain; Caruyer, Emmanuel; Deriche, Rachid

2012-01-01

In this work, we propose an original and efficient approach to exploit the ability of Compressed Sensing (CS) to recover diffusion MRI (dMRI) signals from a limited number of samples while efficiently recovering important diffusion features such as the ensemble average propagator (EAP) and the orientation distribution function (ODF). Some attempts to sparsely represent the diffusion signal have already been performed. However and contrarly to what has been presented in CS dMRI, in this work we propose and advocate the use of a well adapted learned dictionary and show that it leads to a sparser signal estimation as well as to an efficient reconstruction of very important diffusion features. We first propose to learn and design a sparse and parametric dictionary from a set of training diffusion data. Then, we propose a framework to analytically estimate in closed form two important diffusion features: the EAP and the ODF. Various experiments on synthetic, phantom and human brain data have been carried out and promising results with reduced number of atoms have been obtained on diffusion signal reconstruction, thus illustrating the added value of our method over state-of-the-art SHORE and SPF based approaches.
Reliable probabilities through statistical post-processing of ensemble predictions

NASA Astrophysics Data System (ADS)

Van Schaeybroeck, Bert; Vannitsem, Stéphane

2013-04-01

We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. We enforce climatological reliability in the sense that the total variability of the prediction is equal to the variability of the observations. Second, we impose ensemble reliability such that the spread around the ensemble mean of the observation coincides with the one of the ensemble members. In general the attractors of the model and reality are inhomogeneous. Therefore ensemble spread displays a variability not taken into account in standard post-processing methods. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz 1996). The forecasts become more reliable at short lead times as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability ECMWF. 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression, Nonlin. Processes Geophys., 18, 147.
Learning ensemble classifiers for diabetic retinopathy assessment.

PubMed

Saleh, Emran; Błaszczyński, Jerzy; Moreno, Antonio; Valls, Aida; Romero-Aroca, Pedro; de la Riva-Fernández, Sofia; Słowiński, Roman

2018-04-01

Diabetic retinopathy is one of the most common comorbidities of diabetes. Unfortunately, the recommended annual screening of the eye fundus of diabetic patients is too resource-consuming. Therefore, it is necessary to develop tools that may help doctors to determine the risk of each patient to attain this condition, so that patients with a low risk may be screened less frequently and the use of resources can be improved. This paper explores the use of two kinds of ensemble classifiers learned from data: fuzzy random forest and dominance-based rough set balanced rule ensemble. These classifiers use a small set of attributes which represent main risk factors to determine whether a patient is in risk of developing diabetic retinopathy. The levels of specificity and sensitivity obtained in the presented study are over 80%. This study is thus a first successful step towards the construction of a personalized decision support system that could help physicians in daily clinical practice. Copyright © 2017 Elsevier B.V. All rights reserved.
An iterative ensemble quasi-linear data assimilation approach for integrated reservoir monitoring

NASA Astrophysics Data System (ADS)

Li, J. Y.; Kitanidis, P. K.

2013-12-01

Reservoir forecasting and management are increasingly relying on an integrated reservoir monitoring approach, which involves data assimilation to calibrate the complex process of multi-phase flow and transport in the porous medium. The numbers of unknowns and measurements arising in such joint inversion problems are usually very large. The ensemble Kalman filter and other ensemble-based techniques are popular because they circumvent the computational barriers of computing Jacobian matrices and covariance matrices explicitly and allow nonlinear error propagation. These algorithms are very useful but their performance is not well understood and it is not clear how many realizations are needed for satisfactory results. In this presentation we introduce an iterative ensemble quasi-linear data assimilation approach for integrated reservoir monitoring. It is intended for problems for which the posterior or conditional probability density function is not too different from a Gaussian, despite nonlinearity in the state transition and observation equations. The algorithm generates realizations that have the potential to adequately represent the conditional probability density function (pdf). Theoretical analysis sheds light on the conditions under which this algorithm should work well and explains why some applications require very few realizations while others require many. This algorithm is compared with the classical ensemble Kalman filter (Evensen, 2003) and with Gu and Oliver's (2007) iterative ensemble Kalman filter on a synthetic problem of monitoring a reservoir using wellbore pressure and flux data.
Arc expression identifies the lateral amygdala fear memory trace

PubMed Central

Gouty-Colomer, L A; Hosseini, B; Marcelo, I M; Schreiber, J; Slump, D E; Yamaguchi, S; Houweling, A R; Jaarsma, D; Elgersma, Y; Kushner, S A

2016-01-01

Memories are encoded within sparsely distributed neuronal ensembles. However, the defining cellular properties of neurons within a memory trace remain incompletely understood. Using a fluorescence-based Arc reporter, we were able to visually identify the distinct subset of lateral amygdala (LA) neurons activated during auditory fear conditioning. We found that Arc-expressing neurons have enhanced intrinsic excitability and are preferentially recruited into newly encoded memory traces. Furthermore, synaptic potentiation of thalamic inputs to the LA during fear conditioning is learning-specific, postsynaptically mediated and highly localized to Arc-expressing neurons. Taken together, our findings validate the immediate-early gene Arc as a molecular marker for the LA neuronal ensemble recruited during fear learning. Moreover, these results establish a model of fear memory formation in which intrinsic excitability determines neuronal selection, whereas learning-related encoding is governed by synaptic plasticity. PMID:25802982
A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

PubMed

Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

2017-09-01

Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.
Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning.

PubMed

Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan

2016-04-20

DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next generation of sequencing technique, the number of protein sequences is unprecedentedly increasing. Thus it is necessary to develop computational methods to identify the DNA-binding proteins only based on the protein sequence information. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) and the auto-cross covariance transformation. The protein sequences are first converted into profile-based protein representation, and then converted into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. The sequence order effect can be effectively captured by this scheme. These vectors are then fed into Support Vector Machine (SVM) to discriminate the DNA-binding proteins from the non DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and Matthew correlation coefficient of 0.5 by a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset shows that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA binding protein identification. .
Room-temperature and temperature-dependent QSRR modelling for predicting the nitrate radical reaction rate constants of organic chemicals using ensemble learning methods.

PubMed

Gupta, S; Basant, N; Mohan, D; Singh, K P

2016-07-01

Experimental determinations of the rate constants of the reaction of NO3 with a large number of organic chemicals are tedious, and time and resource intensive; and the development of computational methods has widely been advocated. In this study, we have developed room-temperature (298 K) and temperature-dependent quantitative structure-reactivity relationship (QSRR) models based on the ensemble learning approaches (decision tree forest (DTF) and decision treeboost (DTB)) for predicting the rate constant of the reaction of NO3 radicals with diverse organic chemicals, under OECD guidelines. Predictive powers of the developed models were established in terms of statistical coefficients. In the test phase, the QSRR models yielded a correlation (r(2)) of >0.94 between experimental and predicted rate constants. The applicability domains of the constructed models were determined. An attempt has been made to provide the mechanistic interpretation of the selected features for QSRR development. The proposed QSRR models outperformed the previous reports, and the temperature-dependent models offered a much wider applicability domain. This is the first report presenting a temperature-dependent QSRR model for predicting the nitrate radical reaction rate constant at different temperatures. The proposed models can be useful tools in predicting the reactivities of chemicals towards NO3 radicals in the atmosphere, hence, their persistence and exposure risk assessment.
Chaotic genetic algorithm and Adaboost ensemble metamodeling approach for optimum resource planning in emergency departments.

PubMed

Yousefi, Milad; Yousefi, Moslem; Ferreira, Ricardo Poley Martins; Kim, Joong Hoon; Fogliatto, Flavio S

2018-01-01

Long length of stay and overcrowding in emergency departments (EDs) are two common problems in the healthcare industry. To decrease the average length of stay (ALOS) and tackle overcrowding, numerous resources, including the number of doctors, nurses and receptionists need to be adjusted, while a number of constraints are to be considered at the same time. In this study, an efficient method based on agent-based simulation, machine learning and the genetic algorithm (GA) is presented to determine optimum resource allocation in emergency departments. GA can effectively explore the entire domain of all 19 variables and identify the optimum resource allocation through evolution and mimicking the survival of the fittest concept. A chaotic mutation operator is used in this study to boost GA performance. A model of the system needs to be run several thousand times through the GA evolution process to evaluate each solution, hence the process is computationally expensive. To overcome this drawback, a robust metamodel is initially constructed based on an agent-based system simulation. The simulation exhibits ED performance with various resource allocations and trains the metamodel. The metamodel is created with an ensemble of the adaptive neuro-fuzzy inference system (ANFIS), feedforward neural network (FFNN) and recurrent neural network (RNN) using the adaptive boosting (AdaBoost) ensemble algorithm. The proposed GA-based optimization approach is tested in a public ED, and it is shown to decrease the ALOS in this ED case study by 14%. Additionally, the proposed metamodel shows a 26.6% improvement compared to the average results of ANFIS, FFNN and RNN in terms of mean absolute percentage error (MAPE). Copyright © 2017 Elsevier B.V. All rights reserved.
Stochastic Approaches Within a High Resolution Rapid Refresh Ensemble

NASA Astrophysics Data System (ADS)

Jankov, I.

2017-12-01

It is well known that global and regional numerical weather prediction (NWP) ensemble systems are under-dispersive, producing unreliable and overconfident ensemble forecasts. Typical approaches to alleviate this problem include the use of multiple dynamic cores, multiple physics suite configurations, or a combination of the two. While these approaches may produce desirable results, they have practical and theoretical deficiencies and are more difficult and costly to maintain. An active area of research that promotes a more unified and sustainable system is the use of stochastic physics. Stochastic approaches include Stochastic Parameter Perturbations (SPP), Stochastic Kinetic Energy Backscatter (SKEB), and Stochastic Perturbation of Physics Tendencies (SPPT). The focus of this study is to assess model performance within a convection-permitting ensemble at 3-km grid spacing across the Contiguous United States (CONUS) using a variety of stochastic approaches. A single physics suite configuration based on the operational High-Resolution Rapid Refresh (HRRR) model was utilized and ensemble members produced by employing stochastic methods. Parameter perturbations (using SPP) for select fields were employed in the Rapid Update Cycle (RUC) land surface model (LSM) and Mellor-Yamada-Nakanishi-Niino (MYNN) Planetary Boundary Layer (PBL) schemes. Within MYNN, SPP was applied to sub-grid cloud fraction, mixing length, roughness length, mass fluxes and Prandtl number. In the RUC LSM, SPP was applied to hydraulic conductivity and tested perturbing soil moisture at initial time. First iterative testing was conducted to assess the initial performance of several configuration settings (e.g. variety of spatial and temporal de-correlation lengths). Upon selection of the most promising candidate configurations using SPP, a 10-day time period was run and more robust statistics were gathered. SKEB and SPPT were included in additional retrospective tests to assess the impact of using all three stochastic approaches to address model uncertainty. Results from the stochastic perturbation testing were compared to a baseline multi-physics control ensemble. For probabilistic forecast performance the Model Evaluation Tools (MET) verification package was used.
An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images.

PubMed

Farhan, Saima; Fahiem, Muhammad Abuzar; Tauseef, Huma

2014-01-01

Structural brain imaging is playing a vital role in identification of changes that occur in brain associated with Alzheimer's disease. This paper proposes an automated image processing based approach for the identification of AD from MRI of the brain. The proposed approach is novel in a sense that it has higher specificity/accuracy values despite the use of smaller feature set as compared to existing approaches. Moreover, the proposed approach is capable of identifying AD patients in early stages. The dataset selected consists of 85 age and gender matched individuals from OASIS database. The features selected are volume of GM, WM, and CSF and size of hippocampus. Three different classification models (SVM, MLP, and J48) are used for identification of patients and controls. In addition, an ensemble of classifiers, based on majority voting, is adopted to overcome the error caused by an independent base classifier. Ten-fold cross validation strategy is applied for the evaluation of our scheme. Moreover, to evaluate the performance of proposed approach, individual features and combination of features are fed to individual classifiers and ensemble based classifier. Using size of left hippocampus as feature, the accuracy achieved with ensemble of classifiers is 93.75%, with 100% specificity and 87.5% sensitivity.
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

NASA Astrophysics Data System (ADS)

Chen, Lei; Kamel, Mohamed S.

2016-01-01

In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
Sampling-based ensemble segmentation against inter-operator variability

NASA Astrophysics Data System (ADS)

Huo, Jing; Okada, Kazunori; Pope, Whitney; Brown, Matthew

2011-03-01

Inconsistency and a lack of reproducibility are commonly associated with semi-automated segmentation methods. In this study, we developed an ensemble approach to improve reproducibility and applied it to glioblastoma multiforme (GBM) brain tumor segmentation on T1-weigted contrast enhanced MR volumes. The proposed approach combines samplingbased simulations and ensemble segmentation into a single framework; it generates a set of segmentations by perturbing user initialization and user-specified internal parameters, then fuses the set of segmentations into a single consensus result. Three combination algorithms were applied: majority voting, averaging and expectation-maximization (EM). The reproducibility of the proposed framework was evaluated by a controlled experiment on 16 tumor cases from a multicenter drug trial. The ensemble framework had significantly better reproducibility than the individual base Otsu thresholding method (p<.001).
Distinguishing high and low flow domains in urban drainage systems 2 days ahead using numerical weather prediction ensembles

NASA Astrophysics Data System (ADS)

Courdent, Vianney; Grum, Morten; Mikkelsen, Peter Steen

2018-01-01

Precipitation constitutes a major contribution to the flow in urban storm- and wastewater systems. Forecasts of the anticipated runoff flows, created from radar extrapolation and/or numerical weather predictions, can potentially be used to optimize operation in both wet and dry weather periods. However, flow forecasts are inevitably uncertain and their use will ultimately require a trade-off between the value of knowing what will happen in the future and the probability and consequence of being wrong. In this study we examine how ensemble forecasts from the HIRLAM-DMI-S05 numerical weather prediction (NWP) model subject to three different ensemble post-processing approaches can be used to forecast flow exceedance in a combined sewer for a wide range of ratios between the probability of detection (POD) and the probability of false detection (POFD). We use a hydrological rainfall-runoff model to transform the forecasted rainfall into forecasted flow series and evaluate three different approaches to establishing the relative operating characteristics (ROC) diagram of the forecast, which is a plot of POD against POFD for each fraction of concordant ensemble members and can be used to select the weight of evidence that matches the desired trade-off between POD and POFD. In the first approach, the rainfall input to the model is calculated for each of 25 ensemble members as a weighted average of rainfall from the NWP cells over the catchment where the weights are proportional to the areal intersection between the catchment and the NWP cells. In the second approach, a total of 2825 flow ensembles are generated using rainfall input from the neighbouring NWP cells up to approximately 6 cells in all directions from the catchment. In the third approach, the first approach is extended spatially by successively increasing the area covered and for each spatial increase and each time step selecting only the cell with the highest intensity resulting in a total of 175 ensemble members. While the first and second approaches have the disadvantage of not covering the full range of the ROC diagram and being computationally heavy, respectively, the third approach leads to both a broad coverage of the ROC diagram range at a relatively low computational cost. A broad coverage of the ROC diagram offers a larger selection of prediction skill to choose from to best match to the prediction purpose. The study distinguishes itself from earlier research in being the first application to urban hydrology, with fast runoff and small catchments that are highly sensitive to local extremes. Furthermore, no earlier reference has been found on the highly efficient third approach using only neighbouring cells with the highest threat to expand the range of the ROC diagram. This study provides an efficient and robust approach to using ensemble rainfall forecasts affected by bias and misplacement errors for predicting flow threshold exceedance in urban drainage systems.

Learn ++.NC: combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes.

PubMed

Muhlbaier, Michael D; Topalis, Apostolos; Polikar, Robi

2009-01-01

We have previously introduced an incremental learning algorithm Learn(++), which learns novel information from consecutive data sets by generating an ensemble of classifiers with each data set, and combining them by weighted majority voting. However, Learn(++) suffers from an inherent "outvoting" problem when asked to learn a new class omega(new) introduced by a subsequent data set, as earlier classifiers not trained on this class are guaranteed to misclassify omega(new) instances. The collective votes of earlier classifiers, for an inevitably incorrect decision, then outweigh the votes of the new classifiers' correct decision on omega(new) instances--until there are enough new classifiers to counteract the unfair outvoting. This forces Learn(++) to generate an unnecessarily large number of classifiers. This paper describes Learn(++).NC, specifically designed for efficient incremental learning of multiple new classes using significantly fewer classifiers. To do so, Learn (++).NC introduces dynamically weighted consult and vote (DW-CAV), a novel voting mechanism for combining classifiers: individual classifiers consult with each other to determine which ones are most qualified to classify a given instance, and decide how much weight, if any, each classifier's decision should carry. Experiments on real-world problems indicate that the new algorithm performs remarkably well with substantially fewer classifiers, not only as compared to its predecessor Learn(++), but also as compared to several other algorithms recently proposed for similar problems.
Ensemble of classifiers for ontology enrichment

NASA Astrophysics Data System (ADS)

Semenova, A. V.; Kureichik, V. M.

2018-05-01

A classifier is a basis of ontology learning systems. Classification of text documents is used in many applications, such as information retrieval, information extraction, definition of spam. A new ensemble of classifiers based on SVM (a method of support vectors), LSTM (neural network) and word embedding are suggested. An experiment was conducted on open data, which allows us to conclude that the proposed classification method is promising. The implementation of the proposed classifier is performed in the Matlab using the functions of the Text Analytics Toolbox. The principal difference between the proposed ensembles of classifiers is the high quality of classification of data at acceptable time costs.
Machine-Learning Algorithms to Code Public Health Spending Accounts

PubMed Central

Leider, Jonathon P.; Resnick, Beth A.; Alfonso, Y. Natalia; Bishai, David

2017-01-01

Objectives: Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. Methods: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Results: Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Conclusions: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation. PMID:28363034
Machine-Learning Algorithms to Code Public Health Spending Accounts.

PubMed

Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.
Challenges in Visual Analysis of Ensembles

DOE PAGES

Crossno, Patricia

2018-04-12

Modeling physical phenomena through computational simulation increasingly relies on generating a collection of related runs, known as an ensemble. In this paper, we explore the challenges we face in developing analysis and visualization systems for large and complex ensemble data sets, which we seek to understand without having to view the results of every simulation run. Implementing approaches and ideas developed in response to this goal, we demonstrate the analysis of a 15K run material fracturing study using Slycat, our ensemble analysis system.
Challenges in Visual Analysis of Ensembles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Crossno, Patricia

Modeling physical phenomena through computational simulation increasingly relies on generating a collection of related runs, known as an ensemble. In this paper, we explore the challenges we face in developing analysis and visualization systems for large and complex ensemble data sets, which we seek to understand without having to view the results of every simulation run. Implementing approaches and ideas developed in response to this goal, we demonstrate the analysis of a 15K run material fracturing study using Slycat, our ensemble analysis system.
Designing Computer-Supported Collaborative Learning at Work for Rural It Workers: Learning Ensembles and Geographic Isolation

ERIC Educational Resources Information Center

Goggins, Sean P.

2014-01-01

This paper presents the results of a 9-month ethnographic and action research study of rural technology workers where computer support for collaborative learning through workplace technologies was introduced to a US-based technology firm. Throughout the implementation of this support and participation, issues related to geographic isolation are…
Case-Based Learning, Pedagogical Innovation, and Semantic Web Technologies

ERIC Educational Resources Information Center

Martinez-Garcia, A.; Morris, S.; Tscholl, M.; Tracy, F.; Carmichael, P.

2012-01-01

This paper explores the potential of Semantic Web technologies to support teaching and learning in a variety of higher education settings in which some form of case-based learning is the pedagogy of choice. It draws on the empirical work of a major three year research and development project in the United Kingdom: "Ensemble: Semantic…
Ripple-Triggered Stimulation of the Locus Coeruleus during Post-Learning Sleep Disrupts Ripple/Spindle Coupling and Impairs Memory Consolidation

ERIC Educational Resources Information Center

Novitskaya, Yulia; Sara, Susan J.; Logothetis, Nikos K.; Eschenko, Oxana

2016-01-01

Experience-induced replay of neuronal ensembles occurs during hippocampal high-frequency oscillations, or ripples. Post-learning increase in ripple rate is predictive of memory recall, while ripple disruption impairs learning. Ripples may thus present a fundamental component of a neurophysiological mechanism of memory consolidation. In addition to…
Computer Based Behavioral Biometric Authentication via Multi-Modal Fusion

DTIC Science & Technology

2013-03-01

the decisions made by each individual modality. Fusion of features is the simple concatenation of feature vectors from multiple modalities to be...of Features BayesNet MDL 330 LibSVM PCA 80 J48 Wrapper Evaluator 11 3.5.3 Ensemble Based Decision Level Fusion. In ensemble learning multiple ...The high fusion percentages validate our hypothesis that by combining features from multiple modalities, classification accuracy can be improved. As
Wang-Landau Reaction Ensemble Method: Simulation of Weak Polyelectrolytes and General Acid-Base Reactions.

PubMed

Landsgesell, Jonas; Holm, Christian; Smiatek, Jens

2017-02-14

We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.
Ensemble stump classifiers and gene expression signatures in lung cancer.

PubMed

Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn

2007-01-01

Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
Mixture models for protein structure ensembles.

PubMed

Hirsch, Michael; Habeck, Michael

2008-10-01

Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured with a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule. And most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We consider the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.
Ensuring Agile Power to the Edge Approach in Cyberspace

DTIC Science & Technology

2011-06-01

University, , 2. Musical Director, The Nagaokakyo Chamber Ensemble, Kyoto, Japan Former Professor, Roosevelt University Chicago College of...Performing Arts The Music Conservatory 3. Professor Emeritus, Department of Electrical Engineering & Computer Science University of California...of musical ensemble without conductor. Second, we discuss to evaluate the feasibility on the Network Centric Warfare / Power to the Edge approach
Causal network in a deafferented non-human primate brain.

PubMed

Balasubramanian, Karthikeyan; Takahashi, Kazutaka; Hatsopoulos, Nicholas G

2015-01-01

De-afferented/efferented neural ensembles can undergo causal changes when interfaced to neuroprosthetic devices. These changes occur via recruitment or isolation of neurons, alterations in functional connectivity within the ensemble and/or changes in the role of neurons, i.e., excitatory/inhibitory. In this work, emergence of a causal network and changes in the dynamics are demonstrated for a deafferented brain region exposed to BMI (brain-machine interface) learning. The BMI was controlling a robot for reach-and-grasp behavior. And, the motor cortical regions used for the BMI were deafferented due to chronic amputation, and ensembles of neurons were decoded for velocity control of the multi-DOF robot. A generalized linear model-framework based Granger causality (GLM-GC) technique was used in estimating the ensemble connectivity. Model selection was based on the AIC (Akaike Information Criterion).
Gibbs Ensembles for Nearly Compatible and Incompatible Conditional Models

PubMed Central

Chen, Shyh-Huei; Wang, Yuchung J.

2010-01-01

Gibbs sampler has been used exclusively for compatible conditionals that converge to a unique invariant joint distribution. However, conditional models are not always compatible. In this paper, a Gibbs sampling-based approach — Gibbs ensemble —is proposed to search for a joint distribution that deviates least from a prescribed set of conditional distributions. The algorithm can be easily scalable such that it can handle large data sets of high dimensionality. Using simulated data, we show that the proposed approach provides joint distributions that are less discrepant from the incompatible conditionals than those obtained by other methods discussed in the literature. The ensemble approach is also applied to a data set regarding geno-polymorphism and response to chemotherapy in patients with metastatic colorectal PMID:21286232
Can machine learning complement traditional medical device surveillance? A case study of dual-chamber implantable cardioverter–defibrillators

PubMed Central

Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M

2017-01-01

Background Machine learning methods may complement traditional analytic methods for medical device surveillance. Methods and results Using data from the National Cardiovascular Data Registry for implantable cardioverter–defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs that used two propensity score (PS) models: one specified by subject-matter experts (PS-SME), and the other one by machine learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%–20.9%; nonfatal ICD-related adverse events, 19.3%–26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%–37.6%. Agreement among safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, k=0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, k=−0.028), and between the DELTA and embedded feature selection approaches was 88.1% (349 of 396, k=−0.042). Conclusion Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance. PMID:28860874
Cases, Simulacra, and Semantic Web Technologies

ERIC Educational Resources Information Center

Carmichael, P.; Tscholl, M.

2013-01-01

"Ensemble" is an interdisciplinary research and development project exploring the potential role of emerging Semantic Web technologies in case-based learning across learning environments in higher education. Empirical findings have challenged the claim that cases "bring reality into the classroom" and that this, in turn, might…
Convolutional Neural Network for Multi-Source Deep Learning Crop Classification in Ukraine

NASA Astrophysics Data System (ADS)

Lavreniuk, M. S.

2016-12-01

Land cover and crop type maps are one of the most essential inputs when dealing with environmental and agriculture monitoring tasks [1]. During long time neural network (NN) approach was one of the most efficient and popular approach for most applications, including crop classification using remote sensing data, with high an overall accuracy (OA) [2]. In the last years the most popular and efficient method for multi-sensor and multi-temporal land cover classification is convolution neural networks (CNNs). Taking into account presence clouds in optical data, self-organizing Kohonen maps (SOMs) are used to restore missing pixel values in a time series of optical imagery from Landsat-8 satellite. After missing data restoration, optical data from Landsat-8 was merged with Sentinel-1A radar data for better crop types discrimination [3]. An ensemble of CNNs is proposed for multi-temporal satellite images supervised classification. Each CNN in the corresponding ensemble is a 1-d CNN with 4 layers implemented using the Google's library TensorFlow. The efficiency of the proposed approach was tested on a time-series of Landsat-8 and Sentinel-1A images over the JECAM test site (Kyiv region) in Ukraine in 2015. Overall classification accuracy for ensemble of CNNs was 93.5% that outperformed an ensemble of multi-layer perceptrons (MLPs) by +0.8% and allowed us to better discriminate summer crops, in particular maize and soybeans. For 2016 we would like to validate this method using Sentinel-1 and Sentinel-2 data for Ukraine territory within ESA project on country level demonstration Sen2Agri. 1. A. Kolotii et al., "Comparison of biophysical and satellite predictors for wheat yield forecasting in Ukraine," The Int. Arch. of Photogram., Rem. Sens. and Spatial Inform. Scie., vol. 40, no. 7, pp. 39-44, 2015. 2. F. Waldner et al., "Towards a set of agrosystem-specific cropland mapping methods to address the global cropland diversity," Int. Journal of Rem. Sens. vol. 37, no. 14, pp 3196-3231, 2016. 3. S. Skakun et al., "Efficiency assessment of multitemporal C-band Radarsat-2 intensity and Landsat-8 surface reflectance satellite imagery for crop classification in Ukraine," IEEE Journal of Selected Topics in Applied Earth Observ. and Rem. Sens., 2015, DOI: 10.1109/JSTARS.2015.2454297.
Machine learning algorithms for the creation of clinical healthcare enterprise systems

NASA Astrophysics Data System (ADS)

Mandal, Indrajit

2017-10-01

Clinical recommender systems are increasingly becoming popular for improving modern healthcare systems. Enterprise systems are persuasively used for creating effective nurse care plans to provide nurse training, clinical recommendations and clinical quality control. A novel design of a reliable clinical recommender system based on multiple classifier system (MCS) is implemented. A hybrid machine learning (ML) ensemble based on random subspace method and random forest is presented. The performance accuracy and robustness of proposed enterprise architecture are quantitatively estimated to be above 99% and 97%, respectively (above 95% confidence interval). The study then extends to experimental analysis of the clinical recommender system with respect to the noisy data environment. The ranking of items in nurse care plan is demonstrated using machine learning algorithms (MLAs) to overcome the drawback of the traditional association rule method. The promising experimental results are compared against the sate-of-the-art approaches to highlight the advancement in recommendation technology. The proposed recommender system is experimentally validated using five benchmark clinical data to reinforce the research findings.

"Playing It Like a Professional": Approaches to Ensemble Direction in Tertiary Institutions

ERIC Educational Resources Information Center

Harrison, Scott; O'Bryan, Jessica; Lebler, Don

2013-01-01

This article reports on a case study of three directors of large ensembles within a large conservatoire and the ways in which they attempted to scaffold their students into professional music careers. The core aim in this article is to respond to the question "What is the role and function of the ensemble experience on the training of the…
EHR-based phenotyping: Bulk learning and evaluation.

PubMed

Chiu, Po-Hsiang; Hripcsak, George

2017-06-01

In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set. Copyright © 2017 Elsevier Inc. All rights reserved.
Collective coupling in hybrid superconducting circuits

NASA Astrophysics Data System (ADS)

Saito, Shiro

Hybrid quantum systems utilizing superconducting circuits have attracted significant recent attention, not only for quantum information processing tasks but also as a way to explore fundamentally new physics regimes. In this talk, I will discuss two superconducting circuit based hybrid quantum system approaches. The first is a superconducting flux qubit - electron spin ensemble hybrid system in which quantum information manipulated in the flux qubit can be transferred to, stored in and retrieved from the ensemble. Although the coherence time of the ensemble is short, about 20 ns, this is a significant first step to utilize the spin ensemble as quantum memory for superconducting flux qubits. The second approach is a superconducting resonator - flux qubit ensemble hybrid system in which we fabricated a superconducting LC resonator coupled to a large ensemble of flux qubits. Here we observed a dispersive frequency shift of approximately 250 MHz in the resonators transmission spectrum. This indicates thousands of flux qubits are coupling to the resonator collectively. Although we need to improve our qubits inhomogeneity, our system has many potential uses including the creation of new quantum metamaterials, novel applications in quantum metrology and so on. This work was partially supported by JSPS KAKENHI Grant Number 25220601.
Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists.

PubMed

Bonet, Isis; Franco-Montero, Pedro; Rivero, Virginia; Teijeira, Marta; Borges, Fernanda; Uriarte, Eugenio; Morales Helguera, Aliuska

2013-12-23

A(2B) adenosine receptor antagonists may be beneficial in treating diseases like asthma, diabetes, diabetic retinopathy, and certain cancers. This has stimulated research for the development of potent ligands for this subtype, based on quantitative structure-affinity relationships. In this work, a new ensemble machine learning algorithm is proposed for classification and prediction of the ligand-binding affinity of A(2B) adenosine receptor antagonists. This algorithm is based on the training of different classifier models with multiple training sets (composed of the same compounds but represented by diverse features). The k-nearest neighbor, decision trees, neural networks, and support vector machines were used as single classifiers. To select the base classifiers for combining into the ensemble, several diversity measures were employed. The final multiclassifier prediction results were computed from the output obtained by using a combination of selected base classifiers output, by utilizing different mathematical functions including the following: majority vote, maximum and average probability. In this work, 10-fold cross- and external validation were used. The strategy led to the following results: i) the single classifiers, together with previous features selections, resulted in good overall accuracy, ii) a comparison between single classifiers, and their combinations in the multiclassifier model, showed that using our ensemble gave a better performance than the single classifier model, and iii) our multiclassifier model performed better than the most widely used multiclassifier models in the literature. The results and statistical analysis demonstrated the supremacy of our multiclassifier approach for predicting the affinity of A(2B) adenosine receptor antagonists, and it can be used to develop other QSAR models.
Ensemble of surrogates-based optimization for identifying an optimal surfactant-enhanced aquifer remediation strategy at heterogeneous DNAPL-contaminated sites

NASA Astrophysics Data System (ADS)

Jiang, Xue; Lu, Wenxi; Hou, Zeyu; Zhao, Haiqing; Na, Jin

2015-11-01

The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL) based on an ensemble of surrogates-based optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural network (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model as well as these four stand-alone surrogate models was compared. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, which is a high approximation accuracy, and which indicates that the ES model provides more accurate predictions than the stand-alone surrogate models. Then, a nonlinear optimization model was formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constrained condition. Besides, GA was used to solve the optimization model to provide the optimal SEAR strategy. The developed ensemble surrogate-optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theoretical and technical implications for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.
Ensemble of Surrogates-based Optimization for Identifying an Optimal Surfactant-enhanced Aquifer Remediation Strategy at Heterogeneous DNAPL-contaminated Sites

NASA Astrophysics Data System (ADS)

Lu, W., Sr.; Xin, X.; Luo, J.; Jiang, X.; Zhang, Y.; Zhao, Y.; Chen, M.; Hou, Z.; Ouyang, Q.

2015-12-01

The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL) based on an ensemble of surrogates-based optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural network (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model as well as these four stand-alone surrogate models was compared. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, which is a high approximation accuracy, and which indicates that the ES model provides more accurate predictions than the stand-alone surrogate models. Then, a nonlinear optimization model was formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constrained condition. Besides, GA was used to solve the optimization model to provide the optimal SEAR strategy. The developed ensemble surrogate-optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theoretical and technical implications for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.
Ensemble Clustering Classification compete SVM and One-Class classifiers applied on plant microRNAs Data.

PubMed

Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai

2016-12-22

The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Ensemble Clustering Classification Applied to Competing SVM and One-Class Classifiers Exemplified by Plant MicroRNAs Data.

PubMed

Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai

2016-12-01

The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
The Schaake shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields

USGS Publications Warehouse

Clark, M.R.; Gangopadhyay, S.; Hay, L.; Rajagopalan, B.; Wilby, R.

2004-01-01

A number of statistical methods that are used to provide local-scale ensemble forecasts of precipitation and temperature do not contain realistic spatial covariability between neighboring stations or realistic temporal persistence for subsequent forecast lead times. To demonstrate this point, output from a global-scale numerical weather prediction model is used in a stepwise multiple linear regression approach to downscale precipitation and temperature to individual stations located in and around four study basins in the United States. Output from the forecast model is downscaled for lead times up to 14 days. Residuals in the regression equation are modeled stochastically to provide 100 ensemble forecasts. The precipitation and temperature ensembles from this approach have a poor representation of the spatial variability and temporal persistence. The spatial correlations for downscaled output are considerably lower than observed spatial correlations at short forecast lead times (e.g., less than 5 days) when there is high accuracy in the forecasts. At longer forecast lead times, the downscaled spatial correlations are close to zero. Similarly, the observed temporal persistence is only partly present at short forecast lead times. A method is presented for reordering the ensemble output in order to recover the space-time variability in precipitation and temperature fields. In this approach, the ensemble members for a given forecast day are ranked and matched with the rank of precipitation and temperature data from days randomly selected from similar dates in the historical record. The ensembles are then reordered to correspond to the original order of the selection of historical data. Using this approach, the observed intersite correlations, intervariable correlations, and the observed temporal persistence are almost entirely recovered. This reordering methodology also has applications for recovering the space-time variability in modeled streamflow. ?? 2004 American Meteorological Society.
Evaluation of medium-range ensemble flood forecasting based on calibration strategies and ensemble methods in Lanjiang Basin, Southeast China

NASA Astrophysics Data System (ADS)

Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping

2017-11-01

Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprised of an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel programmed ε-NSGA II multi-objective algorithm. According to the solutions by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. Then a simple yet effective modular approach is proposed to combine these daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination on parameter estimation, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single model ensembles and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at three stations can be satisfactory up to ten days, however the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in forecasting of flows. For peak values taking flood forecasts from each individual member into account is more appropriate.
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case

NASA Astrophysics Data System (ADS)

Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann

2017-04-01

Short-term ocean analyses for Sea Surface Temperature SST in the Mediterranean Sea can be improved by a statistical post-processing technique, called super-ensemble. This technique consists in a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable to prevent overfitting problems, even if best performances are achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performances depend on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: 15 days training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least square algorithm being filtered a posteriori.
Multi-model ensembles for assessment of flood losses and associated uncertainty

NASA Astrophysics Data System (ADS)

Figueiredo, Rui; Schröter, Kai; Weiss-Motz, Alexander; Martina, Mario L. V.; Kreibich, Heidi

2018-05-01

Flood loss modelling is a crucial part of risk assessments. However, it is subject to large uncertainty that is often neglected. Most models available in the literature are deterministic, providing only single point estimates of flood loss, and large disparities tend to exist among them. Adopting any one such model in a risk assessment context is likely to lead to inaccurate loss estimates and sub-optimal decision-making. In this paper, we propose the use of multi-model ensembles to address these issues. This approach, which has been applied successfully in other scientific fields, is based on the combination of different model outputs with the aim of improving the skill and usefulness of predictions. We first propose a model rating framework to support ensemble construction, based on a probability tree of model properties, which establishes relative degrees of belief between candidate models. Using 20 flood loss models in two test cases, we then construct numerous multi-model ensembles, based both on the rating framework and on a stochastic method, differing in terms of participating members, ensemble size and model weights. We evaluate the performance of ensemble means, as well as their probabilistic skill and reliability. Our results demonstrate that well-designed multi-model ensembles represent a pragmatic approach to consistently obtain more accurate flood loss estimates and reliable probability distributions of model uncertainty.
Theoretical and Empirical Analysis of a Spatial EA Parallel Boosting Algorithm.

PubMed

Kamath, Uday; Domeniconi, Carlotta; De Jong, Kenneth

2018-01-01

Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.
Ensembles and Experiments in Classical and Quantum Physics

NASA Astrophysics Data System (ADS)

Neumaier, Arnold

A philosophically consistent axiomatic approach to classical and quantum mechanics is given. The approach realizes a strong formal implementation of Bohr's correspondence principle. In all instances, classical and quantum concepts are fully parallel: the same general theory has a classical realization and a quantum realization. Extending the ''probability via expectation'' approach of Whittle to noncommuting quantities, this paper defines quantities, ensembles, and experiments as mathematical concepts and shows how to model complementarity, uncertainty, probability, nonlocality and dynamics in these terms. The approach carries no connotation of unlimited repeatability; hence it can be applied to unique systems such as the universe. Consistent experiments provide an elegant solution to the reality problem, confirming the insistence of the orthodox Copenhagen interpretation on that there is nothing but ensembles, while avoiding its elusive reality picture. The weak law of large numbers explains the emergence of classical properties for macroscopic systems.
Modulating RNA Alignment Using Directional Dynamic Kinks: Application in Determining an Atomic-Resolution Ensemble for a Hairpin using NMR Residual Dipolar Couplings.

PubMed

Salmon, Loïc; Giambaşu, George M; Nikolova, Evgenia N; Petzold, Katja; Bhattacharya, Akash; Case, David A; Al-Hashimi, Hashim M

2015-10-14

Approaches that combine experimental data and computational molecular dynamics (MD) to determine atomic resolution ensembles of biomolecules require the measurement of abundant experimental data. NMR residual dipolar couplings (RDCs) carry rich dynamics information, however, difficulties in modulating overall alignment of nucleic acids have limited the ability to fully extract this information. We present a strategy for modulating RNA alignment that is based on introducing variable dynamic kinks in terminal helices. With this strategy, we measured seven sets of RDCs in a cUUCGg apical loop and used this rich data set to test the accuracy of an 0.8 μs MD simulation computed using the Amber ff10 force field as well as to determine an atomic resolution ensemble. The MD-generated ensemble quantitatively reproduces the measured RDCs, but selection of a sub-ensemble was required to satisfy the RDCs within error. The largest discrepancies between the RDC-selected and MD-generated ensembles are observed for the most flexible loop residues and backbone angles connecting the loop to the helix, with the RDC-selected ensemble resulting in more uniform dynamics. Comparison of the RDC-selected ensemble with NMR spin relaxation data suggests that the dynamics occurs on the ps-ns time scales as verified by measurements of R(1ρ) relaxation-dispersion data. The RDC-satisfying ensemble samples many conformations adopted by the hairpin in crystal structures indicating that intrinsic plasticity may play important roles in conformational adaptation. The approach presented here can be applied to test nucleic acid force fields and to characterize dynamics in diverse RNA motifs at atomic resolution.
Residue-level global and local ensemble-ensemble comparisons of protein domains.

PubMed

Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew

2015-09-01

Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. © 2015 The Protein Society.
Residue-level global and local ensemble-ensemble comparisons of protein domains

PubMed Central

Clark, Sarah A; Tronrud, Dale E; Andrew Karplus, P

2015-01-01

Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a “consistency check” of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. PMID:26032515
Using Theater Concepts in the TESOL Classroom

ERIC Educational Resources Information Center

Badie, Gina Tiffany

2014-01-01

This article discusses practical ways to incorporate theater concepts into the ESL classroom. The notion of a theater ensemble lends itself well to group work in language learning. I have used my experience auditioning, participating in theater games, and improv techniques to encourage second language learning through public speaking, group…
Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery

NASA Astrophysics Data System (ADS)

Wu, Chaofan; Shen, Huanhuan; Shen, Aihua; Deng, Jinsong; Gan, Muye; Zhu, Jinxia; Xu, Hongwei; Wang, Ke

2016-07-01

Biomass is one significant biophysical parameter of a forest ecosystem, and accurate biomass estimation on the regional scale provides important information for carbon-cycle investigation and sustainable forest management. In this study, Landsat satellite imagery data combined with field-based measurements were integrated through comparisons of five regression approaches [stepwise linear regression, K-nearest neighbor, support vector regression, random forest (RF), and stochastic gradient boosting] with two different candidate variable strategies to implement the optimal spatial above-ground biomass (AGB) estimation. The results suggested that RF algorithm exhibited the best performance by 10-fold cross-validation with respect to R2 (0.63) and root-mean-square error (26.44 ton/ha). Consequently, the map of estimated AGB was generated with a mean value of 89.34 ton/ha in northwestern Zhejiang Province, China, with a similar pattern to the distribution mode of local forest species. This research indicates that machine-learning approaches associated with Landsat imagery provide an economical way for biomass estimation. Moreover, ensemble methods using all candidate variables, especially for Landsat images, provide an alternative for regional biomass simulation.
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.

PubMed

Notaro, Marco; Schubach, Max; Robinson, Peter N; Valentini, Giorgio

2017-10-12

The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even if for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover most of the methods proposed in literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, that consists in a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO terms predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.

Identifying pollution sources and predicting urban air quality using ensemble learning methods

NASA Astrophysics Data System (ADS)

Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali

2013-12-01

In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed the air quality unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, factors responsible for discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality rendering misclassification rate (MR) of 8.32% (SDT); 4.12% (DTF); 5.62% (DTB), and 6.18% (SVM), respectively in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM both in classification and regression which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
Operational planning using Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)

NASA Astrophysics Data System (ADS)

O'Connor, Alison; Kirtman, Benjamin; Harrison, Scott; Gorman, Joe

2016-05-01

The US Navy faces several limitations when planning operations in regard to forecasting environmental conditions. Currently, mission analysis and planning tools rely heavily on short-term (less than a week) forecasts or long-term statistical climate products. However, newly available data in the form of weather forecast ensembles provides dynamical and statistical extended-range predictions that can produce more accurate predictions if ensemble members can be combined correctly. Charles River Analytics is designing the Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS), which performs data fusion over extended-range multi-model ensembles, such as the North American Multi-Model Ensemble (NMME), to produce a unified forecast for several weeks to several seasons in the future. We evaluated thirty years of forecasts using machine learning to select predictions for an all-encompassing and superior forecast that can be used to inform the Navy's decision planning process.
Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healhy elderly, MCI, cMCI and alzheimer's disease patients: From the alzheimer's disease neuroimaging initiative (ADNI) database.

PubMed

Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N

2018-05-15

In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, 3 we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 CE were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
Attractor learning in synchronized chaotic systems in the presence of unresolved scales

NASA Astrophysics Data System (ADS)

Wiegerinck, W.; Selten, F. M.

2017-12-01

Recently, supermodels consisting of an ensemble of interacting models, synchronizing on a common solution, have been proposed as an alternative to the common non-interactive multi-model ensembles in order to improve climate predictions. The connection terms in the interacting ensemble are to be optimized based on the data. The supermodel approach has been successfully demonstrated in a number of simulation experiments with an assumed ground truth and a set of good, but imperfect models. The supermodels were optimized with respect to their short-term prediction error. Nevertheless, they produced long-term climatological behavior that was close to the long-term behavior of the assumed ground truth, even in cases where the long-term behavior of the imperfect models was very different. In these supermodel experiments, however, a perfect model class scenario was assumed, in which the ground truth and imperfect models belong to the same model class and only differ in parameter setting. In this paper, we consider the imperfect model class scenario, in which the ground truth model class is more complex than the model class of imperfect models due to unresolved scales. We perform two supermodel experiments in two toy problems. The first one consists of a chaotically driven Lorenz 63 oscillator ground truth and two Lorenz 63 oscillators with constant forcings as imperfect models. The second one is more realistic and consists of a global atmosphere model as ground truth and imperfect models that have perturbed parameters and reduced spatial resolution. In both problems, we find that supermodel optimization with respect to short-term prediction error can lead to a long-term climatological behavior that is worse than that of the imperfect models. However, we also show that attractor learning can remedy this problem, leading to supermodels with long-term behavior superior to the imperfect models.
Temporal learning in the cerebellum: The microcircuit model

NASA Technical Reports Server (NTRS)

Miles, Coe F.; Rogers, David

1990-01-01

The cerebellum is that part of the brain which coordinates motor reflex behavior. To perform effectively, it must learn to generate specific motor commands at the proper times. We propose a fundamental circuit, called the MicroCircuit, which is the minimal ensemble of neurons both necessary and sufficient to learn timing. We describe how learning takes place in the MicroCircuit, which then explains the global behavior of the cerebellum as coordinated MicroCircuit behavior.
An ensemble boosting model for predicting transfer to the pediatric intensive care unit.

PubMed

Rubin, Jonathan; Potes, Cristhian; Xu-Wilson, Minnan; Dong, Junzi; Rahman, Asif; Nguyen, Hiep; Moromisato, David

2018-04-01

Early deterioration indicators have the potential to alert hospital care staff in advance of adverse events, such as patients requiring an increased level of care, or the need for rapid response teams to be called. Our work focuses on the problem of predicting the transfer of pediatric patients from the general ward of a hospital to the pediatric intensive care unit. The development of a data-driven pediatric early deterioration indicator for use by clinicians with the purpose of predicting encounters where transfer from the general ward to the PICU is likely. Using data collected over 5.5 years from the electronic health records of two medical facilities, we develop machine learning classifiers based on adaptive boosting and gradient tree boosting. We further combine these learned classifiers into an ensemble model and compare its performance to a modified pediatric early warning score (PEWS) baseline that relies on expert defined guidelines. To gauge model generalizability, we perform an inter-facility evaluation where we train our algorithm on data from one facility and perform evaluation on a hidden test dataset from a separate facility. We show that improvements are witnessed over the modified PEWS baseline in accuracy (0.77 vs. 0.69), sensitivity (0.80 vs. 0.68), specificity (0.74 vs. 0.70) and AUROC (0.85 vs. 0.73). Data-driven, machine learning algorithms can improve PICU transfer prediction accuracy compared to expertly defined systems, such as a modified PEWS, but care must be taken in the training of such approaches to avoid inadvertently introducing bias into the outcomes of these systems. Copyright © 2018 Elsevier B.V. All rights reserved.
Ideas for a pattern-oriented approach towards a VERA analysis ensemble

NASA Astrophysics Data System (ADS)

Gorgas, T.; Dorninger, M.

2010-09-01

Ideas for a pattern-oriented approach towards a VERA analysis ensemble For many applications in meteorology and especially for verification purposes it is important to have some information about the uncertainties of observation and analysis data. A high quality of these "reference data" is an absolute necessity as the uncertainties are reflected in verification measures. The VERA (Vienna Enhanced Resolution Analysis) scheme includes a sophisticated quality control tool which accounts for the correction of observational data and provides an estimation of the observation uncertainty. It is crucial for meteorologically and physically reliable analysis fields. VERA is based on a variational principle and does not need any first guess fields. It is therefore NWP model independent and can also be used as an unbiased reference for real time model verification. For downscaling purposes VERA uses an a priori knowledge on small-scale physical processes over complex terrain, the so called "fingerprint technique", which transfers information from rich to data sparse regions. The enhanced Joint D-PHASE and COPS data set forms the data base for the analysis ensemble study. For the WWRP projects D-PHASE and COPS a joint activity has been started to collect GTS and non-GTS data from the national and regional meteorological services in Central Europe for 2007. Data from more than 11.000 stations are available for high resolution analyses. The usage of random numbers as perturbations for ensemble experiments is a common approach in meteorology. In most implementations, like for NWP-model ensemble systems, the focus lies on error growth and propagation on the spatial and temporal scale. When defining errors in analysis fields we have to consider the fact that analyses are not time dependent and that no perturbation method aimed at temporal evolution is possible. Further, the method applied should respect two major sources of analysis errors: Observation errors AND analysis or interpolation errors. With the concept of an analysis ensemble we hope to get a more detailed sight on both sources of analysis errors. For the computation of the VERA ensemble members a sample of Gaussian random perturbations is produced for each station and parameter. The deviation of perturbations is based on the correction proposals by the VERA QC scheme to provide some "natural" limits for the ensemble. In order to put more emphasis on the weather situation we aim to integrate the main synoptic field structures as weighting factors for the perturbations. Two widely approved approaches are used for the definition of these main field structures: The Principal Component Analysis and a 2D-Discrete Wavelet Transform. The results of tests concerning the implementation of this pattern-supported analysis ensemble system and a comparison of the different approaches are given in the presentation.
A shared neural ensemble links distinct contextual memories encoded close in time

NASA Astrophysics Data System (ADS)

Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.

2016-06-01

Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.
Conservation of Mass and Preservation of Positivity with Ensemble-Type Kalman Filter Algorithms

NASA Technical Reports Server (NTRS)

Janjic, Tijana; Mclaughlin, Dennis; Cohn, Stephen E.; Verlaan, Martin

2014-01-01

This paper considers the incorporation of constraints to enforce physically based conservation laws in the ensemble Kalman filter. In particular, constraints are used to ensure that the ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. In certain situations filtering algorithms such as the ensemble Kalman filter (EnKF) and ensemble transform Kalman filter (ETKF) yield updated ensembles that conserve mass but are negative, even though the actual states must be nonnegative. In such situations if negative values are set to zero, or a log transform is introduced, the total mass will not be conserved. In this study, mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate non-negativity constraints. Simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. In two examples, an update that includes a non-negativity constraint is able to properly describe the transport of a sharp feature (e.g., a triangle or cone). A number of implementation questions still need to be addressed, particularly the need to develop a computationally efficient quadratic programming update for large ensemble.
Ensemble models of proteins and protein domains based on distance distribution restraints.

PubMed

Jeschke, Gunnar

2016-04-01

Conformational ensembles of intrinsically disordered peptide chains are not fully determined by experimental observations. Uncertainty due to lack of experimental restraints and due to intrinsic disorder can be distinguished if distance distributions restraints are available. Such restraints can be obtained from pulsed dipolar electron paramagnetic resonance (EPR) spectroscopy applied to pairs of spin labels. Here, we introduce a Monte Carlo approach for generating conformational ensembles that are consistent with a set of distance distribution restraints, backbone dihedral angle statistics in known protein structures, and optionally, secondary structure propensities or membrane immersion depths. The approach is tested with simulated restraints for a terminal and an internal loop and for a protein with 69 residues by using sets of sparse restraints for underlying well-defined conformations and for published ensembles of a premolten globule-like and a coil-like intrinsically disordered protein. © 2016 Wiley Periodicals, Inc.
Novel layered clustering-based approach for generating ensemble of classifiers.

PubMed

Rahman, Ashfaqur; Verma, Brijesh

2011-05-01

This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clustering of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results as evidenced from the experimental results.
Synchronization Experiments With A Global Coupled Model of Intermediate Complexity

NASA Astrophysics Data System (ADS)

Selten, Frank; Hiemstra, Paul; Shen, Mao-Lin

2013-04-01

In the super modeling approach an ensemble of imperfect models are connected through nudging terms that nudge the solution of each model to the solution of all other models in the ensemble. The goal is to obtain a synchronized state through a proper choice of connection strengths that closely tracks the trajectory of the true system. For the super modeling approach to be successful, the connections should be dense and strong enough for synchronization to occur. In this study we analyze the behavior of an ensemble of connected global atmosphere-ocean models of intermediate complexity. All atmosphere models are connected to the same ocean model through the surface fluxes of heat, water and momentum, the ocean is integrated using weighted averaged surface fluxes. In particular we analyze the degree of synchronization between the atmosphere models and the characteristics of the ensemble mean solution. The results are interpreted using a low order atmosphere-ocean toy model.
Earthquake prediction in California using regression algorithms and cloud-based big data infrastructure

NASA Astrophysics Data System (ADS)

Asencio-Cortés, G.; Morales-Esteban, A.; Shang, X.; Martínez-Álvarez, F.

2018-06-01

Earthquake magnitude prediction is a challenging problem that has been widely studied during the last decades. Statistical, geophysical and machine learning approaches can be found in literature, with no particularly satisfactory results. In recent years, powerful computational techniques to analyze big data have emerged, making possible the analysis of massive datasets. These new methods make use of physical resources like cloud based architectures. California is known for being one of the regions with highest seismic activity in the world and many data are available. In this work, the use of several regression algorithms combined with ensemble learning is explored in the context of big data (1 GB catalog is used), in order to predict earthquakes magnitude within the next seven days. Apache Spark framework, H2 O library in R language and Amazon cloud infrastructure were been used, reporting very promising results.
Behavioral and Neurophysiological Study of Olfactory Perception and Learning in Honeybees

PubMed Central

Sandoz, Jean Christophe

2011-01-01

The honeybee Apis mellifera has been a central insect model in the study of olfactory perception and learning for more than a century, starting with pioneer work by Karl von Frisch. Research on olfaction in honeybees has greatly benefited from the advent of a range of behavioral and neurophysiological paradigms in the Lab. Here I review major findings about how the honeybee brain detects, processes, and learns odors, based on behavioral, neuroanatomical, and neurophysiological approaches. I first address the behavioral study of olfactory learning, from experiments on free-flying workers visiting artificial flowers to laboratory-based conditioning protocols on restrained individuals. I explain how the study of olfactory learning has allowed understanding the discrimination and generalization ability of the honeybee olfactory system, its capacity to grant special properties to olfactory mixtures as well as to retain individual component information. Next, based on the impressive amount of anatomical and immunochemical studies of the bee brain, I detail our knowledge of olfactory pathways. I then show how functional recordings of odor-evoked activity in the brain allow following the transformation of the olfactory message from the periphery until higher-order central structures. Data from extra- and intracellular electrophysiological approaches as well as from the most recent optical imaging developments are described. Lastly, I discuss results addressing how odor representation changes as a result of experience. This impressive ensemble of behavioral, neuroanatomical, and neurophysiological data available in the bee make it an attractive model for future research aiming to understand olfactory perception and learning in an integrative fashion. PMID:22163215
The application of machine learning in multi sensor data fusion for activity recognition in mobile device space

NASA Astrophysics Data System (ADS)

Marhoubi, Asmaa H.; Saravi, Sara; Edirisinghe, Eran A.

2015-05-01

The present generation of mobile handheld devices comes equipped with a large number of sensors. The key sensors include the Ambient Light Sensor, Proximity Sensor, Gyroscope, Compass and the Accelerometer. Many mobile applications are driven based on the readings obtained from either one or two of these sensors. However the presence of multiple-sensors will enable the determination of more detailed activities that are carried out by the user of a mobile device, thus enabling smarter mobile applications to be developed that responds more appropriately to user behavior and device usage. In the proposed research we use recent advances in machine learning to fuse together the data obtained from all key sensors of a mobile device. We investigate the possible use of single and ensemble classifier based approaches to identify a mobile device's behavior in the space it is present. Feature selection algorithms are used to remove non-discriminant features that often lead to poor classifier performance. As the sensor readings are noisy and include a significant proportion of missing values and outliers, we use machine learning based approaches to clean the raw data obtained from the sensors, before use. Based on selected practical case studies, we demonstrate the ability to accurately recognize device behavior based on multi-sensor data fusion.
An Integrated Scenario Ensemble-Based Framework for Hurricane Evacuation Modeling: Part 2-Hazard Modeling.

PubMed

Blanton, Brian; Dresback, Kendra; Colle, Brian; Kolar, Randy; Vergara, Humberto; Hong, Yang; Leonardo, Nicholas; Davidson, Rachel; Nozick, Linda; Wachtendorf, Tricia

2018-04-25

Hurricane track and intensity can change rapidly in unexpected ways, thus making predictions of hurricanes and related hazards uncertain. This inherent uncertainty often translates into suboptimal decision-making outcomes, such as unnecessary evacuation. Representing this uncertainty is thus critical in evacuation planning and related activities. We describe a physics-based hazard modeling approach that (1) dynamically accounts for the physical interactions among hazard components and (2) captures hurricane evolution uncertainty using an ensemble method. This loosely coupled model system provides a framework for probabilistic water inundation and wind speed levels for a new, risk-based approach to evacuation modeling, described in a companion article in this issue. It combines the Weather Research and Forecasting (WRF) meteorological model, the Coupled Routing and Excess STorage (CREST) hydrologic model, and the ADvanced CIRCulation (ADCIRC) storm surge, tide, and wind-wave model to compute inundation levels and wind speeds for an ensemble of hurricane predictions. Perturbations to WRF's initial and boundary conditions and different model physics/parameterizations generate an ensemble of storm solutions, which are then used to drive the coupled hydrologic + hydrodynamic models. Hurricane Isabel (2003) is used as a case study to illustrate the ensemble-based approach. The inundation, river runoff, and wind hazard results are strongly dependent on the accuracy of the mesoscale meteorological simulations, which improves with decreasing lead time to hurricane landfall. The ensemble envelope brackets the observed behavior while providing "best-case" and "worst-case" scenarios for the subsequent risk-based evacuation model. © 2018 Society for Risk Analysis.
Unveiling Inherent Degeneracies in Determining Population-weighted Ensembles of Inter-domain Orientational Distributions Using NMR Residual Dipolar Couplings: Application to RNA Helix Junction Helix Motifs

PubMed Central

Yang, Shan; Al-Hashimi, Hashim M.

2016-01-01

A growing number of studies employ time-averaged experimental data to determine dynamic ensembles of biomolecules. While it is well known that different ensembles can satisfy experimental data to within error, the extent and nature of these degeneracies, and their impact on the accuracy of the ensemble determination remains poorly understood. Here, we use simulations and a recently introduced metric for assessing ensemble similarity to explore degeneracies in determining ensembles using NMR residual dipolar couplings (RDCs) with specific application to A-form helices in RNA. Various target ensembles were constructed representing different domain-domain orientational distributions that are confined to a topologically restricted (<10%) conformational space. Five independent sets of ensemble averaged RDCs were then computed for each target ensemble and a ‘sample and select’ scheme used to identify degenerate ensembles that satisfy RDCs to within experimental uncertainty. We find that ensembles with different ensemble sizes and that can differ significantly from the target ensemble (by as much as ΣΩ ~ 0.4 where ΣΩ varies between 0 and 1 for maximum and minimum ensemble similarity, respectively) can satisfy the ensemble averaged RDCs. These deviations increase with the number of unique conformers and breadth of the target distribution, and result in significant uncertainty in determining conformational entropy (as large as 5 kcal/mol at T = 298 K). Nevertheless, the RDC-degenerate ensembles are biased towards populated regions of the target ensemble, and capture other essential features of the distribution, including the shape. Our results identify ensemble size as a major source of uncertainty in determining ensembles and suggest that NMR interactions such as RDCs and spin relaxation, on their own, do not carry the necessary information needed to determine conformational entropy at a useful level of precision. The framework introduced here provides a general approach for exploring degeneracies in ensemble determination for different types of experimental data. PMID:26131693
SVM and SVM Ensembles in Breast Cancer Prediction.

PubMed

Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

2017-01-01

Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
SVM and SVM Ensembles in Breast Cancer Prediction

PubMed Central

Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

2017-01-01

Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
Combining the ensemble and Franck-Condon approaches for calculating spectral shapes of molecules in solution

NASA Astrophysics Data System (ADS)

Zuehlsdorff, T. J.; Isborn, C. M.

2018-01-01

The correct treatment of vibronic effects is vital for the modeling of absorption spectra of many solvated dyes. Vibronic spectra for small dyes in solution can be easily computed within the Franck-Condon approximation using an implicit solvent model. However, implicit solvent models neglect specific solute-solvent interactions on the electronic excited state. On the other hand, a straightforward way to account for solute-solvent interactions and temperature-dependent broadening is by computing vertical excitation energies obtained from an ensemble of solute-solvent conformations. Ensemble approaches usually do not account for vibronic transitions and thus often produce spectral shapes in poor agreement with experiment. We address these shortcomings by combining zero-temperature vibronic fine structure with vertical excitations computed for a room-temperature ensemble of solute-solvent configurations. In this combined approach, all temperature-dependent broadening is treated classically through the sampling of configurations and quantum mechanical vibronic contributions are included as a zero-temperature correction to each vertical transition. In our calculation of the vertical excitations, significant regions of the solvent environment are treated fully quantum mechanically to account for solute-solvent polarization and charge-transfer. For the Franck-Condon calculations, a small amount of frozen explicit solvent is considered in order to capture solvent effects on the vibronic shape function. We test the proposed method by comparing calculated and experimental absorption spectra of Nile red and the green fluorescent protein chromophore in polar and non-polar solvents. For systems with strong solute-solvent interactions, the combined approach yields significant improvements over the ensemble approach. For systems with weak to moderate solute-solvent interactions, both the high-energy vibronic tail and the width of the spectra are in excellent agreement with experiments.

Initial assessment of a multi-model approach to spring flood forecasting in Sweden

NASA Astrophysics Data System (ADS)

Olsson, J.; Uvo, C. B.; Foster, K.; Yang, W.

2015-06-01

Hydropower is a major energy source in Sweden and proper reservoir management prior to the spring flood onset is crucial for optimal production. This requires useful forecasts of the accumulated discharge in the spring flood period (i.e. the spring-flood volume, SFV). Today's SFV forecasts are generated using a model-based climatological ensemble approach, where time series of precipitation and temperature from historical years are used to force a calibrated and initialised set-up of the HBV model. In this study, a number of new approaches to spring flood forecasting, that reflect the latest developments with respect to analysis and modelling on seasonal time scales, are presented and evaluated. Three main approaches, represented by specific methods, are evaluated in SFV hindcasts for three main Swedish rivers over a 10-year period with lead times between 0 and 4 months. In the first approach, historically analogue years with respect to the climate in the period preceding the spring flood are identified and used to compose a reduced ensemble. In the second, seasonal meteorological ensemble forecasts are used to drive the HBV model over the spring flood period. In the third approach, statistical relationships between SFV and the large-sale atmospheric circulation are used to build forecast models. None of the new approaches consistently outperform the climatological ensemble approach, but for specific locations and lead times improvements of 20-30 % are found. When combining all forecasts in a weighted multi-model approach, a mean improvement over all locations and lead times of nearly 10 % was indicated. This demonstrates the potential of the approach and further development and optimisation into an operational system is ongoing.
Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach.

PubMed

Pan, Yuliang; Wang, Zixiang; Zhan, Weihua; Deng, Lei

2018-05-01

Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet, limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures leads to the difficulties in developing empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infant stage. Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined together and selected for classification with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performances with an AUC score of 0.86 and a sensitivity of 0.78, which are significantly better than that of the pioneer RNA-binding hot spot prediction method HotSPRing. We also demonstrate the capability of our proposed method on the independent test dataset and gain a competitive advantage as a result. The PrabHot webserver is freely available at http://denglab.org/PrabHot/. leideng@csu.edu.cn. Supplementary data are available at Bioinformatics online.
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.

PubMed

Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun

2014-04-01

Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups with each sub-classifier (i.e. a single classifier or an ensemble) weighed by its discriminative power and stable level. The un-even class distribution, on the other hand, is typically battled by the sub-classifier built in a specific feature group with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
Lessons learned and way forward from 6 years of Aerosol_cci

NASA Astrophysics Data System (ADS)

Popp, Thomas; de Leeuw, Gerrit; Pinnock, Simon

2017-04-01

Within the ESA Climate Change Initiative (CCI) Aerosol_cci (2010 - 2017) conducts intensive work to improve and qualify algorithms for the retrieval of aerosol information from European sensors. Meanwhile, several validated (multi-) decadal time series of different aerosol parameters from complementary sensors are available: Aerosol Optical Depth (AOD), stratospheric extinction profiles, a qualitative Absorbing Aerosol Index (AAI), fine mode AOD, mineral dust AOD; absorption information and aerosol layer height are in an evaluation phase and the multi-pixel GRASP algorithm for the POLDER instrument is used for selected regions. Validation (vs. AERONET, MAN) and inter-comparison to other satellite datasets (MODIS, MISR, SeaWIFS) proved the high quality of the available datasets comparable to other satellite retrievals and revealed needs for algorithm improvement (for example for higher AOD values) which were taken into account in an iterative evolution cycle. The datasets contain pixel level uncertainty estimates which were also validated and improved in the reprocessing. The use of an ensemble method was tested, where several algorithms are applied to the same sensor. The presentation will summarize and discuss the lessons learned from the 6 years of intensive collaboration and highlight major achievements (significantly improved AOD quality, fine mode AOD, dust AOD, pixel level uncertainties, ensemble approach); also limitations and remaining deficits shall be discussed. An outlook will discuss the way forward for the continuous algorithm improvement and re-processing together with opportunities for time series extension with successor instruments of the Sentinel family and the complementarity of the different satellite aerosol products.
A Global Carbon Assimilation System using a modified EnKF assimilation method

NASA Astrophysics Data System (ADS)

Zhang, S.; Zheng, X.; Chen, Z.; Dan, B.; Chen, J. M.; Yi, X.; Wang, L.; Wu, G.

2014-10-01

A Global Carbon Assimilation System based on Ensemble Kalman filter (GCAS-EK) is developed for assimilating atmospheric CO2 abundance data into an ecosystem model to simultaneously estimate the surface carbon fluxes and atmospheric CO2 distribution. This assimilation approach is based on the ensemble Kalman filter (EnKF), but with several new developments, including using analysis states to iteratively estimate ensemble forecast errors, and a maximum likelihood estimation of the inflation factors of the forecast and observation errors. The proposed assimilation approach is tested in observing system simulation experiments and then used to estimate the terrestrial ecosystem carbon fluxes and atmospheric CO2 distributions from 2002 to 2008. The results showed that this assimilation approach can effectively reduce the biases and uncertainties of the carbon fluxes simulated by the ecosystem model.
Big Data-Based Approach to Detect, Locate, and Enhance the Stability of an Unplanned Microgrid Islanding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, Huaiguang; Li, Yan; Zhang, Yingchen

In this paper, a big data-based approach is proposed for the security improvement of an unplanned microgrid islanding (UMI). The proposed approach contains two major steps: the first step is big data analysis of wide-area monitoring to detect a UMI and locate it; the second step is particle swarm optimization (PSO)-based stability enhancement for the UMI. First, an optimal synchrophasor measurement device selection (OSMDS) and matching pursuit decomposition (MPD)-based spatial-temporal analysis approach is proposed to significantly reduce the volume of data while keeping appropriate information from the synchrophasor measurements. Second, a random forest-based ensemble learning approach is trained to detectmore » the UMI. When combined with grid topology, the UMI can be located. Then the stability problem of the UMI is formulated as an optimization problem and the PSO is used to find the optimal operational parameters of the UMI. An eigenvalue-based multiobjective function is proposed, which aims to improve the damping and dynamic characteristics of the UMI. Finally, the simulation results demonstrate the effectiveness and robustness of the proposed approach.« less
Deep multi-spectral ensemble learning for electronic cleansing in dual-energy CT colonography

NASA Astrophysics Data System (ADS)

Tachibana, Rie; Näppi, Janne J.; Hironaka, Toru; Kim, Se Hyung; Yoshida, Hiroyuki

2017-03-01

We developed a novel electronic cleansing (EC) method for dual-energy CT colonography (DE-CTC) based on an ensemble deep convolution neural network (DCNN) and multi-spectral multi-slice image patches. In the method, an ensemble DCNN is used to classify each voxel of a DE-CTC image volume into five classes: luminal air, soft tissue, tagged fecal materials, and partial-volume boundaries between air and tagging and those between soft tissue and tagging. Each DCNN acts as a voxel classifier, where an input image patch centered at the voxel is generated as input to the DCNNs. An image patch has three channels that are mapped from a region-of-interest containing the image plane of the voxel and the two adjacent image planes. Six different types of spectral input image datasets were derived using two dual-energy CT images, two virtual monochromatic images, and two material images. An ensemble DCNN was constructed by use of a meta-classifier that combines the output of multiple DCNNs, each of which was trained with a different type of multi-spectral image patches. The electronically cleansed CTC images were calculated by removal of regions classified as other than soft tissue, followed by a colon surface reconstruction. For pilot evaluation, 359 volumes of interest (VOIs) representing sources of subtraction artifacts observed in current EC schemes were sampled from 30 clinical CTC cases. Preliminary results showed that the ensemble DCNN can yield high accuracy in labeling of the VOIs, indicating that deep learning of multi-spectral EC with multi-slice imaging could accurately remove residual fecal materials from CTC images without generating major EC artifacts.
Increased Sparsity of Hippocampal CA1 Neuronal Ensembles in a Mouse Model of Down Syndrome Assayed by Arc Expression

PubMed Central

Smith-Hicks, Constance L.; Cai, Peiling; Savonenko, Alena V.; Reeves, Roger H.; Worley, Paul F.

2017-01-01

Down syndrome (DS) is the leading chromosomal cause of intellectual disability, yet the neural substrates of learning and memory deficits remain poorly understood. Here, we interrogate neural networks linked to learning and memory in a well-characterized model of DS, the Ts65Dn mouse. We report that Ts65Dn mice exhibit exploratory behavior that is not different from littermate wild-type (WT) controls yet behavioral activation of Arc mRNA transcription in pyramidal neurons of the CA1 region of the hippocampus is altered in Ts65Dn mice. In WT mice, a 5 min period of exploration of a novel environment resulted in Arc mRNA transcription in 39% of CA1 neurons. By contrast, the same period of exploration resulted in only ~20% of CA1 neurons transcribing Arc mRNA in Ts65Dn mice indicating increased sparsity of the behaviorally induced ensemble. Like WT mice the CA1 pyramidal neurons of Ts65Dn mice reactivated Arc transcription during a second exposure to the same environment 20 min after the first experience, but the size of the reactivated ensemble was only ~60% of that in WT mice. After repeated daily exposures there was a further decline in the size of the reactivated ensemble in Ts65Dn and a disruption of reactivation. Together these data demonstrate reduction in the size of the behaviorally induced network that expresses Arc in Ts65Dn mice and disruption of the long-term stability of the ensemble. We propose that these deficits in network formation and stability contribute to cognitive symptoms in DS. PMID:28217086
Representation of photon limited data in emission tomography using origin ensembles

NASA Astrophysics Data System (ADS)

Sitek, A.

2008-06-01

Representation and reconstruction of data obtained by emission tomography scanners are challenging due to high noise levels in the data. Typically, images obtained using tomographic measurements are represented using grids. In this work, we define images as sets of origins of events detected during tomographic measurements; we call these origin ensembles (OEs). A state in the ensemble is characterized by a vector of 3N parameters Y, where the parameters are the coordinates of origins of detected events in a three-dimensional space and N is the number of detected events. The 3N-dimensional probability density function (PDF) for that ensemble is derived, and we present an algorithm for OE image estimation from tomographic measurements. A displayable image (e.g. grid based image) is derived from the OE formulation by calculating ensemble expectations based on the PDF using the Markov chain Monte Carlo method. The approach was applied to computer-simulated 3D list-mode positron emission tomography data. The reconstruction errors for a 10 000 000 event acquisition for simulated ranged from 0.1 to 34.8%, depending on object size and sampling density. The method was also applied to experimental data and the results of the OE method were consistent with those obtained by a standard maximum-likelihood approach. The method is a new approach to representation and reconstruction of data obtained by photon-limited emission tomography measurements.
Desynchronization in an ensemble of globally coupled chaotic bursting neuronal oscillators by dynamic delayed feedback control

NASA Astrophysics Data System (ADS)

Che, Yanqiu; Yang, Tingting; Li, Ruixue; Li, Huiyan; Han, Chunxiao; Wang, Jiang; Wei, Xile

2015-09-01

In this paper, we propose a dynamic delayed feedback control approach or desynchronization of chaotic-bursting synchronous activities in an ensemble of globally coupled neuronal oscillators. We demonstrate that the difference signal between an ensemble's mean field and its time delayed state, filtered and fed back to the ensemble, can suppress the self-synchronization in the ensemble. These individual units are decoupled and stabilized at the desired desynchronized states while the stimulation signal reduces to the noise level. The effectiveness of the method is illustrated by examples of two different populations of globally coupled chaotic-bursting neurons. The proposed method has potential for mild, effective and demand-controlled therapy of neurological diseases characterized by pathological synchronization.
Feasibility of Using an Augmented Immersive Virtual Reality Learning Environment to Enhance Music Conducting Skills

ERIC Educational Resources Information Center

Orman, Evelyn K.; Price, Harry E.; Russell, Christine R.

2017-01-01

Acquiring nonverbal skills necessary to appropriately communicate and educate members of performing ensembles is essential for wind band conductors. Virtual reality learning environments (VRLEs) provide a unique setting for developing these proficiencies. For this feasibility study, we used an augmented immersive VRLE to enhance eye contact, torso…
The Violet Experience: Social Interaction through Eclectic Music Learning Practices

ERIC Educational Resources Information Center

Dakon, Jacob M.; Cloete, Elene

2018-01-01

In this qualitative case study, we used participant observation and interviews to examine Violet, a Flemish string youth orchestra. In doing so, we identify the qualities that constitute an 'eclectic' ensemble space, herein defined as a musical environment that uses a blend of informal and formal learning practices. Moreover, we emphasize how…
Impaired hippocampal place cell dynamics in a mouse model of the 22q11.2 deletion

PubMed Central

Zaremba, Jeffrey D; Diamantopoulou, Anastasia; Danielson, Nathan B; Grosmark, Andres D; Kaifosh, Patrick W; Bowler, John C; Liao, Zhenrui; Sparks, Fraser T; Gogos, Joseph A; Losonczy, Attila

2018-01-01

Hippocampal place cells represent the cellular substrate of episodic memory. Place cell ensembles reorganize to support learning but must also maintain stable representations to facilitate memory recall. Despite extensive research, the learning-related role of place cell dynamics in health and disease remains elusive. Using chronic two-photon Ca2+ imaging in hippocampal area CA1 of wild-type and Df(16)A+/− mice, an animal model of 22q11.2 deletion syndrome, one of the most common genetic risk factors for cognitive dysfunction and schizophrenia, we found that goal-oriented learning in wild-type mice was supported by stable spatial maps and robust remapping of place fields toward the goal location. Df(16)A+/− mice showed a significant learning deficit accompanied by reduced spatial map stability and the absence of goal-directed place cell reorganization. These results expand our understanding of the hippocampal ensemble dynamics supporting cognitive flexibility and demonstrate their importance in a model of 22q11.2-associated cognitive dysfunction. PMID:28869582
Determination of the conformational ensemble of the TAR RNA by X-ray scattering interferometry

PubMed Central

Walker, Peter

2017-01-01

Abstract The conformational ensembles of structured RNA's are crucial for biological function, but they remain difficult to elucidate experimentally. We demonstrate with HIV-1 TAR RNA that X-ray scattering interferometry (XSI) can be used to determine RNA conformational ensembles. X-ray scattering interferometry (XSI) is based on site-specifically labeling RNA with pairs of heavy atom probes, and precisely measuring the distribution of inter-probe distances that arise from a heterogeneous mixture of RNA solution structures. We show that the XSI-based model of the TAR RNA ensemble closely resembles an independent model derived from NMR-RDC data. Further, we show how the TAR RNA ensemble changes shape at different salt concentrations. Finally, we demonstrate that a single hybrid model of the TAR RNA ensemble simultaneously fits both the XSI and NMR-RDC data set and show that XSI can be combined with NMR-RDC to further improve the quality of the determined ensemble. The results suggest that XSI-RNA will be a powerful approach for characterizing the solution conformational ensembles of RNAs and RNA-protein complexes under diverse solution conditions. PMID:28108663
Entropy of network ensembles

NASA Astrophysics Data System (ADS)

Bianconi, Ginestra

2009-03-01

In this paper we generalize the concept of random networks to describe network ensembles with nontrivial features by a statistical mechanics approach. This framework is able to describe undirected and directed network ensembles as well as weighted network ensembles. These networks might have nontrivial community structure or, in the case of networks embedded in a given space, they might have a link probability with a nontrivial dependence on the distance between the nodes. These ensembles are characterized by their entropy, which evaluates the cardinality of networks in the ensemble. In particular, in this paper we define and evaluate the structural entropy, i.e., the entropy of the ensembles of undirected uncorrelated simple networks with given degree sequence. We stress the apparent paradox that scale-free degree distributions are characterized by having small structural entropy while they are so widely encountered in natural, social, and technological complex systems. We propose a solution to the paradox by proving that scale-free degree distributions are the most likely degree distribution with the corresponding value of the structural entropy. Finally, the general framework we present in this paper is able to describe microcanonical ensembles of networks as well as canonical or hidden-variable network ensembles with significant implications for the formulation of network-constructing algorithms.
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.

PubMed

Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott

2009-03-01

Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
Adaptive correction of ensemble forecasts

NASA Astrophysics Data System (ADS)

Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane

2017-04-01

Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS), and so-called "member-by-member" approaches (MBM). Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members, however the parameters of the regression equations are retrieved by exploiting the second order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy. We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO-LEPS ensemble forecasts. Deterministic verification scores (e.g., mean absolute error, bias) and probabilistic scores (e.g., CRPS) are used to evaluate the post-processing techniques. We conclude that the new adaptive method outperforms the simpler running bias-correction. The proposed adaptive method often outperforms the MBM method in removing bias. The MBM method has the advantage of correcting the ensemble spread, although it needs more training data.
Responsive Teaching, Informal Learning and Cultural Tools in Year Nine Ensemble Practice: A Lost Opportunity

ERIC Educational Resources Information Center

Wallerstedt, Cecilia; Pramling, Niklas

2016-01-01

In the present study we investigate what problems adolescents in year-nine compulsory school face when trying to learn to play a song together, how they take on these, and how their teacher responds to these problems. The studied practice, where students are to form a band and learn to play a song together in music class, is an example of…
EFS: an ensemble feature selection tool implemented as R-package and web-application.

PubMed

Neumann, Ursula; Genze, Nikita; Heider, Dominik

2017-01-01

Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage to alleviate and compensate for these biases. The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs to a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated in EFS, which can be used separately or combined in an ensemble. EFS identifies relevant features while compensating specific biases of single methods due to an ensemble approach. Thereby, EFS can improve the prediction accuracy and interpretability in subsequent binary classification models. EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana.

PubMed

Hansen, Bjoern Oest; Meyer, Etienne H; Ferrari, Camilla; Vaid, Neha; Movahedi, Sara; Vandepoele, Klaas; Nikoloski, Zoran; Mutwil, Marek

2018-03-01

Recent advances in gene function prediction rely on ensemble approaches that integrate results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We have explored and compared two methods to integrate 10 gene co-function networks for Arabidopsis thaliana and demonstrate how the integration of these networks produces more accurate gene function predictions for a larger fraction of genes with unknown function. These predictions were used to identify genes involved in mitochondrial complex I formation, and for five of them, we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet. The methods presented here demonstrate that ensemble gene function prediction is a powerful method to boost prediction performance, whereas the EnsembleNet database provides a cutting-edge community tool to guide experimentalists. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.

The Road to DLCZ Protocol in Rubidium Ensemble

NASA Astrophysics Data System (ADS)

Li, Chang; Pu, Yunfei; Jiang, Nan; Chang, Wei; Zhang, Sheng; CenterQuantum Information, InstituteInterdisciplinary Information Sciences, Tsinghua Univ Team

2017-04-01

Quantum communication is the powerful approach achieving a fully secure information transferal. The DLCZ protocol ensures that photon linearly decays with transferring distance increasing, which improves the success potential and shortens the time to build up an entangled channel. Apart from that, it provides an advanced idea that building up a quantum internet based on different nodes connected to different sites and themselves. In our laboratory, three sets of laser-cooled Rubidium 87 ensemble have been built. Two of them serve as the single photon emitter, which generate the entanglement between ensemble and photon. What's more, crossed AODs are equipped to multiplex and demultiplex optical circuit so that ensemble is divided into 2 hundred of 2D sub-memory cells. And the third ensemble is used as quantum telecommunication, which converts 780nm photon into telecom-wavelength one. And we have been building double-MOT system, which provides more atoms in ensemble and larger optical density.
Multi-objective optimization for generating a weighted multi-model ensemble

NASA Astrophysics Data System (ADS)

Lee, H.

2017-12-01

Many studies have demonstrated that multi-model ensembles generally show better skill than each ensemble member. When generating weighted multi-model ensembles, the first step is measuring the performance of individual model simulations using observations. There is a consensus on the assignment of weighting factors based on a single evaluation metric. When considering only one evaluation metric, the weighting factor for each model is proportional to a performance score or inversely proportional to an error for the model. While this conventional approach can provide appropriate combinations of multiple models, the approach confronts a big challenge when there are multiple metrics under consideration. When considering multiple evaluation metrics, it is obvious that a simple averaging of multiple performance scores or model ranks does not address the trade-off problem between conflicting metrics. So far, there seems to be no best method to generate weighted multi-model ensembles based on multiple performance metrics. The current study applies the multi-objective optimization, a mathematical process that provides a set of optimal trade-off solutions based on a range of evaluation metrics, to combining multiple performance metrics for the global climate models and their dynamically downscaled regional climate simulations over North America and generating a weighted multi-model ensemble. NASA satellite data and the Regional Climate Model Evaluation System (RCMES) software toolkit are used for assessment of the climate simulations. Overall, the performance of each model differs markedly with strong seasonal dependence. Because of the considerable variability across the climate simulations, it is important to evaluate models systematically and make future projections by assigning optimized weighting factors to the models with relatively good performance. Our results indicate that the optimally weighted multi-model ensemble always shows better performance than an arithmetic ensemble mean and may provide reliable future projections.
Exploring the calibration of a wind forecast ensemble for energy applications

NASA Astrophysics Data System (ADS)

Heppelmann, Tobias; Ben Bouallegue, Zied; Theis, Susanne

2015-04-01

In the German research project EWeLiNE, Deutscher Wetterdienst (DWD) and Fraunhofer Institute for Wind Energy and Energy System Technology (IWES) are collaborating with three German Transmission System Operators (TSO) in order to provide the TSOs with improved probabilistic power forecasts. Probabilistic power forecasts are derived from probabilistic weather forecasts, themselves derived from ensemble prediction systems (EPS). Since the considered raw ensemble wind forecasts suffer from underdispersiveness and bias, calibration methods are developed for the correction of the model bias and the ensemble spread bias. The overall aim is to improve the ensemble forecasts such that the uncertainty of the possible weather deployment is depicted by the ensemble spread from the first forecast hours. Additionally, the ensemble members after calibration should remain physically consistent scenarios. We focus on probabilistic hourly wind forecasts with horizon of 21 h delivered by the convection permitting high-resolution ensemble system COSMO-DE-EPS which has become operational in 2012 at DWD. The ensemble consists of 20 ensemble members driven by four different global models. The model area includes whole Germany and parts of Central Europe with a horizontal resolution of 2.8 km and a vertical resolution of 50 model levels. For verification we use wind mast measurements around 100 m height that corresponds to the hub height of wind energy plants that belong to wind farms within the model area. Calibration of the ensemble forecasts can be performed by different statistical methods applied to the raw ensemble output. Here, we explore local bivariate Ensemble Model Output Statistics at individual sites and quantile regression with different predictors. Applying different methods, we already show an improvement of ensemble wind forecasts from COSMO-DE-EPS for energy applications. In addition, an ensemble copula coupling approach transfers the time-dependencies of the raw ensemble to the calibrated ensemble. The calibrated wind forecasts are evaluated first with univariate probabilistic scores and additionally with diagnostics of wind ramps in order to assess the time-consistency of the calibrated ensemble members.
Assessing an ensemble Kalman filter inference of Manning's n coefficient of an idealized tidal inlet against a polynomial chaos-based MCMC

NASA Astrophysics Data System (ADS)

Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim

2017-08-01

Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean model, especially in the framework of parameter estimation. Based on Bayes rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without significant increase in the computational requirements. However, only approximate estimates are generally obtained by this approach due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean model, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and uncertainties of Manning's n coefficients compared to the full posterior distributions inferred by MCMC.
A real-time ocean reanalyses intercomparison project in the context of tropical pacific observing system and ENSO monitoring

NASA Astrophysics Data System (ADS)

Xue, Yan; Wen, C.; Kumar, A.; Balmaseda, M.; Fujii, Y.; Alves, O.; Martin, M.; Yang, X.; Vernieres, G.; Desportes, C.; Lee, T.; Ascione, I.; Gudgel, R.; Ishikawa, I.

2017-12-01

An ensemble of nine operational ocean reanalyses (ORAs) is now routinely collected, and is used to monitor the consistency across the tropical Pacific temperature analyses in real-time in support of ENSO monitoring, diagnostics, and prediction. The ensemble approach allows a more reliable estimate of the signal as well as an estimation of the noise among analyses. The real-time estimation of signal-to-noise ratio assists the prediction of ENSO. The ensemble approach also enables us to estimate the impact of the Tropical Pacific Observing System (TPOS) on the estimation of ENSO-related oceanic indicators. The ensemble mean is shown to have a better accuracy than individual ORAs, suggesting the ensemble approach is an effective tool to reduce uncertainties in temperature analysis for ENSO. The ensemble spread, as a measure of uncertainties in ORAs, is shown to be partially linked to the data counts of in situ observations. Despite the constraints by TPOS data, uncertainties in ORAs are still large in the northwestern tropical Pacific, in the SPCZ region, as well as in the central and northeastern tropical Pacific. The uncertainties in total temperature reduced significantly in 2015 due to the recovery of the TAO/TRITON array to approach the value before the TAO crisis in 2012. However, the uncertainties in anomalous temperature remained much higher than the pre-2012 value, probably due to uncertainties in the reference climatology. This highlights the importance of the long-term stability of the observing system for anomaly monitoring. The current data assimilation systems tend to constrain the solution very locally near the buoy sites, potentially damaging the larger-scale dynamical consistency. So there is an urgent need to improve data assimilation systems so that they can optimize the observation information from TPOS and contribute to improved ENSO prediction.
Conformational and functional analysis of molecular dynamics trajectories by Self-Organising Maps

PubMed Central

2011-01-01

Background Molecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins that is often a critical element of their function. Identification of functionally relevant conformations is generally done clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported performing more accurately and providing more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering. Results The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cα's Cartesian coordinates of conformations sampled in the essential space were used as input data vectors for SOM training, then complete linkage clustering was performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions. Conclusions The use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources. PMID:21569575
Comparing writing style feature-based classification methods for estimating user reputations in social media.

PubMed

Suh, Jong Hwan

2016-01-01

In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the qualities of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of the state-of-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners-C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)-and four Random Subspace (RS) ensemble methods based on the four base learners. When South Korea's Web forum, Daum Agora, was selected as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM combining RS and SVM gives the best accuracy for classification if the test bed poster reputations are segmented strictly into Good and Bad classes by portfolio approach. Pairwise t tests on accuracy confirm two expectations coming from the literature reviews: first, the feature set adding content-specific features outperform the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways on defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.
Using Unsupervised Learning to Unlock the Potential of Hydrologic Similarity

NASA Astrophysics Data System (ADS)

Chaney, N.; Newman, A. J.

2017-12-01

By clustering environmental data into representative hydrologic response units (HRUs), hydrologic similarity aims to harness the covariance between a system's physical environment and its hydrologic response to create reduced-order models. This is the primary approach through which sub-grid hydrologic processes are represented in large-scale models (e.g., Earth System Models). Although the possibilities of hydrologic similarity are extensive, its practical implementations have been limited to 1-d bins of oversimplistic metrics of hydrologic response (e.g., topographic index)—this is a missed opportunity. In this presentation we will show how unsupervised learning is unlocking the potential of hydrologic similarity; clustering methods enable generalized frameworks to effectively and efficiently harness the petabytes of global environmental data to robustly characterize sub-grid heterogeneity in large-scale models. To illustrate the potential that unsupervised learning has towards advancing hydrologic similarity, we introduce a hierarchical clustering algorithm (HCA) that clusters very high resolution (30-100 meters) elevation, soil, climate, and land cover data to assemble a domain's representative HRUs. These HRUs are then used to parameterize the sub-grid heterogeneity in land surface models; for this study we use the GFDL LM4 model—the land component of the GFDL Earth System Model. To explore HCA and its impacts on the hydrologic system we use a ¼ grid cell in southeastern California as a test site. HCA is used to construct an ensemble of 9 different HRU configurations—each configuration has a different number of HRUs; for each ensemble member LM4 is run between 2002 and 2014 with a 26 year spinup. The analysis of the ensemble of model simulations show that: 1) clustering the high-dimensional environmental data space leads to a robust representation of the role of the physical environment in the coupled water, energy, and carbon cycles at a relatively low number of HRUs; 2) the reduced-order model with around 300 HRUs effectively reproduces the fully distributed model simulation (30 meters) with less than 1/1000 of computational expense; 3) assigning each grid cell of the fully distributed grid to an HRU via HCA enables novel visualization methods for large-scale models—this has significant implications for how these models are applied and evaluated. We will conclude by outlining the potential that this work has within operational prediction systems including numerical weather prediction, Earth System models, and Early Warning systems.
Robust Machine Learning Variable Importance Analyses of Medical Conditions for Health Care Spending.

PubMed

Rose, Sherri

2018-03-11

To propose nonparametric double robust machine learning in variable importance analyses of medical conditions for health spending. 2011-2012 Truven MarketScan database. I evaluate how much more, on average, commercially insured enrollees with each of 26 of the most prevalent medical conditions cost per year after controlling for demographics and other medical conditions. This is accomplished within the nonparametric targeted learning framework, which incorporates ensemble machine learning. Previous literature studying the impact of medical conditions on health care spending has almost exclusively focused on parametric risk adjustment; thus, I compare my approach to parametric regression. My results demonstrate that multiple sclerosis, congestive heart failure, severe cancers, major depression and bipolar disorders, and chronic hepatitis are the most costly medical conditions on average per individual. These findings differed from those obtained using parametric regression. The literature may be underestimating the spending contributions of several medical conditions, which is a potentially critical oversight. If current methods are not capturing the true incremental effect of medical conditions, undesirable incentives related to care may remain. Further work is needed to directly study these issues in the context of federal formulas. © Health Research and Educational Trust.
The role of ensemble post-processing for modeling the ensemble tail

NASA Astrophysics Data System (ADS)

Van De Vyver, Hans; Van Schaeybroeck, Bert; Vannitsem, Stéphane

2016-04-01

The past decades the numerical weather prediction community has witnessed a paradigm shift from deterministic to probabilistic forecast and state estimation (Buizza and Leutbecher, 2015; Buizza et al., 2008), in an attempt to quantify the uncertainties associated with initial-condition and model errors. An important benefit of a probabilistic framework is the improved prediction of extreme events. However, one may ask to what extent such model estimates contain information on the occurrence probability of extreme events and how this information can be optimally extracted. Different approaches have been proposed and applied on real-world systems which, based on extreme value theory, allow the estimation of extreme-event probabilities conditional on forecasts and state estimates (Ferro, 2007; Friederichs, 2010). Using ensemble predictions generated with a model of low dimensionality, a thorough investigation is presented quantifying the change of predictability of extreme events associated with ensemble post-processing and other influencing factors including the finite ensemble size, lead time and model assumption and the use of different covariates (ensemble mean, maximum, spread...) for modeling the tail distribution. Tail modeling is performed by deriving extreme-quantile estimates using peak-over-threshold representation (generalized Pareto distribution) or quantile regression. Common ensemble post-processing methods aim to improve mostly the ensemble mean and spread of a raw forecast (Van Schaeybroeck and Vannitsem, 2015). Conditional tail modeling, on the other hand, is a post-processing in itself, focusing on the tails only. Therefore, it is unclear how applying ensemble post-processing prior to conditional tail modeling impacts the skill of extreme-event predictions. This work is investigating this question in details. Buizza, Leutbecher, and Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System, Q. J. R. Meteorol. Soc. 134: 2051-2066.Buizza and Leutbecher, 2015: The forecast skill horizon, Q. J. R. Meteorol. Soc. 141: 3366-3382.Ferro, 2007: A probability model for verifying deterministic forecasts of extreme events. Weather and Forecasting 22 (5), 1089-1100.Friederichs, 2010: Statistical downscaling of extreme precipitation events using extreme value theory. Extremes 13, 109-132.Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q.J.R. Meteorol. Soc., 141: 807-818.
GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

PubMed

Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

2012-09-24

Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
An "Ensemble Approach" to Modernizing Extreme Precipitation Estimation for Dam Safety Decision-Making

NASA Astrophysics Data System (ADS)

Cifelli, R.; Mahoney, K. M.; Webb, R. S.; McCormick, B.

2017-12-01

To ensure structural and operational safety of dams and other water management infrastructure, water resources managers and engineers require information about the potential for heavy precipitation. The methods and data used to estimate extreme rainfall amounts for managing risk are based on 40-year-old science and in need of improvement. The need to evaluate new approaches based on the best science available has led the states of Colorado and New Mexico to engage a body of scientists and engineers in an innovative "ensemble approach" to updating extreme precipitation estimates. NOAA is at the forefront of one of three technical approaches that make up the "ensemble study"; the three approaches are conducted concurrently and in collaboration with each other. One approach is the conventional deterministic, "storm-based" method, another is a risk-based regional precipitation frequency estimation tool, and the third is an experimental approach utilizing NOAA's state-of-the-art High Resolution Rapid Refresh (HRRR) physically-based dynamical weather prediction model. The goal of the overall project is to use the individual strengths of these different methods to define an updated and broadly acceptable state of the practice for evaluation and design of dam spillways. This talk will highlight the NOAA research and NOAA's role in the overarching goal to better understand and characterizing extreme precipitation estimation uncertainty. The research led by NOAA explores a novel high-resolution dataset and post-processing techniques using a super-ensemble of hourly forecasts from the HRRR model. We also investigate how this rich dataset may be combined with statistical methods to optimally cast the data in probabilistic frameworks. NOAA expertise in the physical processes that drive extreme precipitation is also employed to develop careful testing and improved understanding of the limitations of older estimation methods and assumptions. The process of decision making in the midst of uncertainty is a major part of this study. We will speak to how the ensemble approach may be used in concert with one another to manage risk and enhance resiliency in the midst of uncertainty. Finally, the presentation will also address the implications of including climate change in future extreme precipitation estimation studies.
Technical Note: Initial assessment of a multi-method approach to spring-flood forecasting in Sweden

NASA Astrophysics Data System (ADS)

Olsson, J.; Uvo, C. B.; Foster, K.; Yang, W.

2016-02-01

Hydropower is a major energy source in Sweden, and proper reservoir management prior to the spring-flood onset is crucial for optimal production. This requires accurate forecasts of the accumulated discharge in the spring-flood period (i.e. the spring-flood volume, SFV). Today's SFV forecasts are generated using a model-based climatological ensemble approach, where time series of precipitation and temperature from historical years are used to force a calibrated and initialized set-up of the HBV model. In this study, a number of new approaches to spring-flood forecasting that reflect the latest developments with respect to analysis and modelling on seasonal timescales are presented and evaluated. Three main approaches, represented by specific methods, are evaluated in SFV hindcasts for the Swedish river Vindelälven over a 10-year period with lead times between 0 and 4 months. In the first approach, historically analogue years with respect to the climate in the period preceding the spring flood are identified and used to compose a reduced ensemble. In the second, seasonal meteorological ensemble forecasts are used to drive the HBV model over the spring-flood period. In the third approach, statistical relationships between SFV and the large-sale atmospheric circulation are used to build forecast models. None of the new approaches consistently outperform the climatological ensemble approach, but for early forecasts improvements of up to 25 % are found. This potential is reasonably well realized in a multi-method system, which over all forecast dates reduced the error in SFV by ˜ 4 %. This improvement is limited but potentially significant for e.g. energy trading.
Computational predictive models for P-glycoprotein inhibition of in-house chalcone derivatives and drug-bank compounds.

PubMed

Ngo, Trieu-Du; Tran, Thanh-Dao; Le, Minh-Tri; Thai, Khac-Minh

2016-11-01

The human P-glycoprotein (P-gp) efflux pump is of great interest for medicinal chemists because of its important role in multidrug resistance (MDR). Because of the high polyspecificity as well as the unavailability of high-resolution X-ray crystal structures of this transmembrane protein, ligand-based, and structure-based approaches which were machine learning, homology modeling, and molecular docking were combined for this study. In ligand-based approach, individual two-dimensional quantitative structure-activity relationship models were developed using different machine learning algorithms and subsequently combined into the Ensemble model which showed good performance on both the diverse training set and the validation sets. The applicability domain and the prediction quality of the developed models were also judged using the state-of-the-art methods and tools. In our structure-based approach, the P-gp structure and its binding region were predicted for a docking study to determine possible interactions between the ligands and the receptor. Based on these in silico tools, hit compounds for reversing MDR were discovered from the in-house and DrugBank databases through virtual screening using prediction models and molecular docking in an attempt to restore cancer cell sensitivity to cytotoxic drugs.
An analytical approach to gravitational lensing by an ensemble of axisymmetric lenses

NASA Technical Reports Server (NTRS)

Lee, Man Hoi; Spergel, David N.

1990-01-01

The problem of gravitational lensing by an ensemble of identical axisymmetric lenses randomly distributed on a single lens plane is considered and a formal expression is derived for the joint probability density of finding shear and convergence at a random point on the plane. The amplification probability for a source can be accurately estimated from the distribution in shear and convergence. This method is applied to two cases: lensing by an ensemble of point masses and by an ensemble of objects with Gaussian surface mass density. There is no convergence for point masses whereas shear is negligible for wide Gaussian lenses.
Machine learning-, rule- and pharmacophore-based classification on the inhibition of P-glycoprotein and NorA.

PubMed

Ngo, T-D; Tran, T-D; Le, M-T; Thai, K-M

2016-09-01

The efflux pumps P-glycoprotein (P-gp) in humans and NorA in Staphylococcus aureus are of great interest for medicinal chemists because of their important roles in multidrug resistance (MDR). The high polyspecificity as well as the unavailability of high-resolution X-ray crystal structures of these transmembrane proteins lead us to combining ligand-based approaches, which in the case of this study were machine learning, perceptual mapping and pharmacophore modelling. For P-gp inhibitory activity, individual models were developed using different machine learning algorithms and subsequently combined into an ensemble model which showed a good discrimination between inhibitors and noninhibitors (acctrain-diverse = 84%; accinternal-test = 92% and accexternal-test = 100%). For ligand promiscuity between P-gp and NorA, perceptual maps and pharmacophore models were generated for the detection of rules and features. Based on these in silico tools, hit compounds for reversing MDR were discovered from the in-house and DrugBank databases through virtual screening in an attempt to restore drug sensitivity in cancer cells and bacteria.
Forecasting daily streamflow using online sequential extreme learning machines

NASA Astrophysics Data System (ADS)

Lima, Aranildo R.; Cannon, Alex J.; Hsieh, William W.

2016-06-01

While nonlinear machine methods have been widely used in environmental forecasting, in situations where new data arrive continually, the need to make frequent model updates can become cumbersome and computationally costly. To alleviate this problem, an online sequential learning algorithm for single hidden layer feedforward neural networks - the online sequential extreme learning machine (OSELM) - is automatically updated inexpensively as new data arrive (and the new data can then be discarded). OSELM was applied to forecast daily streamflow at two small watersheds in British Columbia, Canada, at lead times of 1-3 days. Predictors used were weather forecast data generated by the NOAA Global Ensemble Forecasting System (GEFS), and local hydro-meteorological observations. OSELM forecasts were tested with daily, monthly or yearly model updates. More frequent updating gave smaller forecast errors, including errors for data above the 90th percentile. Larger datasets used in the initial training of OSELM helped to find better parameters (number of hidden nodes) for the model, yielding better predictions. With the online sequential multiple linear regression (OSMLR) as benchmark, we concluded that OSELM is an attractive approach as it easily outperformed OSMLR in forecast accuracy.
ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

PubMed

Hajiloo, Mohsen; Sapkota, Yadav; Mackey, John R; Robson, Paula; Greiner, Russell; Damaraju, Sambasivarao

2013-02-22

Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification. We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual's continental and sub-continental ancestry. To predict an individual's continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control's λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver. ETHNOPRED is a novel technique for producing classifiers that can identify an individual's continental and sub-continental heritage, based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.
Ensemble Recordings in Awake Rats: Achieving Behavioral Regularity during Multimodal Stimulus Processing and Discriminative Learning

ERIC Educational Resources Information Center

Lee, Eunjeong; Oliveira-Ferreira, Ana I.; de Water, Ed; Gerritsen, Hans; Bakker, Mattijs C.; Kalwij, Jan A. W.; van Goudoever, Tjerk; Buster, Wietze H.; Pennartz, Cyriel M. A.

2009-01-01

To meet an increasing need to examine the neurophysiological underpinnings of behavior in rats, we developed a behavioral system for studying sensory processing, attention and discrimination learning in rats while recording firing patterns of neurons in one or more brain areas of interest. Because neuronal activity is sensitive to variations in…
Statistical Analysis of Protein Ensembles

NASA Astrophysics Data System (ADS)

Máté, Gabriell; Heermann, Dieter

2014-04-01

As 3D protein-configuration data is piling up, there is an ever-increasing need for well-defined, mathematically rigorous analysis approaches, especially that the vast majority of the currently available methods rely heavily on heuristics. We propose an analysis framework which stems from topology, the field of mathematics which studies properties preserved under continuous deformations. First, we calculate a barcode representation of the molecules employing computational topology algorithms. Bars in this barcode represent different topological features. Molecules are compared through their barcodes by statistically determining the difference in the set of their topological features. As a proof-of-principle application, we analyze a dataset compiled of ensembles of different proteins, obtained from the Ensemble Protein Database. We demonstrate that our approach correctly detects the different protein groupings.

Parameterizations for ensemble Kalman inversion

NASA Astrophysics Data System (ADS)

Chada, Neil K.; Iglesias, Marco A.; Roininen, Lassi; Stuart, Andrew M.

2018-05-01

The use of ensemble methods to solve inverse problems is attractive because it is a derivative-free methodology which is also well-adapted to parallelization. In its basic iterative form the method produces an ensemble of solutions which lie in the linear span of the initial ensemble. Choice of the parameterization of the unknown field is thus a key component of the success of the method. We demonstrate how both geometric ideas and hierarchical ideas can be used to design effective parameterizations for a number of applied inverse problems arising in electrical impedance tomography, groundwater flow and source inversion. In particular we show how geometric ideas, including the level set method, can be used to reconstruct piecewise continuous fields, and we show how hierarchical methods can be used to learn key parameters in continuous fields, such as length-scales, resulting in improved reconstructions. Geometric and hierarchical ideas are combined in the level set method to find piecewise constant reconstructions with interfaces of unknown topology.
Adiabatic and nonadiabatic perturbation theory for coherence vector description of neutrino oscillations

NASA Astrophysics Data System (ADS)

Hollenberg, Sebastian; Päs, Heinrich

2012-01-01

The standard wave function approach for the treatment of neutrino oscillations fails in situations where quantum ensembles at a finite temperature with or without an interacting background plasma are encountered. As a first step to treat such phenomena in a novel way, we propose a unified approach to both adiabatic and nonadiabatic two-flavor oscillations in neutrino ensembles with finite temperature and generic (e.g., matter) potentials. Neglecting effects of ensemble decoherence for now, we study the evolution of a neutrino ensemble governed by the associated quantum kinetic equations, which apply to systems with finite temperature. The quantum kinetic equations are solved formally using the Magnus expansion and it is shown that a convenient choice of the quantum mechanical picture (e.g., the interaction picture) reveals suitable parameters to characterize the physics of the underlying system (e.g., an effective oscillation length). It is understood that this method also provides a promising starting point for the treatment of the more general case in which decoherence is taken into account.
A Flexible Approach for the Statistical Visualization of Ensemble Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Potter, K.; Wilson, A.; Bremer, P.

2009-09-29

Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methodsmore » that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.« less
Ensembles vs. information theory: supporting science under uncertainty

NASA Astrophysics Data System (ADS)

Nearing, Grey S.; Gupta, Hoshin V.

2018-05-01

Multi-model ensembles are one of the most common ways to deal with epistemic uncertainty in hydrology. This is a problem because there is no known way to sample models such that the resulting ensemble admits a measure that has any systematic (i.e., asymptotic, bounded, or consistent) relationship with uncertainty. Multi-model ensembles are effectively sensitivity analyses and cannot - even partially - quantify uncertainty. One consequence of this is that multi-model approaches cannot support a consistent scientific method - in particular, multi-model approaches yield unbounded errors in inference. In contrast, information theory supports a coherent hypothesis test that is robust to (i.e., bounded under) arbitrary epistemic uncertainty. This paper may be understood as advocating a procedure for hypothesis testing that does not require quantifying uncertainty, but is coherent and reliable (i.e., bounded) in the presence of arbitrary (unknown and unknowable) uncertainty. We conclude by offering some suggestions about how this proposed philosophy of science suggests new ways to conceptualize and construct simulation models of complex, dynamical systems.
Shallow cumuli ensemble statistics for development of a stochastic parameterization

NASA Astrophysics Data System (ADS)

Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs

2014-05-01

According to a conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large scale forcing produces an unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection, as it is known from the empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to the given large scale forcing. Moreover, this distribution gets broader with the increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the sub-grid states distribution spread and shape on the model horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study the shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation model (LES) and a cloud tracking algorithm, followed by a conditional sampling of clouds at the cloud base level, to retrieve the information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on the empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory. Stochastic model simulates a compound random process, with the number of convective elements drawn from a Poisson distribution, and cloud properties sub-sampled from a generalized ensemble distribution. We study the role of the different cloud subtypes in a shallow convective ensemble and how the diverse cloud properties and cloud lifetimes affect the system macro-state. To what extent does the cloud-base mass flux distribution deviate from the simple Boltzmann distribution and how does it affect the results from the stochastic model? Is the memory, provided by the finite lifetime of individual clouds, of importance for the ensemble statistics? We also test for the minimal information given as an input to the stochastic model, able to reproduce the ensemble mean statistics and the variability in a convective ensemble. An important property of the resulting distribution of the sub-grid convective states is its scale-adaptivity - the smaller the grid-size, the broader the compound distribution of the sub-grid states.
Numerical weather prediction model tuning via ensemble prediction system

NASA Astrophysics Data System (ADS)

Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.

2011-12-01

This paper discusses a novel approach to tune predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in parameterizations schemes of sub-grid scale physical processes. Currently, numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of the NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infra-structure and it seems very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding-back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameters values, and leads to an improved forecast skill. Second, results with an atmospheric general circulation model based ensemble prediction system show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, a global top-end NWP model tuning exercise with preliminary results is published.
Integrating machine learning techniques and high-resolution imagery to generate GIS-ready information for urban water consumption studies

NASA Astrophysics Data System (ADS)

Wolf, Nils; Hof, Angela

2012-10-01

Urban sprawl driven by shifts in tourism development produces new suburban landscapes of water consumption on Mediterranean coasts. Golf courses, ornamental, 'Atlantic' gardens and swimming pools are the most striking artefacts of this transformation, threatening the local water supply systems and exacerbating water scarcity. In the face of climate change, urban landscape irrigation is becoming increasingly important from a resource management point of view. This paper adopts urban remote sensing towards a targeted mapping approach using machine learning techniques and highresolution satellite imagery (WorldView-2) to generate GIS-ready information for urban water consumption studies. Swimming pools, vegetation and - as a subgroup of vegetation - turf grass are extracted as important determinants of water consumption. For image analysis, the complex nature of urban environments suggests spatial-spectral classification, i.e. the complementary use of the spectral signature and spatial descriptors. Multiscale image segmentation provides means to extract the spatial descriptors - namely object feature layers - which can be concatenated at pixel level to the spectral signature. This study assesses the value of object features using different machine learning techniques and amounts of labeled information for learning. The results indicate the benefit of the spatial-spectral approach if combined with appropriate classifiers like tree-based ensembles or support vector machines, which can handle high dimensionality. Finally, a Random Forest classifier was chosen to deliver the classified input data for the estimation of evaporative water loss and net landscape irrigation requirements.
Mortality risk score prediction in an elderly population using machine learning.

PubMed

Rose, Sherri

2013-03-01

Standard practice for prediction often relies on parametric regression methods. Interesting new methods from the machine learning literature have been introduced in epidemiologic studies, such as random forest and neural networks. However, a priori, an investigator will not know which algorithm to select and may wish to try several. Here I apply the super learner, an ensembling machine learning approach that combines multiple algorithms into a single algorithm and returns a prediction function with the best cross-validated mean squared error. Super learning is a generalization of stacking methods. I used super learning in the Study of Physical Performance and Age-Related Changes in Sonomans (SPPARCS) to predict death among 2,066 residents of Sonoma, California, aged 54 years or more during the period 1993-1999. The super learner for predicting death (risk score) improved upon all single algorithms in the collection of algorithms, although its performance was similar to that of several algorithms. Super learner outperformed the worst algorithm (neural networks) by 44% with respect to estimated cross-validated mean squared error and had an R2 value of 0.201. The improvement of super learner over random forest with respect to R2 was approximately 2-fold. Alternatives for risk score prediction include the super learner, which can provide improved performance.
An Efficient Local Correlation Matrix Decomposition Approach for the Localization Implementation of Ensemble-Based Assimilation Methods

NASA Astrophysics Data System (ADS)

Zhang, Hongqin; Tian, Xiangjun

2018-04-01

Ensemble-based data assimilation methods often use the so-called localization scheme to improve the representation of the ensemble background error covariance (Be). Extensive research has been undertaken to reduce the computational cost of these methods by using the localized ensemble samples to localize Be by means of a direct decomposition of the local correlation matrix C. However, the computational costs of the direct decomposition of the local correlation matrix C are still extremely high due to its high dimension. In this paper, we propose an efficient local correlation matrix decomposition approach based on the concept of alternating directions. This approach is intended to avoid direct decomposition of the correlation matrix. Instead, we first decompose the correlation matrix into 1-D correlation matrices in the three coordinate directions, then construct their empirical orthogonal function decomposition at low resolution. This procedure is followed by the 1-D spline interpolation process to transform the above decompositions to the high-resolution grid. Finally, an efficient correlation matrix decomposition is achieved by computing the very similar Kronecker product. We conducted a series of comparison experiments to illustrate the validity and accuracy of the proposed local correlation matrix decomposition approach. The effectiveness of the proposed correlation matrix decomposition approach and its efficient localization implementation of the nonlinear least-squares four-dimensional variational assimilation are further demonstrated by several groups of numerical experiments based on the Advanced Research Weather Research and Forecasting model.
Ensemble data assimilation in the Red Sea: sensitivity to ensemble selection and atmospheric forcing

NASA Astrophysics Data System (ADS)

Toye, Habib; Zhan, Peng; Gopalakrishnan, Ganesh; Kartadikaria, Aditya R.; Huang, Huang; Knio, Omar; Hoteit, Ibrahim

2017-07-01

We present our efforts to build an ensemble data assimilation and forecasting system for the Red Sea. The system consists of the high-resolution Massachusetts Institute of Technology general circulation model (MITgcm) to simulate ocean circulation and of the Data Research Testbed (DART) for ensemble data assimilation. DART has been configured to integrate all members of an ensemble adjustment Kalman filter (EAKF) in parallel, based on which we adapted the ensemble operations in DART to use an invariant ensemble, i.e., an ensemble Optimal Interpolation (EnOI) algorithm. This approach requires only single forward model integration in the forecast step and therefore saves substantial computational cost. To deal with the strong seasonal variability of the Red Sea, the EnOI ensemble is then seasonally selected from a climatology of long-term model outputs. Observations of remote sensing sea surface height (SSH) and sea surface temperature (SST) are assimilated every 3 days. Real-time atmospheric fields from the National Center for Environmental Prediction (NCEP) and the European Center for Medium-Range Weather Forecasts (ECMWF) are used as forcing in different assimilation experiments. We investigate the behaviors of the EAKF and (seasonal-) EnOI and compare their performances for assimilating and forecasting the circulation of the Red Sea. We further assess the sensitivity of the assimilation system to various filtering parameters (ensemble size, inflation) and atmospheric forcing.
CATCh, an Ensemble Classifier for Chimera Detection in 16S rRNA Sequencing Studies

PubMed Central

Mysara, Mohamed; Saeys, Yvan; Leys, Natalie; Raes, Jeroen

2014-01-01

In ecological studies, microbial diversity is nowadays mostly assessed via the detection of phylogenetic marker genes, such as 16S rRNA. However, PCR amplification of these marker genes produces a significant amount of artificial sequences, often referred to as chimeras. Different algorithms have been developed to remove these chimeras, but efforts to combine different methodologies are limited. Therefore, two machine learning classifiers (reference-based and de novo CATCh) were developed by integrating the output of existing chimera detection tools into a new, more powerful method. When comparing our classifiers with existing tools in either the reference-based or de novo mode, a higher performance of our ensemble method was observed on a wide range of sequencing data, including simulated, 454 pyrosequencing, and Illumina MiSeq data sets. Since our algorithm combines the advantages of different individual chimera detection tools, our approach produces more robust results when challenged with chimeric sequences having a low parent divergence, short length of the chimeric range, and various numbers of parents. Additionally, it could be shown that integrating CATCh in the preprocessing pipeline has a beneficial effect on the quality of the clustering in operational taxonomic units. PMID:25527546
Path planning in uncertain flow fields using ensemble method

NASA Astrophysics Data System (ADS)

Wang, Tong; Le Maître, Olivier P.; Hoteit, Ibrahim; Knio, Omar M.

2016-10-01

An ensemble-based approach is developed to conduct optimal path planning in unsteady ocean currents under uncertainty. We focus our attention on two-dimensional steady and unsteady uncertain flows, and adopt a sampling methodology that is well suited to operational forecasts, where an ensemble of deterministic predictions is used to model and quantify uncertainty. In an operational setting, much about dynamics, topography, and forcing of the ocean environment is uncertain. To address this uncertainty, the flow field is parametrized using a finite number of independent canonical random variables with known densities, and the ensemble is generated by sampling these variables. For each of the resulting realizations of the uncertain current field, we predict the path that minimizes the travel time by solving a boundary value problem (BVP), based on the Pontryagin maximum principle. A family of backward-in-time trajectories starting at the end position is used to generate suitable initial values for the BVP solver. This allows us to examine and analyze the performance of the sampling strategy and to develop insight into extensions dealing with general circulation ocean models. In particular, the ensemble method enables us to perform a statistical analysis of travel times and consequently develop a path planning approach that accounts for these statistics. The proposed methodology is tested for a number of scenarios. We first validate our algorithms by reproducing simple canonical solutions, and then demonstrate our approach in more complex flow fields, including idealized, steady and unsteady double-gyre flows.
Prediction of bacterial small RNAs in the RsmA (CsrA) and ToxT pathways: a machine learning approach.

PubMed

Fakhry, Carl Tony; Kulkarni, Prajna; Chen, Ping; Kulkarni, Rahul; Zarringhalam, Kourosh

2017-08-22

Small RNAs (sRNAs) constitute an important class of post-transcriptional regulators that control critical cellular processes in bacteria. Recent research using high-throughput transcriptomic approaches has led to a dramatic increase in the discovery of bacterial sRNAs. However, it is generally believed that the currently identified sRNAs constitute a limited subset of the bacterial sRNA repertoire. In several cases, sRNAs belonging to a specific class are already known and the challenge is to identify additional sRNAs belonging to the same class. In such cases, machine-learning approaches can be used to predict novel sRNAs in a given class. In this work, we develop novel bioinformatics approaches that integrate sequence and structure-based features to train machine-learning models for the discovery of bacterial sRNAs. We show that features derived from recurrent structural motifs in the ensemble of low energy secondary structures can distinguish the RNA classes with high accuracy. We apply this approach to predict new members in two broad classes of bacterial small RNAs: 1) sRNAs that bind to the RNA-binding protein RsmA/CsrA in diverse bacterial species and 2) sRNAs regulated by the master regulator of virulence, ToxT, in Vibrio cholerae. The involvement of sRNAs in bacterial adaptation to changing environments is an increasingly recurring theme in current research in microbiology. It is likely that future research, combining experimental and computational approaches, will discover many more examples of sRNAs as components of critical regulatory pathways in bacteria. We have developed a novel approach for prediction of small RNA regulators in important bacterial pathways. This approach can be applied to specific classes of sRNAs for which several members have been identified and the challenge is to identify additional sRNAs.
Determination of the conformational ensemble of the TAR RNA by X-ray scattering interferometry.

PubMed

Shi, Xuesong; Walker, Peter; Harbury, Pehr B; Herschlag, Daniel

2017-05-05

The conformational ensembles of structured RNA's are crucial for biological function, but they remain difficult to elucidate experimentally. We demonstrate with HIV-1 TAR RNA that X-ray scattering interferometry (XSI) can be used to determine RNA conformational ensembles. X-ray scattering interferometry (XSI) is based on site-specifically labeling RNA with pairs of heavy atom probes, and precisely measuring the distribution of inter-probe distances that arise from a heterogeneous mixture of RNA solution structures. We show that the XSI-based model of the TAR RNA ensemble closely resembles an independent model derived from NMR-RDC data. Further, we show how the TAR RNA ensemble changes shape at different salt concentrations. Finally, we demonstrate that a single hybrid model of the TAR RNA ensemble simultaneously fits both the XSI and NMR-RDC data set and show that XSI can be combined with NMR-RDC to further improve the quality of the determined ensemble. The results suggest that XSI-RNA will be a powerful approach for characterizing the solution conformational ensembles of RNAs and RNA-protein complexes under diverse solution conditions. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Statistical Description of Neural Ensemble Dynamics

PubMed Central

Long, John D.; Carmena, Jose M.

2011-01-01

The growing use of multi-channel neural recording techniques in behaving animals has produced rich datasets that hold immense potential for advancing our understanding of how the brain mediates behavior. One limitation of these techniques is they do not provide important information about the underlying anatomical connections among the recorded neurons within an ensemble. Inferring these connections is often intractable because the set of possible interactions grows exponentially with ensemble size. This is a fundamental challenge one confronts when interpreting these data. Unfortunately, the combination of expert knowledge and ensemble data is often insufficient for selecting a unique model of these interactions. Our approach shifts away from modeling the network diagram of the ensemble toward analyzing changes in the dynamics of the ensemble as they relate to behavior. Our contribution consists of adapting techniques from signal processing and Bayesian statistics to track the dynamics of ensemble data on time-scales comparable with behavior. We employ a Bayesian estimator to weigh prior information against the available ensemble data, and use an adaptive quantization technique to aggregate poorly estimated regions of the ensemble data space. Importantly, our method is capable of detecting changes in both the magnitude and structure of correlations among neurons missed by firing rate metrics. We show that this method is scalable across a wide range of time-scales and ensemble sizes. Lastly, the performance of this method on both simulated and real ensemble data is used to demonstrate its utility. PMID:22319486
TEES 2.2: Biomedical Event Extraction for Diverse Corpora

PubMed Central

2015-01-01

Background The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. Results The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. Conclusions The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction are presented. PMID:26551925
Automated Grading of Gliomas using Deep Learning in Digital Pathology Images: A modular approach with ensemble of convolutional neural networks.

PubMed

Ertosun, Mehmet Günhan; Rubin, Daniel L

2015-01-01

Brain glioma is the most common primary malignant brain tumors in adults with different pathologic subtypes: Lower Grade Glioma (LGG) Grade II, Lower Grade Glioma (LGG) Grade III, and Glioblastoma Multiforme (GBM) Grade IV. The survival and treatment options are highly dependent of this glioma grade. We propose a deep learning-based, modular classification pipeline for automated grading of gliomas using digital pathology images. Whole tissue digitized images of pathology slides obtained from The Cancer Genome Atlas (TCGA) were used to train our deep learning modules. Our modular pipeline provides diagnostic quality statistics, such as precision, sensitivity and specificity, of the individual deep learning modules, and (1) facilitates training given the limited data in this domain, (2) enables exploration of different deep learning structures for each module, (3) leads to developing less complex modules that are simpler to analyze, and (4) provides flexibility, permitting use of single modules within the framework or use of other modeling or machine learning applications, such as probabilistic graphical models or support vector machines. Our modular approach helps us meet the requirements of minimum accuracy levels that are demanded by the context of different decision points within a multi-class classification scheme. Convolutional Neural Networks are trained for each module for each sub-task with more than 90% classification accuracies on validation data set, and achieved classification accuracy of 96% for the task of GBM vs LGG classification, 71% for further identifying the grade of LGG into Grade II or Grade III on independent data set coming from new patients from the multi-institutional repository.
Automated Grading of Gliomas using Deep Learning in Digital Pathology Images: A modular approach with ensemble of convolutional neural networks

PubMed Central

Ertosun, Mehmet Günhan; Rubin, Daniel L.

2015-01-01

Brain glioma is the most common primary malignant brain tumors in adults with different pathologic subtypes: Lower Grade Glioma (LGG) Grade II, Lower Grade Glioma (LGG) Grade III, and Glioblastoma Multiforme (GBM) Grade IV. The survival and treatment options are highly dependent of this glioma grade. We propose a deep learning-based, modular classification pipeline for automated grading of gliomas using digital pathology images. Whole tissue digitized images of pathology slides obtained from The Cancer Genome Atlas (TCGA) were used to train our deep learning modules. Our modular pipeline provides diagnostic quality statistics, such as precision, sensitivity and specificity, of the individual deep learning modules, and (1) facilitates training given the limited data in this domain, (2) enables exploration of different deep learning structures for each module, (3) leads to developing less complex modules that are simpler to analyze, and (4) provides flexibility, permitting use of single modules within the framework or use of other modeling or machine learning applications, such as probabilistic graphical models or support vector machines. Our modular approach helps us meet the requirements of minimum accuracy levels that are demanded by the context of different decision points within a multi-class classification scheme. Convolutional Neural Networks are trained for each module for each sub-task with more than 90% classification accuracies on validation data set, and achieved classification accuracy of 96% for the task of GBM vs LGG classification, 71% for further identifying the grade of LGG into Grade II or Grade III on independent data set coming from new patients from the multi-institutional repository. PMID:26958289
TEES 2.2: Biomedical Event Extraction for Diverse Corpora.

PubMed

Björne, Jari; Salakoski, Tapio

2015-01-01

The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction are presented.
Ensemble streamflow assimilation with the National Water Model.

NASA Astrophysics Data System (ADS)

Rafieeinasab, A.; McCreight, J. L.; Noh, S.; Seo, D. J.; Gochis, D.

2017-12-01

Through case studies of flooding across the US, we compare the performance of the National Water Model (NWM) data assimilation (DA) scheme to that of a newly implemented ensemble Kalman filter approach. The NOAA National Water Model (NWM) is an operational implementation of the community WRF-Hydro modeling system. As of August 2016, the NWM forecasts of distributed hydrologic states and fluxes (including soil moisture, snowpack, ET, and ponded water) over the contiguous United States have been publicly disseminated by the National Center for Environmental Prediction (NCEP) . It also provides streamflow forecasts at more than 2.7 million river reaches up to 30 days in advance. The NWM employs a nudging scheme to assimilate more than 6,000 USGS streamflow observations and provide initial conditions for its forecasts. A problem with nudging is how the forecasts relax quickly to open-loop bias in the forecast. This has been partially addressed by an experimental bias correction approach which was found to have issues with phase errors during flooding events. In this work, we present an ensemble streamflow data assimilation approach combining new channel-only capabilities of the NWM and HydroDART (a coupling of the offline WRF-Hydro model and NCAR's Data Assimilation Research Testbed; DART). Our approach focuses on the single model state of discharge and incorporates error distributions on channel-influxes (overland and groundwater) in the assimilation via an ensemble Kalman filter (EnKF). In order to avoid filter degeneracy associated with a limited number of ensemble at large scale, DART's covariance inflation (Anderson, 2009) and localization capabilities are implemented and evaluated. The current NWM data assimilation scheme is compared to preliminary results from the EnKF application for several flooding case studies across the US.

Achieving Greater Musical Independence in Ensembles through Cognitive Apprenticeship

ERIC Educational Resources Information Center

Weidner, Brian N.

2018-01-01

Musical independence is a common objective for large-ensemble classes, but traditional, teacher-centric instructional practices for these groups may discourage rather than promote students' critical thinking and decision making in music. Cognitive apprenticeship provides an instructional approach through which student musicians can develop skills…
Artificial neural networks for efficient clustering of conformational ensembles and their potential for medicinal chemistry.

PubMed

Pandini, Alessandro; Fraccalvieri, Domenico; Bonati, Laura

2013-01-01

The biological function of proteins is strictly related to their molecular flexibility and dynamics: enzymatic activity, protein-protein interactions, ligand binding and allosteric regulation are important mechanisms involving protein motions. Computational approaches, such as Molecular Dynamics (MD) simulations, are now routinely used to study the intrinsic dynamics of target proteins as well as to complement molecular docking approaches. These methods have also successfully supported the process of rational design and discovery of new drugs. Identification of functionally relevant conformations is a key step in these studies. This is generally done by cluster analysis of the ensemble of structures in the MD trajectory. Recently Artificial Neural Network (ANN) approaches, in particular methods based on Self-Organising Maps (SOMs), have been reported performing more accurately and providing more consistent results than traditional clustering algorithms in various data-mining problems. In the specific case of conformational analysis, SOMs have been successfully used to compare multiple ensembles of protein conformations demonstrating a potential in efficiently detecting the dynamic signatures central to biological function. Moreover, examples of the use of SOMs to address problems relevant to other stages of the drug-design process, including clustering of docking poses, have been reported. In this contribution we review recent applications of ANN algorithms in analysing conformational and structural ensembles and we discuss their potential in computer-based approaches for medicinal chemistry.
Fluctuating observation time ensembles in the thermodynamics of trajectories

NASA Astrophysics Data System (ADS)

Budini, Adrián A.; Turner, Robert M.; Garrahan, Juan P.

2014-03-01

The dynamics of stochastic systems, both classical and quantum, can be studied by analysing the statistical properties of dynamical trajectories. The properties of ensembles of such trajectories for long, but fixed, times are described by large-deviation (LD) rate functions. These LD functions play the role of dynamical free energies: they are cumulant generating functions for time-integrated observables, and their analytic structure encodes dynamical phase behaviour. This ‘thermodynamics of trajectories’ approach is to trajectories and dynamics what the equilibrium ensemble method of statistical mechanics is to configurations and statics. Here we show that, just like in the static case, there are a variety of alternative ensembles of trajectories, each defined by their global constraints, with that of trajectories of fixed total time being just one of these. We show how the LD functions that describe an ensemble of trajectories where some time-extensive quantity is constant (and large) but where total observation time fluctuates can be mapped to those of the fixed-time ensemble. We discuss how the correspondence between generalized ensembles can be exploited in path sampling schemes for generating rare dynamical trajectories.
Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point.

PubMed

Singh, Riddhi; Quinn, Julianne D; Reed, Patrick M; Keller, Klaus

2018-01-01

Many coupled human-natural systems have the potential to exhibit a highly nonlinear threshold response to external forcings resulting in fast transitions to undesirable states (such as eutrophication in a lake). Often, there are considerable uncertainties that make identifying the threshold challenging. Thus, rapid learning is critical for guiding management actions to avoid abrupt transitions. Here, we adopt the shallow lake problem as a test case to compare the performance of four common data assimilation schemes to predict an approaching transition. In order to demonstrate the complex interactions between management strategies and the ability of the data assimilation schemes to predict eutrophication, we also analyze our results across two different management strategies governing phosphorus emissions into the shallow lake. The compared data assimilation schemes are: ensemble Kalman filtering (EnKF), particle filtering (PF), pre-calibration (PC), and Markov Chain Monte Carlo (MCMC) estimation. While differing in their core assumptions, each data assimilation scheme is based on Bayes' theorem and updates prior beliefs about a system based on new information. For large computational investments, EnKF, PF and MCMC show similar skill in capturing the observed phosphorus in the lake (measured as expected root mean squared prediction error). EnKF, followed by PF, displays the highest learning rates at low computational cost, thus providing a more reliable signal of an impending transition. MCMC approaches the true probability of eutrophication only after a strong signal of an impending transition emerges from the observations. Overall, we find that learning rates are greatest near regions of abrupt transitions, posing a challenge to early learning and preemptive management of systems with such abrupt transitions.
Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point

PubMed Central

Quinn, Julianne D.; Reed, Patrick M.; Keller, Klaus

2018-01-01

Many coupled human-natural systems have the potential to exhibit a highly nonlinear threshold response to external forcings resulting in fast transitions to undesirable states (such as eutrophication in a lake). Often, there are considerable uncertainties that make identifying the threshold challenging. Thus, rapid learning is critical for guiding management actions to avoid abrupt transitions. Here, we adopt the shallow lake problem as a test case to compare the performance of four common data assimilation schemes to predict an approaching transition. In order to demonstrate the complex interactions between management strategies and the ability of the data assimilation schemes to predict eutrophication, we also analyze our results across two different management strategies governing phosphorus emissions into the shallow lake. The compared data assimilation schemes are: ensemble Kalman filtering (EnKF), particle filtering (PF), pre-calibration (PC), and Markov Chain Monte Carlo (MCMC) estimation. While differing in their core assumptions, each data assimilation scheme is based on Bayes’ theorem and updates prior beliefs about a system based on new information. For large computational investments, EnKF, PF and MCMC show similar skill in capturing the observed phosphorus in the lake (measured as expected root mean squared prediction error). EnKF, followed by PF, displays the highest learning rates at low computational cost, thus providing a more reliable signal of an impending transition. MCMC approaches the true probability of eutrophication only after a strong signal of an impending transition emerges from the observations. Overall, we find that learning rates are greatest near regions of abrupt transitions, posing a challenge to early learning and preemptive management of systems with such abrupt transitions. PMID:29389938
A Generalized Mixture Framework for Multi-label Classification

PubMed Central

Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos

2015-01-01

We develop a novel probabilistic ensemble framework for multi-label classification that is based on the mixtures-of-experts architecture. In this framework, we combine multi-label classification models in the classifier chains family that decompose the class posterior distribution P(Y1, …, Yd|X) using a product of posterior distributions over components of the output space. Our approach captures different input–output and output–output relations that tend to change across data. As a result, we can recover a rich set of dependency relations among inputs and outputs that a single multi-label classification model cannot capture due to its modeling simplifications. We develop and present algorithms for learning the mixtures-of-experts models from data and for performing multi-label predictions on unseen data instances. Experiments on multiple benchmark datasets demonstrate that our approach achieves highly competitive results and outperforms the existing state-of-the-art multi-label classification methods. PMID:26613069
Adaptive Nonparametric Kinematic Modeling of Concentric Tube Robots.

PubMed

Fagogenis, Georgios; Bergeles, Christos; Dupont, Pierre E

2016-10-01

Concentric tube robots comprise telescopic precurved elastic tubes. The robot's tip and shape are controlled via relative tube motions, i.e. tube rotations and translations. Non-linear interactions between the tubes, e.g. friction and torsion, as well as uncertainty in the physical properties of the tubes themselves, e.g. the Young's modulus, curvature, or stiffness, hinder accurate kinematic modelling. In this paper, we present a machine-learning-based methodology for kinematic modelling of concentric tube robots and in situ model adaptation. Our approach is based on Locally Weighted Projection Regression (LWPR). The model comprises an ensemble of linear models, each of which locally approximates the original complex kinematic relation. LWPR can accommodate for model deviations by adjusting the respective local models at run-time, resulting in an adaptive kinematics framework. We evaluated our approach on data gathered from a three-tube robot, and report high accuracy across the robot's configuration space.
Composing problem solvers for simulation experimentation: a case study on steady state estimation.

PubMed

Leye, Stefan; Ewald, Roland; Uhrmacher, Adelinde M

2014-01-01

Simulation experiments involve various sub-tasks, e.g., parameter optimization, simulation execution, or output data analysis. Many algorithms can be applied to such tasks, but their performance depends on the given problem. Steady state estimation in systems biology is a typical example for this: several estimators have been proposed, each with its own (dis-)advantages. Experimenters, therefore, must choose from the available options, even though they may not be aware of the consequences. To support those users, we propose a general scheme to aggregate such algorithms to so-called synthetic problem solvers, which exploit algorithm differences to improve overall performance. Our approach subsumes various aggregation mechanisms, supports automatic configuration from training data (e.g., via ensemble learning or portfolio selection), and extends the plugin system of the open source modeling and simulation framework James II. We show the benefits of our approach by applying it to steady state estimation for cell-biological models.
Ensemble learning for spatial interpolation of soil potassium content based on environmental information.

PubMed

Liu, Wei; Du, Peijun; Wang, Dongchen

2015-01-01

One important method to obtain the continuous surfaces of soil properties from point samples is spatial interpolation. In this paper, we propose a method that combines ensemble learning with ancillary environmental information for improved interpolation of soil properties (hereafter, EL-SP). First, we calculated the trend value for soil potassium contents at the Qinghai Lake region in China based on measured values. Then, based on soil types, geology types, land use types, and slope data, the remaining residual was simulated with the ensemble learning model. Next, the EL-SP method was applied to interpolate soil potassium contents at the study site. To evaluate the utility of the EL-SP method, we compared its performance with other interpolation methods including universal kriging, inverse distance weighting, ordinary kriging, and ordinary kriging combined geographic information. Results show that EL-SP had a lower mean absolute error and root mean square error than the data produced by the other models tested in this paper. Notably, the EL-SP maps can describe more locally detailed information and more accurate spatial patterns for soil potassium content than the other methods because of the combined use of different types of environmental information; these maps are capable of showing abrupt boundary information for soil potassium content. Furthermore, the EL-SP method not only reduces prediction errors, but it also compliments other environmental information, which makes the spatial interpolation of soil potassium content more reasonable and useful.
Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks.

PubMed

Lakhani, Paras; Sundaram, Baskaran

2017-08-01

Purpose To evaluate the efficacy of deep convolutional neural networks (DCNNs) for detecting tuberculosis (TB) on chest radiographs. Materials and Methods Four deidentified HIPAA-compliant datasets were used in this study that were exempted from review by the institutional review board, which consisted of 1007 posteroanterior chest radiographs. The datasets were split into training (68.0%), validation (17.1%), and test (14.9%). Two different DCNNs, AlexNet and GoogLeNet, were used to classify the images as having manifestations of pulmonary TB or as healthy. Both untrained and pretrained networks on ImageNet were used, and augmentation with multiple preprocessing techniques. Ensembles were performed on the best-performing algorithms. For cases where the classifiers were in disagreement, an independent board-certified cardiothoracic radiologist blindly interpreted the images to evaluate a potential radiologist-augmented workflow. Receiver operating characteristic curves and areas under the curve (AUCs) were used to assess model performance by using the DeLong method for statistical comparison of receiver operating characteristic curves. Results The best-performing classifier had an AUC of 0.99, which was an ensemble of the AlexNet and GoogLeNet DCNNs. The AUCs of the pretrained models were greater than that of the untrained models (P < .001). Augmenting the dataset further increased accuracy (P values for AlexNet and GoogLeNet were .03 and .02, respectively). The DCNNs had disagreement in 13 of the 150 test cases, which were blindly reviewed by a cardiothoracic radiologist, who correctly interpreted all 13 cases (100%). This radiologist-augmented approach resulted in a sensitivity of 97.3% and specificity 100%. Conclusion Deep learning with DCNNs can accurately classify TB at chest radiography with an AUC of 0.99. A radiologist-augmented approach for cases where there was disagreement among the classifiers further improved accuracy. © RSNA, 2017.
IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework.

PubMed

Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

2016-02-01

Accuracy plays a vital role in the medical field as it concerns with the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensemble of classifiers has been proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model called "HM-BagMoov" overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results show that ensemble framework achieved the highest accuracy, sensitivity and F-Measure when compared with individual classifiers for all the diseases. In addition to this, the ensemble framework also achieved the highest accuracy when compared with the state of the art techniques. An application named "IntelliHealth" is also developed based on proposed model that may be used by hospitals/doctors for diagnostic advice. Copyright © 2015 Elsevier Inc. All rights reserved.
Glyph-based analysis of multimodal directional distributions in vector field ensembles

NASA Astrophysics Data System (ADS)

Jarema, Mihaela; Demir, Ismail; Kehrer, Johannes; Westermann, Rüdiger

2015-04-01

Ensemble simulations are increasingly often performed in the geosciences in order to study the uncertainty and variability of model predictions. Describing ensemble data by mean and standard deviation can be misleading in case of multimodal distributions. We present first results of a glyph-based visualization of multimodal directional distributions in 2D and 3D vector ensemble data. Directional information on the circle/sphere is modeled using mixtures of probability density functions (pdfs), which enables us to characterize the distributions with relatively few parameters. The resulting mixture models are represented by 2D and 3D lobular glyphs showing direction, spread and strength of each principal mode of the distributions. A 3D extension of our approach is realized by means of an efficient GPU rendering technique. We demonstrate our method in the context of ensemble weather simulations.
A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias

With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by firstmore » layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that comparing to the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.« less
Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

NASA Astrophysics Data System (ADS)

Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

2017-12-01

Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI.

PubMed

An, Yi; Wang, Jiawei; Li, Chen; Leier, André; Marquez-Lago, Tatiana; Wilksch, Jonathan; Zhang, Yang; Webb, Geoffrey I; Song, Jiangning; Lithgow, Trevor

2018-01-01

Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in terms of the used ML methods but also with respect to the used curated data sets, the features selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspiration for the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Appraisal of geodynamic inversion results: a data mining approach

NASA Astrophysics Data System (ADS)

Baumann, T. S.

2016-11-01

Bayesian sampling based inversions require many thousands or even millions of forward models, depending on how nonlinear or non-unique the inverse problem is, and how many unknowns are involved. The result of such a probabilistic inversion is not a single `best-fit' model, but rather a probability distribution that is represented by the entire model ensemble. Often, a geophysical inverse problem is non-unique, and the corresponding posterior distribution is multimodal, meaning that the distribution consists of clusters with similar models that represent the observations equally well. In these cases, we would like to visualize the characteristic model properties within each of these clusters of models. However, even for a moderate number of inversion parameters, a manual appraisal for a large number of models is not feasible. This poses the question whether it is possible to extract end-member models that represent each of the best-fit regions including their uncertainties. Here, I show how a machine learning tool can be used to characterize end-member models, including their uncertainties, from a complete model ensemble that represents a posterior probability distribution. The model ensemble used here results from a nonlinear geodynamic inverse problem, where rheological properties of the lithosphere are constrained from multiple geophysical observations. It is demonstrated that by taking vertical cross-sections through the effective viscosity structure of each of the models, the entire model ensemble can be classified into four end-member model categories that have a similar effective viscosity structure. These classification results are helpful to explore the non-uniqueness of the inverse problem and can be used to compute representative data fits for each of the end-member models. Conversely, these insights also reveal how new observational constraints could reduce the non-uniqueness. The method is not limited to geodynamic applications and a generalized MATLAB code is provided to perform the appraisal analysis.
A statistical analysis of three ensembles of crop model responses to temperature and CO2 concentration

USDA-ARS?s Scientific Manuscript database

Ensembles of process-based crop models are now commonly used to simulate crop growth and development for climate scenarios of temperature and/or precipitation changes corresponding to different projections of atmospheric CO2 concentrations. This approach generates large datasets with thousands of de...
Korean Percussion Ensemble ("Samulnori") in the General Music Classroom

ERIC Educational Resources Information Center

Kang, Sangmi; Yoo, Hyesoo

2016-01-01

This article introduces "samulnori" (Korean percussion ensemble), its cultural background, and instructional methods as parts of a classroom approach to teaching upper-level general music. We introduce five of eight sections from "youngnam nong-ak" (a style of samulnori) as a repertoire for teaching Korean percussion music to…
Ensemble-marginalized Kalman filter for linear time-dependent PDEs with noisy boundary conditions: application to heat transfer in building walls

NASA Astrophysics Data System (ADS)

Iglesias, Marco; Sawlan, Zaid; Scavino, Marco; Tempone, Raúl; Wood, Christopher

2018-07-01

In this work, we present the ensemble-marginalized Kalman filter (EnMKF), a sequential algorithm analogous to our previously proposed approach (Ruggeri et al 2017 Bayesian Anal. 12 407–33, Iglesias et al 2018 Int. J. Heat Mass Transfer 116 417–31), for estimating the state and parameters of linear parabolic partial differential equations in initial-boundary value problems when the boundary data are noisy. We apply EnMKF to infer the thermal properties of building walls and to estimate the corresponding heat flux from real and synthetic data. Compared with a modified ensemble Kalman filter (EnKF) that is not marginalized, EnMKF reduces the bias error, avoids the collapse of the ensemble without needing to add inflation, and converges to the mean field posterior using or less of the ensemble size required by EnKF. According to our results, the marginalization technique in EnMKF is key to performance improvement with smaller ensembles at any fixed time.
Similarity Measures for Protein Ensembles

PubMed Central

Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

2009-01-01

Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement. PMID:19145244

Evolutionary Wavelet Neural Network ensembles for breast cancer and Parkinson's disease prediction.

PubMed

Khan, Maryam Mahsal; Mendes, Alexandre; Chalup, Stephan K

2018-01-01

Wavelet Neural Networks are a combination of neural networks and wavelets and have been mostly used in the area of time-series prediction and control. Recently, Evolutionary Wavelet Neural Networks have been employed to develop cancer prediction models. The present study proposes to use ensembles of Evolutionary Wavelet Neural Networks. The search for a high quality ensemble is directed by a fitness function that incorporates the accuracy of the classifiers both independently and as part of the ensemble itself. The ensemble approach is tested on three publicly available biomedical benchmark datasets, one on Breast Cancer and two on Parkinson's disease, using a 10-fold cross-validation strategy. Our experimental results show that, for the first dataset, the performance was similar to previous studies reported in literature. On the second dataset, the Evolutionary Wavelet Neural Network ensembles performed better than all previous methods. The third dataset is relatively new and this study is the first to report benchmark results.
Evolutionary Wavelet Neural Network ensembles for breast cancer and Parkinson’s disease prediction

PubMed Central

Mendes, Alexandre; Chalup, Stephan K.

2018-01-01

Wavelet Neural Networks are a combination of neural networks and wavelets and have been mostly used in the area of time-series prediction and control. Recently, Evolutionary Wavelet Neural Networks have been employed to develop cancer prediction models. The present study proposes to use ensembles of Evolutionary Wavelet Neural Networks. The search for a high quality ensemble is directed by a fitness function that incorporates the accuracy of the classifiers both independently and as part of the ensemble itself. The ensemble approach is tested on three publicly available biomedical benchmark datasets, one on Breast Cancer and two on Parkinson’s disease, using a 10-fold cross-validation strategy. Our experimental results show that, for the first dataset, the performance was similar to previous studies reported in literature. On the second dataset, the Evolutionary Wavelet Neural Network ensembles performed better than all previous methods. The third dataset is relatively new and this study is the first to report benchmark results. PMID:29420578
Understanding independence

NASA Astrophysics Data System (ADS)

Annan, James; Hargreaves, Julia

2016-04-01

In order to perform any Bayesian processing of a model ensemble, we need a prior over the ensemble members. In the case of multimodel ensembles such as CMIP, the historical approach of ``model democracy'' (i.e. equal weight for all models in the sample) is no longer credible (if it ever was) due to model duplication and inbreeding. The question of ``model independence'' is central to the question of prior weights. However, although this question has been repeatedly raised, it has not yet been satisfactorily addressed. Here I will discuss the issue of independence and present a theoretical foundation for understanding and analysing the ensemble in this context. I will also present some simple examples showing how these ideas may be applied and developed.
Using the Elements of Cooperative Learning in School Band Classes in the United States

ERIC Educational Resources Information Center

Whitener, John L.

2016-01-01

The purpose of this article is to answer the question of how we might use the elements of cooperative learning in school band classes in the United States. Current school band programs use age-old traditions that overemphasize group and individual competitiveness, stress large ensemble performance at the expense of all other activities, are…
An ensemble model of QSAR tools for regulatory risk assessment.

PubMed

Pradeep, Prachi; Povinelli, Richard J; White, Shannon; Merrill, Stephen J

2016-01-01

Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa ( κ ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.
An ensemble model of QSAR tools for regulatory risk assessment

DOE PAGES

Pradeep, Prachi; Povinelli, Richard J.; White, Shannon; ...

2016-09-22

Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflictingmore » predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. In conclusion, this feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.« less
Prediction of conformationally dependent atomic multipole moments in carbohydrates

PubMed Central

Cardamone, Salvatore

2015-01-01

The conformational flexibility of carbohydrates is challenging within the field of computational chemistry. This flexibility causes the electron density to change, which leads to fluctuating atomic multipole moments. Quantum Chemical Topology (QCT) allows for the partitioning of an “atom in a molecule,” thus localizing electron density to finite atomic domains, which permits the unambiguous evaluation of atomic multipole moments. By selecting an ensemble of physically realistic conformers of a chemical system, one evaluates the various multipole moments at defined points in configuration space. The subsequent implementation of the machine learning method kriging delivers the evaluation of an analytical function, which smoothly interpolates between these points. This allows for the prediction of atomic multipole moments at new points in conformational space, not trained for but within prediction range. In this work, we demonstrate that the carbohydrates erythrose and threose are amenable to the above methodology. We investigate how kriging models respond when the training ensemble incorporating multiple energy minima and their environment in conformational space. Additionally, we evaluate the gains in predictive capacity of our models as the size of the training ensemble increases. We believe this approach to be entirely novel within the field of carbohydrates. For a modest training set size of 600, more than 90% of the external test configurations have an error in the total (predicted) electrostatic energy (relative to ab initio) of maximum 1 kJ mol−1 for open chains and just over 90% an error of maximum 4 kJ mol−1 for rings. © 2015 Wiley Periodicals, Inc. PMID:26547500
Prediction of conformationally dependent atomic multipole moments in carbohydrates.

PubMed

Cardamone, Salvatore; Popelier, Paul L A

2015-12-15

The conformational flexibility of carbohydrates is challenging within the field of computational chemistry. This flexibility causes the electron density to change, which leads to fluctuating atomic multipole moments. Quantum Chemical Topology (QCT) allows for the partitioning of an "atom in a molecule," thus localizing electron density to finite atomic domains, which permits the unambiguous evaluation of atomic multipole moments. By selecting an ensemble of physically realistic conformers of a chemical system, one evaluates the various multipole moments at defined points in configuration space. The subsequent implementation of the machine learning method kriging delivers the evaluation of an analytical function, which smoothly interpolates between these points. This allows for the prediction of atomic multipole moments at new points in conformational space, not trained for but within prediction range. In this work, we demonstrate that the carbohydrates erythrose and threose are amenable to the above methodology. We investigate how kriging models respond when the training ensemble incorporating multiple energy minima and their environment in conformational space. Additionally, we evaluate the gains in predictive capacity of our models as the size of the training ensemble increases. We believe this approach to be entirely novel within the field of carbohydrates. For a modest training set size of 600, more than 90% of the external test configurations have an error in the total (predicted) electrostatic energy (relative to ab initio) of maximum 1 kJ mol(-1) for open chains and just over 90% an error of maximum 4 kJ mol(-1) for rings. © 2015 Wiley Periodicals, Inc.
An Ensemble Approach for Drug Side Effect Prediction

PubMed Central

Jahid, Md Jamiul; Ruan, Jianhua

2014-01-01

In silico prediction of drug side-effects in early stage of drug development is becoming more popular now days, which not only reduces the time for drug design but also reduces the drug development costs. In this article we propose an ensemble approach to predict drug side-effects of drug molecules based on their chemical structure. Our idea originates from the observation that similar drugs have similar side-effects. Based on this observation we design an ensemble approach that combine the results from different classification models where each model is generated by a different set of similar drugs. We applied our approach to 1385 side-effects in the SIDER database for 888 drugs. Results show that our approach outperformed previously published approaches and standard classifiers. Furthermore, we applied our method to a number of uncharacterized drug molecules in DrugBank database and predict their side-effect profiles for future usage. Results from various sources confirm that our method is able to predict the side-effects for uncharacterized drugs and more importantly able to predict rare side-effects which are often ignored by other approaches. The method described in this article can be useful to predict side-effects in drug design in an early stage to reduce experimental cost and time. PMID:25327524
Simultaneous calibration of ensemble river flow predictions over an entire range of lead times

NASA Astrophysics Data System (ADS)

Hemri, S.; Fundel, F.; Zappa, M.

2013-10-01

Probabilistic estimates of future water levels and river discharge are usually simulated with hydrologic models using ensemble weather forecasts as main inputs. As hydrologic models are imperfect and the meteorological ensembles tend to be biased and underdispersed, the ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, in order to achieve both reliable and sharp predictions statistical postprocessing is required. In this work Bayesian model averaging (BMA) is applied to statistically postprocess ensemble runoff raw forecasts for a catchment in Switzerland, at lead times ranging from 1 to 240 h. The raw forecasts have been obtained using deterministic and ensemble forcing meteorological models with different forecast lead time ranges. First, BMA is applied based on mixtures of univariate normal distributions, subject to the assumption of independence between distinct lead times. Then, the independence assumption is relaxed in order to estimate multivariate runoff forecasts over the entire range of lead times simultaneously, based on a BMA version that uses multivariate normal distributions. Since river runoff is a highly skewed variable, Box-Cox transformations are applied in order to achieve approximate normality. Both univariate and multivariate BMA approaches are able to generate well calibrated probabilistic forecasts that are considerably sharper than climatological forecasts. Additionally, multivariate BMA provides a promising approach for incorporating temporal dependencies into the postprocessed forecasts. Its major advantage against univariate BMA is an increase in reliability when the forecast system is changing due to model availability.
Analysis of the interface variability in NMR structure ensembles of protein-protein complexes.

PubMed

Calvanese, Luisa; D'Auria, Gabriella; Vangone, Anna; Falcigno, Lucia; Oliva, Romina

2016-06-01

NMR structures consist in ensembles of conformers, all satisfying the experimental restraints, which exhibit a certain degree of structural variability. We analyzed here the interface in NMR ensembles of protein-protein heterodimeric complexes and found it to span a wide range of different conservations. The different exhibited conservations do not simply correlate with the size of the systems/interfaces, and are most probably the result of an interplay between different factors, including the quality of experimental data and the intrinsic complex flexibility. In any case, this information is not to be missed when NMR structures of protein-protein complexes are analyzed; especially considering that, as we also show here, the first NMR conformer is usually not the one which best reflects the overall interface. To quantify the interface conservation and to analyze it, we used an approach originally conceived for the analysis and ranking of ensembles of docking models, which has now been extended to directly deal with NMR ensembles. We propose this approach, based on the conservation of the inter-residue contacts at the interface, both for the analysis of the interface in whole ensembles of NMR complexes and for the possible selection of a single conformer as the best representative of the overall interface. In order to make the analyses automatic and fast, we made the protocol available as a web tool at: https://www.molnac.unisa.it/BioTools/consrank/consrank-nmr.html. Copyright © 2016 Elsevier Inc. All rights reserved.
An 'Observational Large Ensemble' to compare observed and modeled temperature trend uncertainty due to internal variability.

NASA Astrophysics Data System (ADS)

Poppick, A. N.; McKinnon, K. A.; Dunn-Sigouin, E.; Deser, C.

2017-12-01

Initial condition climate model ensembles suggest that regional temperature trends can be highly variable on decadal timescales due to characteristics of internal climate variability. Accounting for trend uncertainty due to internal variability is therefore necessary to contextualize recent observed temperature changes. However, while the variability of trends in a climate model ensemble can be evaluated directly (as the spread across ensemble members), internal variability simulated by a climate model may be inconsistent with observations. Observation-based methods for assessing the role of internal variability on trend uncertainty are therefore required. Here, we use a statistical resampling approach to assess trend uncertainty due to internal variability in historical 50-year (1966-2015) winter near-surface air temperature trends over North America. We compare this estimate of trend uncertainty to simulated trend variability in the NCAR CESM1 Large Ensemble (LENS), finding that uncertainty in wintertime temperature trends over North America due to internal variability is largely overestimated by CESM1, on average by a factor of 32%. Our observation-based resampling approach is combined with the forced signal from LENS to produce an 'Observational Large Ensemble' (OLENS). The members of OLENS indicate a range of spatially coherent fields of temperature trends resulting from different sequences of internal variability consistent with observations. The smaller trend variability in OLENS suggests that uncertainty in the historical climate change signal in observations due to internal variability is less than suggested by LENS.
iRSpot-EL: identify recombination spots with an ensemble learning approach.

PubMed

Liu, Bin; Wang, Shanyi; Long, Ren; Chou, Kuo-Chen

2017-01-01

Coexisting in a DNA system, meiosis and recombination are two indispensible aspects for cell reproduction and growth. With the avalanche of genome sequences emerging in the post-genomic age, it is an urgent challenge to acquire the information of DNA recombination spots because it can timely provide very useful insights into the mechanism of meiotic recombination and the process of genome evolution. To address such a challenge, we have developed a predictor, called IRSPOT-EL: , by fusing different modes of pseudo K-tuple nucleotide composition and mode of dinucleotide-based auto-cross covariance into an ensemble classifier of clustering approach. Five-fold cross tests on a widely used benchmark dataset have indicated that the new predictor remarkably outperforms its existing counterparts. Particularly, far beyond their reach, the new predictor can be easily used to conduct the genome-wide analysis and the results obtained are quite consistent with the experimental map. For the convenience of most experimental scientists, a user-friendly web-server for iRSpot-EL has been established at http://bioinformatics.hitsz.edu.cn/iRSpot-EL/, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. bliu@gordonlifescience.org or bliu@insun.hit.edu.cnSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A common fallacy in climate model evaluation

NASA Astrophysics Data System (ADS)

Annan, J. D.; Hargreaves, J. C.; Tachiiri, K.

2012-04-01

We discuss the assessment of model ensembles such as that arising from the CMIP3 coordinated multi-model experiments. An important aspect of this is not merely the closeness of the models to observations in absolute terms but also the reliability of the ensemble spread as an indication of uncertainty. In this context, it has been widely argued that the multi-model ensemble of opportunity is insufficiently broad to adequately represent uncertainties regarding future climate change. For example, the IPCC AR4 summarises the consensus with the sentence: "Those studies also suggest that the current AOGCMs may not cover the full range of uncertainty for climate sensitivity." Similar claims have been made in the literature for other properties of the climate system, including the transient climate response and efficiency of ocean heat uptake. Comparison of model outputs with observations of the climate system forms an essential component of model assessment and is crucial for building our confidence in model predictions. However, methods for undertaking this comparison are not always clearly justified and understood. Here we show that the popular approach which forms the basis for the above claims, of comparing the ensemble spread to a so-called "observationally-constrained pdf", can be highly misleading. Such a comparison will almost certainly result in disagreement, but in reality tells us little about the performance of the ensemble. We present an alternative approach based on an assessment of the predictive performance of the ensemble, and show how it may lead to very different, and rather more encouraging, conclusions. We additionally outline some necessary conditions for an ensemble (or more generally, a probabilistic prediction) to be challenged by an observation.
An operational mesoscale ensemble data assimilation and prediction system: E-RTFDDA

NASA Astrophysics Data System (ADS)

Liu, Y.; Hopson, T.; Roux, G.; Hacker, J.; Xu, M.; Warner, T.; Swerdlin, S.

2009-04-01

Mesoscale (2-2000 km) meteorological processes differ from synoptic circulations in that mesoscale weather changes rapidly in space and time, and physics processes that are parameterized in NWP models play a great role. Complex interactions of synoptic circulations, regional and local terrain, land-surface heterogeneity, and associated physical properties, and the physical processes of radiative transfer, cloud and precipitation and boundary layer mixing, are crucial in shaping regional weather and climate. Mesoscale ensemble analysis and prediction should sample the uncertainties of mesoscale modeling systems in representing these factors. An innovative mesoscale Ensemble Real-Time Four Dimensional Data Assimilation (E-RTFDDA) and forecasting system has been developed at NCAR. E-RTFDDA contains diverse ensemble perturbation approaches that consider uncertainties in all major system components to produce multi-scale continuously-cycling probabilistic data assimilation and forecasting. A 30-member E-RTFDDA system with three nested domains with grid sizes of 30, 10 and 3.33 km has been running on a Department of Defense high-performance computing platform since September 2007. It has been applied at two very different US geographical locations; one in the western inter-mountain area and the other in the northeastern states, producing 6 hour analyses and 48 hour forecasts, with 4 forecast cycles a day. The operational model outputs are analyzed to a) assess overall ensemble performance and properties, b) study terrain effect on mesoscale predictability, c) quantify the contribution of different ensemble perturbation approaches to the overall forecast skill, and d) assess the additional contributed skill from an ensemble calibration process based on a quantile-regression algorithm. The system and the results will be reported at the meeting.
Ensemble method for dengue prediction.

PubMed

Buczak, Anna L; Baugher, Benjamin; Moniz, Linda J; Bagley, Thomas; Babin, Steven M; Guven, Erhan

2018-01-01

In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season; 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date. Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations. Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week. The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for transmission season and peak height for Iquitos, Peru.
Ensemble method for dengue prediction

PubMed Central

Baugher, Benjamin; Moniz, Linda J.; Bagley, Thomas; Babin, Steven M.; Guven, Erhan

2018-01-01

Background In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season; 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date. Methods Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations. Principal findings Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week. Conclusions The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for transmission season and peak height for Iquitos, Peru. PMID:29298320
Impact assessment of climate change on tourism in the Pacific small islands based on the database of long-term high-resolution climate ensemble experiments

NASA Astrophysics Data System (ADS)

Watanabe, S.; Utsumi, N.; Take, M.; Iida, A.

2016-12-01

This study aims to develop a new approach to assess the impact of climate change on the small oceanic islands in the Pacific. In the new approach, the change of the probabilities of various situations was projected with considering the spread of projection derived from ensemble simulations, instead of projecting the most probable situation. The database for Policy Decision making for Future climate change (d4PDF) is a database of long-term high-resolution climate ensemble experiments, which has the results of 100 ensemble simulations. We utilized the database for Policy Decision making for Future climate change (d4PDF), which was (a long-term and high-resolution database) composed of results of 100 ensemble experiments. A new methodology, Multi Threshold Ensemble Assessment (MTEA), was developed using the d4PDF in order to assess the impact of climate change. We focused on the impact of climate change on tourism because it has played an important role in the economy of the Pacific Islands. The Yaeyama Region, one of the tourist destinations in Okinawa, Japan, was selected as the case study site. Two kinds of impact were assessed: change in probability of extreme climate phenomena and tourist satisfaction associated with weather. The database of long-term high-resolution climate ensemble experiments and the questionnaire survey conducted by a local government were used for the assessment. The result indicated that the strength of extreme events would be increased, whereas the probability of occurrence would be decreased. This change should result in increase of the number of clear days and it could contribute to improve the tourist satisfaction.
NMR Studies of Dynamic Biomolecular Conformational Ensembles

PubMed Central

Torchia, Dennis A.

2015-01-01

Multidimensional heteronuclear NMR approaches can provide nearly complete sequential signal assignments of isotopically enriched biomolecules. The availability of assignments together with measurements of spin relaxation rates, residual spin interactions, J-couplings and chemical shifts provides information at atomic resolution about internal dynamics on timescales ranging from ps to ms, both in solution and in the solid state. However, due to the complexity of biomolecules, it is not possible to extract a unique atomic-resolution description of biomolecular motions even from extensive NMR data when many conformations are sampled on multiple timescales. For this reason, powerful computational approaches are increasingly applied to large NMR data sets to elucidate conformational ensembles sampled by biomolecules. In the past decade, considerable attention has been directed at an important class of biomolecules that function by binding to a wide variety of target molecules. Questions of current interest are: “Does the free biomolecule sample a conformational ensemble that encompasses the conformations found when it binds to various targets; and if so, on what time scale is the ensemble sampled?” This article reviews recent efforts to answer these questions, with a focus on comparing ensembles obtained for the same biomolecules by different investigators. A detailed comparison of results obtained is provided for three biomolecules: ubiquitin, calmodulin and the HIV-1 trans-activation response RNA. PMID:25669739
A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.

PubMed

Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen

2016-05-10

Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.