EFS: an ensemble feature selection tool implemented as R-package and web-application.
Neumann, Ursula; Genze, Nikita; Heider, Dominik
2017-01-01
Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage to alleviate and compensate for these biases. The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs to a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated in EFS, which can be used separately or combined in an ensemble. EFS identifies relevant features while compensating specific biases of single methods due to an ensemble approach. Thereby, EFS can improve the prediction accuracy and interpretability in subsequent binary classification models. EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S
2017-10-01
The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, there still remains a challenge to improve classifiers for the diagnosis of such skin lesions. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied on a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model that generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirtyeight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which were analyzed by mRMR method, yielding an algorithm list. From the algorithm list, the algorithm was taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using single algorithm and other models that only adopt feature selection procedure or algorithm selection procedure. The feature selection procedure or algorithm selection procedure are really helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Dimensionality Reduction Through Classifier Ensembles
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)
1999-01-01
In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
An efficient ensemble learning method for gene microarray classification.
Osareh, Alireza; Shadgar, Bita
2013-01-01
The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)
2001-01-01
Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Probenl/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
Feature selection for the classification of traced neurons.
López-Cabrera, José D; Lorenzo-Ginori, Juan V
2018-06-01
The great availability of computational tools to calculate the properties of traced neurons leads to the existence of many descriptors which allow the automated classification of neurons from these reconstructions. This situation determines the necessity to eliminate irrelevant features as well as making a selection of the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts in 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most used computational tools in neuroinformatics to quantify traced neurons. We review some current feature selection techniques as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of types filter, embedded, wrappers and ensembles were compared and the subsets returned were tested in classification tasks for different classification algorithms. L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets which provides evidence about their importance in the classification of this neurons. Copyright © 2018 Elsevier B.V. All rights reserved.
Jin, Mingwu; Deng, Weishu
2018-05-15
There is a spectrum of the progression from healthy control (HC) to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict the different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. The neighborhood component analysis (NCA) is applied to select most powerful features for prediction. The ensemble decision tree classifier is built to predict which group the subject belongs to. The best features and model parameters are determined by cross validation of the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features can achieve prediction accuracy of 56.25% on 160 test subjects. Principal component analysis (PCA) and sequential feature selection (SFS) are used for feature selection, while support vector machine (SVM) is used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA be a better feature selection strategy than PCA and SFS for the data used in this study. Ensemble tree classifier with boosting is more powerful than SVM to predict the subject group. However, more advanced feature selection and classification methods or additional measures besides structural MRI may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.
A target recognition method for maritime surveillance radars based on hybrid ensemble selection
NASA Astrophysics Data System (ADS)
Fan, Xueman; Hu, Shengliang; He, Jingbo
2017-11-01
In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of ensembles of classifiers representing different tradeoffs between the classification error and diversity. During the dynamic selection phase, the meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, feature space, decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built full polarimetric high resolution range profile data-set. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the selection of diversity measures is studied concurrently.
Farhan, Saima; Fahiem, Muhammad Abuzar; Tauseef, Huma
2014-01-01
Structural brain imaging is playing a vital role in identification of changes that occur in brain associated with Alzheimer's disease. This paper proposes an automated image processing based approach for the identification of AD from MRI of the brain. The proposed approach is novel in a sense that it has higher specificity/accuracy values despite the use of smaller feature set as compared to existing approaches. Moreover, the proposed approach is capable of identifying AD patients in early stages. The dataset selected consists of 85 age and gender matched individuals from OASIS database. The features selected are volume of GM, WM, and CSF and size of hippocampus. Three different classification models (SVM, MLP, and J48) are used for identification of patients and controls. In addition, an ensemble of classifiers, based on majority voting, is adopted to overcome the error caused by an independent base classifier. Ten-fold cross validation strategy is applied for the evaluation of our scheme. Moreover, to evaluate the performance of proposed approach, individual features and combination of features are fed to individual classifiers and ensemble based classifier. Using size of left hippocampus as feature, the accuracy achieved with ensemble of classifiers is 93.75%, with 100% specificity and 87.5% sensitivity.
No-Reference Image Quality Assessment by Wide-Perceptual-Domain Scorer Ensemble Method.
Liu, Tsung-Jung; Liu, Kuan-Hsien
2018-03-01
A no-reference (NR) learning-based approach to assess image quality is presented in this paper. The devised features are extracted from wide perceptual domains, including brightness, contrast, color, distortion, and texture. These features are used to train a model (scorer) which can predict scores. The scorer selection algorithms are utilized to help simplify the proposed system. In the final stage, the ensemble method is used to combine the prediction results from selected scorers. Two multiple-scale versions of the proposed approach are also presented along with the single-scale one. They turn out to have better performances than the original single-scale method. Because of having features from five different domains at multiple image scales and using the outputs (scores) from selected score prediction models as features for multi-scale or cross-scale fusion (i.e., ensemble), the proposed NR image quality assessment models are robust with respect to more than 24 image distortion types. They also can be used on the evaluation of images with authentic distortions. The extensive experiments on three well-known and representative databases confirm the performance robustness of our proposed model.
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew’s Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
NASA Astrophysics Data System (ADS)
Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen
2018-01-01
Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by fusing valued information selectively from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct mechanical vibration and acoustic frequency spectra of a data-driven industrial process parameter model based on selective fusion multi-condition samples and multi-source features. Multi-layer SEN (MLSEN) strategy is used to simulate the domain expert cognitive process. Genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.
Hwang, Yoo Na; Lee, Ju Hwan; Kim, Ga Young; Shin, Eun Seok; Kim, Sung Min
2018-01-01
The purpose of this study was to propose a hybrid ensemble classifier to characterize coronary plaque regions in intravascular ultrasound (IVUS) images. Pixels were allocated to one of four tissues (fibrous tissue (FT), fibro-fatty tissue (FFT), necrotic core (NC), and dense calcium (DC)) through processes of border segmentation, feature extraction, feature selection, and classification. Grayscale IVUS images and their corresponding virtual histology images were acquired from 11 patients with known or suspected coronary artery disease using 20 MHz catheter. A total of 102 hybrid textural features including first order statistics (FOS), gray level co-occurrence matrix (GLCM), extended gray level run-length matrix (GLRLM), Laws, local binary pattern (LBP), intensity, and discrete wavelet features (DWF) were extracted from IVUS images. To select optimal feature sets, genetic algorithm was implemented. A hybrid ensemble classifier based on histogram and texture information was then used for plaque characterization in this study. The optimal feature set was used as input of this ensemble classifier. After tissue characterization, parameters including sensitivity, specificity, and accuracy were calculated to validate the proposed approach. A ten-fold cross validation approach was used to determine the statistical significance of the proposed method. Our experimental results showed that the proposed method had reliable performance for tissue characterization in IVUS images. The hybrid ensemble classification method outperformed other existing methods by achieving characterization accuracy of 81% for FFT and 75% for NC. In addition, this study showed that Laws features (SSV and SAV) were key indicators for coronary tissue characterization. The proposed method had high clinical applicability for image-based tissue characterization. Copyright © 2017 Elsevier B.V. All rights reserved.
Anomaly Detection Using an Ensemble of Feature Models
Noto, Keith; Brodley, Carla; Slonim, Donna
2011-01-01
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among some features that fail to appear in the anomalous examples. Our approach learns to predict the values of training set features from the values of other features. After we have formed an ensemble of predictors, we apply this ensemble to new data points. To combine the contribution of each predictor in our ensemble, we have developed a novel, information-theoretic anomaly measure that our experimental results show selects against noisy and irrelevant features. Our results on 47 data sets show that for most data sets, this approach significantly improves performance over current state-of-the-art feature space distance and density-based approaches. PMID:22020249
NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available. PMID:24667482
The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.
Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo
2018-05-17
The gene expression profile has the characteristics of a high dimension, low sample, and continuous type, and it is a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are used to select the microarray data in order to obtain a plurality of the pre-selected feature subsets with a different classification ability. The top N genes with the highest rank of each subset are integrated so as to form a new data set. Secondly, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, the wrapper method, which is based on forward feature selection, is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy under fewer characteristic genes.
Genetic programming based ensemble system for microarray data classification.
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
Genetic Programming Based Ensemble System for Microarray Data Classification
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has succeeded in the growth of stability and clustering accuracy, but their runtime prohibits them from scaling up to real-world applications. This study deals the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach using self-organizing maps (SOM) variants to unlabeled data that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a various bootstrap sample and a random subset of the features. Then, we show that the process internal estimates are used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims to the dimensionality reduction, visualization and cluster characterization at the same time. Hence, we provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach proves promise to treat with very broad domains.
NASA Astrophysics Data System (ADS)
Zhongqin, G.; Chen, Y.
2017-12-01
Abstract Quickly identify the spatial distribution of landslides automatically is essential for the prevention, mitigation and assessment of the landslide hazard. It's still a challenging job owing to the complicated characteristics and vague boundary of the landslide areas on the image. The high resolution remote sensing image has multi-scales, complex spatial distribution and abundant features, the object-oriented image classification methods can make full use of the above information and thus effectively detect the landslides after the hazard happened. In this research we present a new semi-supervised workflow, taking advantages of recent object-oriented image analysis and machine learning algorithms to quick locate the different origins of landslides of some areas on the southwest part of China. Besides a sequence of image segmentation, feature selection, object classification and error test, this workflow ensemble the feature selection and classifier selection. The feature this study utilized were normalized difference vegetation index (NDVI) change, textural feature derived from the gray level co-occurrence matrices (GLCM), spectral feature and etc. The improvement of this study shows this algorithm significantly removes some redundant feature and the classifiers get fully used. All these improvements lead to a higher accuracy on the determination of the shape of landslides on the high resolution remote sensing image, in particular the flexibility aimed at different kinds of landslides.
Cant, Jonathan S; Xu, Yaoda
2017-02-01
Our visual system can extract summary statistics from large collections of objects without forming detailed representations of the individual objects in the ensemble. In a region in ventral visual cortex encompassing the collateral sulcus and the parahippocampal gyrus and overlapping extensively with the scene-selective parahippocampal place area (PPA), we have previously reported fMRI adaptation to object ensembles when ensemble statistics repeated, even when local image features differed across images (e.g., two different images of the same strawberry pile). We additionally showed that this ensemble representation is similar to (but still distinct from) how visual texture patterns are processed in this region and is not explained by appealing to differences in the color of the elements that make up the ensemble. To further explore the nature of ensemble representation in this brain region, here we used PPA as our ROI and investigated in detail how the shape and surface properties (i.e., both texture and color) of the individual objects constituting an ensemble affect the ensemble representation in anterior-medial ventral visual cortex. We photographed object ensembles of stone beads that varied in shape and surface properties. A given ensemble always contained beads of the same shape and surface properties (e.g., an ensemble of star-shaped rose quartz beads). A change to the shape and/or surface properties of all the beads in an ensemble resulted in a significant release from adaptation in PPA compared with conditions in which no ensemble feature changed. In contrast, in the object-sensitive lateral occipital area (LO), we only observed a significant release from adaptation when the shape of the ensemble elements varied, and found no significant results in additional scene-sensitive regions, namely, the retrosplenial complex and occipital place area. Together, these results demonstrate that the shape and surface properties of the individual objects comprising an ensemble both contribute significantly to object ensemble representation in anterior-medial ventral visual cortex and further demonstrate a functional dissociation between object- (LO) and scene-selective (PPA) visual cortical regions and within the broader scene-processing network itself.
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the designing of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the class's output by individual trees. In this way, an RF ensemble classifier performs better than a single tree from classification performance point of view. In general, multiclass datasets having unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset that has multiple classes with small sample sizes and it is therefore adequate to test our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from cardiac arrhythmia dataset. (ii) RF machine learning algorithm is used to evaluate the performance of selected features with and without simple random sampling to evaluate the efficiency of proposed training strategy. The resultant accuracy of the classifier is found to be 90.0% and this is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of random sampling strategy in training RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-01-01
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-11-14
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.
Zhai, Binxu; Chen, Jianguo
2018-04-18
A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM 2.5 ) in Beijing, China. Special feature extraction procedures, including those of simplification, polynomial, transformation and combination, are conducted before modeling to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degrees of feature importance. Single models including LASSO, Adaboost, XGBoost and multi-layer perceptron optimized by the genetic algorithm (GA-MLP) are established in the level 0 space and are then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO 2 ) and carbon monoxide (CO) concentrations measured from the city of Zhangjiakou are taken as the most important elements of pollution factors for forecasting PM 2.5 concentrations. Local extreme wind speeds and maximal wind speeds are considered to extend the most effects of meteorological factors to the cross-regional transportation of contaminants. Pollutants found in the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data with a coefficient of determination (R 2 ) of 0.90 and a root mean squared error (RMSE) of 23.69μg/m 3 . For single pollutant grade recognition, the proposed model performs better when applied to days characterized by good air quality than when applied to days registering high levels of pollution. The overall classification accuracy level is 73.93%, with most misclassifications made among adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Recognition and classification of colon cells applying the ensemble of classifiers.
Kruk, M; Osowski, S; Koktysz, R
2009-02-01
The paper presents the application of an ensemble of classifiers for the recognition of colon cells on the basis of the microscope colon image. The solved task include: segmentation of the individual cells from the image using the morphological operations, the preprocessing stages, leading to the extraction of features, selection of the most important features, and the classification stage applying the classifiers arranged in the form of ensemble. The paper presents and discusses the results concerning the recognition of four most important colon cell types: eosinophylic granulocyte, neutrophilic granulocyte, lymphocyte and plasmocyte. The proposed system is able to recognize the cells with the accuracy comparable to the human expert (around 5% of discrepancy of both results).
Instances selection algorithm by ensemble margin
NASA Astrophysics Data System (ADS)
Saidi, Meryem; Bechar, Mohammed El Amine; Settouti, Nesma; Chikh, Mohamed Amine
2018-05-01
The main limit of data mining algorithms is their inability to deal with the huge amount of available data in a reasonable processing time. A solution of producing fast and accurate results is instances and features selection. This process eliminates noisy or redundant data in order to reduce the storage and computational cost without performances degradation. In this paper, a new instance selection approach called Ensemble Margin Instance Selection (EMIS) algorithm is proposed. This approach is based on the ensemble margin. To evaluate our approach, we have conducted several experiments on different real-world classification problems from UCI Machine learning repository. The pixel-based image segmentation is a field where the storage requirement and computational cost of applied model become higher. To solve these limitations we conduct a study based on the application of EMIS and other instance selection techniques for the segmentation and automatic recognition of white blood cells WBC (nucleus and cytoplasm) in cytological images.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
Prediction of lysine ubiquitylation with ensemble classifier and feature selection.
Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao
2011-01-01
Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann
2012-09-24
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
Bonet, Isis; Franco-Montero, Pedro; Rivero, Virginia; Teijeira, Marta; Borges, Fernanda; Uriarte, Eugenio; Morales Helguera, Aliuska
2013-12-23
A(2B) adenosine receptor antagonists may be beneficial in treating diseases like asthma, diabetes, diabetic retinopathy, and certain cancers. This has stimulated research for the development of potent ligands for this subtype, based on quantitative structure-affinity relationships. In this work, a new ensemble machine learning algorithm is proposed for classification and prediction of the ligand-binding affinity of A(2B) adenosine receptor antagonists. This algorithm is based on the training of different classifier models with multiple training sets (composed of the same compounds but represented by diverse features). The k-nearest neighbor, decision trees, neural networks, and support vector machines were used as single classifiers. To select the base classifiers for combining into the ensemble, several diversity measures were employed. The final multiclassifier prediction results were computed from the output obtained by using a combination of selected base classifiers output, by utilizing different mathematical functions including the following: majority vote, maximum and average probability. In this work, 10-fold cross- and external validation were used. The strategy led to the following results: i) the single classifiers, together with previous features selections, resulted in good overall accuracy, ii) a comparison between single classifiers, and their combinations in the multiclassifier model, showed that using our ensemble gave a better performance than the single classifier model, and iii) our multiclassifier model performed better than the most widely used multiclassifier models in the literature. The results and statistical analysis demonstrated the supremacy of our multiclassifier approach for predicting the affinity of A(2B) adenosine receptor antagonists, and it can be used to develop other QSAR models.
NASA Astrophysics Data System (ADS)
Lo, Joseph Y.; Gavrielides, Marios A.; Markey, Mia K.; Jesneck, Jonathan L.
2003-05-01
We developed an ensemble classifier for the task of computer-aided diagnosis of breast microcalcification clusters,which are very challenging to characterize for radiologists and computer models alike. The purpose of this study is to help radiologists identify whether suspicious calcification clusters are benign vs. malignant, such that they may potentially recommend fewer unnecessary biopsies for actually benign lesions. The data consists of mammographic features extracted by automated image processing algorithms as well as manually interpreted by radiologists according to a standardized lexicon. We used 292 cases from a publicly available mammography database. From each cases, we extracted 22 image processing features pertaining to lesion morphology, 5 radiologist features also pertaining to morphology, and the patient age. Linear discriminant analysis (LDA) models were designed using each of the three data types. Each local model performed poorly; the best was one based upon image processing features which yielded ROC area index AZ of 0.59 +/- 0.03 and partial AZ above 90% sensitivity of 0.08 +/- 0.03. We then developed ensemble models using different combinations of those data types, and these models all improved performance compared to the local models. The final ensemble model was based upon 5 features selected by stepwise LDA from all 28 available features. This ensemble performed with AZ of 0.69 +/- 0.03 and partial AZ of 0.21 +/- 0.04, which was statistically significantly better than the model based on the image processing features alone (p<0.001 and p=0.01 for full and partial AZ respectively). This demonstrated the value of the radiologist-extracted features as a source of information for this task. It also suggested there is potential for improved performance using this ensemble classifier approach to combine different sources of currently available data.
Drug-target interaction prediction using ensemble learning and dimensionality reduction.
Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong
2017-10-01
Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10 fold-cross validation on training data for evaluating the quality of each candidate ensembles. In order to combine the base classifiers decision into ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
Encoding of Spatial Attention by Primate Prefrontal Cortex Neuronal Ensembles
Treue, Stefan
2018-01-01
Abstract Single neurons in the primate lateral prefrontal cortex (LPFC) encode information about the allocation of visual attention and the features of visual stimuli. However, how this compares to the performance of neuronal ensembles at encoding the same information is poorly understood. Here, we recorded the responses of neuronal ensembles in the LPFC of two macaque monkeys while they performed a task that required attending to one of two moving random dot patterns positioned in different hemifields and ignoring the other pattern. We found single units selective for the location of the attended stimulus as well as for its motion direction. To determine the coding of both variables in the population of recorded units, we used a linear classifier and progressively built neuronal ensembles by iteratively adding units according to their individual performance (best single units), or by iteratively adding units based on their contribution to the ensemble performance (best ensemble). For both methods, ensembles of relatively small sizes (n < 60) yielded substantially higher decoding performance relative to individual single units. However, the decoder reached similar performance using fewer neurons with the best ensemble building method compared with the best single units method. Our results indicate that neuronal ensembles within the LPFC encode more information about the attended spatial and nonspatial features of visual stimuli than individual neurons. They further suggest that efficient coding of attention can be achieved by relatively small neuronal ensembles characterized by a certain relationship between signal and noise correlation structures. PMID:29568798
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10 fold-cross validation on training data for evaluating the quality of each candidate ensembles. In order to combine the base classifiers decision into ensemble’s output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
A mechatronics platform to study prosthetic hand control using EMG signals.
Geethanjali, P
2016-09-01
In this paper, a low-cost mechatronics platform for the design and development of robotic hands as well as a surface electromyogram (EMG) pattern recognition system is proposed. This paper also explores various EMG classification techniques using a low-cost electronics system in prosthetic hand applications. The proposed platform involves the development of a four channel EMG signal acquisition system; pattern recognition of acquired EMG signals; and development of a digital controller for a robotic hand. Four-channel surface EMG signals, acquired from ten healthy subjects for six different movements of the hand, were used to analyse pattern recognition in prosthetic hand control. Various time domain features were extracted and grouped into five ensembles to compare the influence of features in feature-selective classifiers (SLR) with widely considered non-feature-selective classifiers, such as neural networks (NN), linear discriminant analysis (LDA) and support vector machines (SVM) applied with different kernels. The results divulged that the average classification accuracy of the SVM, with a linear kernel function, outperforms other classifiers with feature ensembles, Hudgin's feature set and auto regression (AR) coefficients. However, the slight improvement in classification accuracy of SVM incurs more processing time and memory space in the low-level controller. The Kruskal-Wallis (KW) test also shows that there is no significant difference in the classification performance of SLR with Hudgin's feature set to that of SVM with Hudgin's features along with AR coefficients. In addition, the KW test shows that SLR was found to be better in respect to computation time and memory space, which is vital in a low-level controller. Similar to SVM, with a linear kernel function, other non-feature selective LDA and NN classifiers also show a slight improvement in performance using twice the features but with the drawback of increased memory space requirement and time. This prototype facilitated the study of various issues of pattern recognition and identified an efficient classifier, along with a feature ensemble, in the implementation of EMG controlled prosthetic hands in a laboratory setting at low-cost. This platform may help to motivate and facilitate prosthetic hand research in developing countries.
Detection of chewing from piezoelectric film sensor signals using ensemble classifiers.
Farooq, Muhammad; Sazonov, Edward
2016-08-01
Selection and use of pattern recognition algorithms is application dependent. In this work, we explored the use of several ensembles of weak classifiers to classify signals captured from a wearable sensor system to detect food intake based on chewing. Three sensor signals (Piezoelectric sensor, accelerometer, and hand to mouth gesture) were collected from 12 subjects in free-living conditions for 24 hrs. Sensor signals were divided into 10 seconds epochs and for each epoch combination of time and frequency domain features were computed. In this work, we present a comparison of three different ensemble techniques: boosting (AdaBoost), bootstrap aggregation (bagging) and stacking, each trained with 3 different weak classifiers (Decision Trees, Linear Discriminant Analysis (LDA) and Logistic Regression). Type of feature normalization used can also impact the classification results. For each ensemble method, three feature normalization techniques: (no-normalization, z-score normalization, and minmax normalization) were tested. A 12 fold cross-validation scheme was used to evaluate the performance of each model where the performance was evaluated in terms of precision, recall, and accuracy. Best results achieved here show an improvement of about 4% over our previous algorithms.
A study of fuzzy logic ensemble system performance on face recognition problem
NASA Astrophysics Data System (ADS)
Polyakova, A.; Lipinskiy, L.
2017-02-01
Some problems are difficult to solve by using a single intelligent information technology (IIT). The ensemble of the various data mining (DM) techniques is a set of models which are able to solve the problem by itself, but the combination of which allows increasing the efficiency of the system as a whole. Using the IIT ensembles can improve the reliability and efficiency of the final decision, since it emphasizes on the diversity of its components. The new method of the intellectual informational technology ensemble design is considered in this paper. It is based on the fuzzy logic and is designed to solve the classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural network, support vector machine and decision trees. These algorithms and their ensemble have been tested by solving the face recognition problems. Principal components analysis (PCA) is used for feature selection.
ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and under sampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1). a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2). sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-02-15
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.
An Ensemble Framework Coping with Instability in the Gene Selection Process.
Castellanos-Garzón, José A; Ramos, Juan; López-Sánchez, Daniel; de Paz, Juan F; Corchado, Juan M
2018-03-01
This paper proposes an ensemble framework for gene selection, which is aimed at addressing instability problems presented in the gene filtering task. The complex process of gene selection from gene expression data faces different instability problems from the informative gene subsets found by different filter methods. This makes the identification of significant genes by the experts difficult. The instability of results can come from filter methods, gene classifier methods, different datasets of the same disease and multiple valid groups of biomarkers. Even though there is a wide number of proposals, the complexity imposed by this problem remains a challenge today. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. This framework performs a process of stable feature selection, facing the problems above and, thus, providing a more suitable and reliable solution for clinical and research purposes. Our proposal involves a process of multistage gene filtering, in which several ensemble strategies for gene selection were added in such a way that different classifiers simultaneously assess gene subsets to face instability. Firstly, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability according to filter methods). Next, we apply an ensemble of known classifiers to filter genes relevant to all classifiers at a time (stability according to classification methods). The achieved results were evaluated in two different datasets of the same disease (pancreatic ductal adenocarcinoma), in search of stability according to the disease, for which promising results were achieved.
EnsembleGASVR: a novel ensemble method for classifying missense single nucleotide polymorphisms.
Rapakoulia, Trisevgeni; Theofilatos, Konstantinos; Kleftogiannis, Dimitrios; Likothanasis, Spiros; Tsakalidis, Athanasios; Mavroudi, Seferina
2014-08-15
Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for the classification of missense SNPs to neutral and disease associated. However, existing computational approaches fail to select relevant features by choosing them arbitrarily without sufficient documentation. Moreover, they are limited to the problem of missing values, imbalance between the learning datasets and most of them do not support their predictions with confidence scores. To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR facilitates a two-step algorithm, which in its first step applies a novel evolutionary embedded algorithm to locate close to optimal Support Vector Regression models. In its second step, these models are combined to extract a universal predictor, which is less prone to overfitting issues, systematizes the rebalancing of the learning sets and uses an internal approach for solving the missing values problem without loss of information. Confidence scores support all the predictions and the model becomes tunable by modifying the classification thresholds. An extensive study was performed for collecting the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features such as the solvent accessibility feature, and the top-scored predictions were further validated by linking them with disease phenotypes. Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http://prlab.ceid.upatras.gr/EnsembleGASVR/site.html. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Park, Sang-Hoon; Lee, David; Lee, Sang-Goog
2018-02-01
For the last few years, many feature extraction methods have been proposed based on biological signals. Among these, the brain signals have the advantage that they can be obtained, even by people with peripheral nervous system damage. Motor imagery electroencephalograms (EEG) are inexpensive to measure, offer a high temporal resolution, and are intuitive. Therefore, these have received a significant amount of attention in various fields, including signal processing, cognitive science, and medicine. The common spatial pattern (CSP) algorithm is a useful method for feature extraction from motor imagery EEG. However, performance degradation occurs in a small-sample setting (SSS), because the CSP depends on sample-based covariance. Since the active frequency range is different for each subject, it is also inconvenient to set the frequency range to be different every time. In this paper, we propose the feature extraction method based on a filter bank to solve these problems. The proposed method consists of five steps. First, motor imagery EEG is divided by a using filter bank. Second, the regularized CSP (R-CSP) is applied to the divided EEG. Third, we select the features according to mutual information based on the individual feature algorithm. Fourth, parameter sets are selected for the ensemble. Finally, we classify using ensemble based on features. The brain-computer interface competition III data set IVa is used to evaluate the performance of the proposed method. The proposed method improves the mean classification accuracy by 12.34%, 11.57%, 9%, 4.95%, and 4.47% compared with CSP, SR-CSP, R-CSP, filter bank CSP (FBCSP), and SR-FBCSP. Compared with the filter bank R-CSP ( , ), which is a parameter selection version of the proposed method, the classification accuracy is improved by 3.49%. In particular, the proposed method shows a large improvement in performance in the SSS.
Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco
2017-09-01
Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.
Pourhoseingholi, Mohamad Amin; Kheirian, Sedigheh; Zali, Mohammad Reza
2017-12-01
Colorectal cancer (CRC) is one of the most common malignancies and cause of cancer mortality worldwide. Given the importance of predicting the survival of CRC patients and the growing use of data mining methods, this study aims to compare the performance of models for predicting 5-year survival of CRC patients using variety of basic and ensemble data mining methods. The CRC dataset from The Shahid Beheshti University of Medical Sciences Research Center for Gastroenterology and Liver Diseases were used for prediction and comparative study of the base and ensemble data mining techniques. Feature selection methods were used to select predictor attributes for classification. The WEKA toolkit and MedCalc software were respectively utilized for creating and comparing the models. The obtained results showed that the predictive performance of developed models was altogether high (all greater than 90%). Overall, the performance of ensemble models was higher than that of basic classifiers and the best result achieved by ensemble voting model in terms of area under the ROC curve (AUC= 0.96). AUC Comparison of models showed that the ensemble voting method significantly outperformed all models except for two methods of Random Forest (RF) and Bayesian Network (BN) considered the overlapping 95% confidence intervals. This result may indicate high predictive power of these two methods along with ensemble voting for predicting 5-year survival of CRC patients.
Ensemble methods with simple features for document zone classification
NASA Astrophysics Data System (ADS)
Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing
2012-01-01
Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
A data-driven multi-model methodology with deep feature selection for short-term wind forecasting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias
With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by firstmore » layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that comparing to the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.« less
Wong, Raymond
2013-01-01
Voice biometrics is one kind of physiological characteristics whose voice is different for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotion states, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel statistical characteristics of the time-series; together with spectral analysis, a substantial amount of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets
NASA Astrophysics Data System (ADS)
Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr
2017-08-01
This paper mainly focuseson developing aHadoop based framework for feature selection and classification models to classify high dimensionality data in heterogeneous biomedical databases. Wide research has been performing in the fields of Machine learning, Big data and Data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. There is an exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in Hadoop environment. Extracting key features from unstructured documents often lead to uncertain results due to outliers and missing values. In this paper, we proposed a two phase map-reduce framework with text preprocessor and classification model. In the first phase, mapper based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model was designed and implemented in the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on the complex biomedical datasets show that the performance of our proposed Hadoop based multi-class ensemble model significantly outperforms state-of-the-art baselines.
Ozcift, Akin; Gulten, Arif
2011-12-01
Improving accuracies of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Researches have shown that a base classifier performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart diseases from literature. While making experiments, first the feature dimension of three datasets is reduced using correlation based feature selection (CFS) algorithm. Second, classification performances of 30 machine learning algorithms are calculated for three datasets. Third, 30 classifier ensembles are constructed based on RF algorithm to assess performances of respective classifiers with the same disease data. All the experiments are carried out with leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics; classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers succeeded 72.15%, 77.52% and 84.43% average accuracies for diabetes, heart and Parkinson's datasets, respectively. As for RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve accuracy of miscellaneous machine learning algorithms to design advanced CADx systems. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which can stay away from the feature sparsity issue, are qualified to address the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N
2018-05-15
In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, 3 we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 CE were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, Li; Xu, Yue-Ping
2017-04-01
Ensemble flood forecasting driven by numerical weather prediction products is becoming more commonly used in operational flood forecasting applications.In this study, a hydrological ensemble flood forecasting system based on Variable Infiltration Capacity (VIC) model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated.The hydrological model is optimized by parallel programmed ɛ-NSGAII multi-objective algorithm and two respectively parameterized models are determined to simulate daily flows and peak flows coupled with a modular approach.The results indicatethat the ɛ-NSGAII algorithm permits more efficient optimization and rational determination on parameter setting.It is demonstrated that the multimodel ensemble streamflow mean have better skills than the best singlemodel ensemble mean (ECMWF) and the multimodel ensembles weighted on members and skill scores outperform other multimodel ensembles. For typical flood event, it is proved that the flood can be predicted 3-4 days in advance, but the flows in rising limb can be captured with only 1-2 days ahead due to the flash feature. With respect to peak flows selected by Peaks Over Threshold approach, the ensemble means from either singlemodel or multimodels are generally underestimated as the extreme values are smoothed out by ensemble process.
A hybrid method for classifying cognitive states from fMRI data.
Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R
2015-09-01
Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive-states. The complex nature of fMRI data requires under-standing of the analyses applied to produce possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many models of classification task of fMRI data analysis have been developed, in this paper, we present a novel hybrid technique through combining the best attributes of genetic algorithms (GAs) and ensemble decision tree technique that consistently outperforms all other methods which are being used for cognitive-state classification. Specifically, this paper illustrates the combined effort of decision-trees ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We have shown that our proposed method exhibits significant reduction of the number of features with clear edge classification accuracy over ensemble of decision-trees.
A hybrid filtering method based on a novel empirical mode decomposition for friction signals
NASA Astrophysics Data System (ADS)
Li, Chengwei; Zhan, Liwei
2015-12-01
During a measurement, the measured signal usually contains noise. To remove the noise and preserve the important feature of the signal, we introduce a hybrid filtering method that uses a new intrinsic mode function (NIMF) and a modified Hausdorff distance. The NIMF is defined as the difference between the noisy signal and each intrinsic mode function (IMF), which is obtained by empirical mode decomposition (EMD), ensemble EMD, complementary ensemble EMD, or complete ensemble EMD with adaptive noise (CEEMDAN). The relevant mode selecting is based on the similarity between the first NIMF and the rest of the NIMFs. With this filtering method, the EMD and improved versions are used to filter the simulation and friction signals. The friction signal between an airplane tire and the runaway is recorded during a simulated airplane touchdown and features spikes of various amplitudes and noise. The filtering effectiveness of the four hybrid filtering methods are compared and discussed. The results show that the filtering method based on CEEMDAN outperforms other signal filtering methods.
Mitosis detection using generic features and an ensemble of cascade adaboosts.
Tek, F Boray
2013-01-01
Mitosis count is one of the factors that pathologists use to assess the risk of metastasis and survival of the patients, which are affected by the breast cancer. We investigate an application of a set of generic features and an ensemble of cascade adaboosts to the automated mitosis detection. Calculation of the features rely minimally on object-level descriptions and thus require minimal segmentation. The proposed work was developed and tested on International Conference on Pattern Recognition (ICPR) 2012 mitosis detection contest data. We plotted receiver operating characteristics curves of true positive versus false positive rates; calculated recall, precision, F-measure, and region overlap ratio measures. WE TESTED OUR FEATURES WITH TWO DIFFERENT CLASSIFIER CONFIGURATIONS: 1) An ensemble of single adaboosts, 2) an ensemble of cascade adaboosts. On the ICPR 2012 mitosis detection contest evaluation, the cascade ensemble scored 54, 62.7, and 58, whereas the non-cascade version scored 68, 28.1, and 39.7 for the recall, precision, and F-measure measures, respectively. Mostly used features in the adaboost classifier rules were a shape-based feature, which counted granularity and a color-based feature, which relied on Red, Green, and Blue channel statistics. The features, which express the granular structure and color variations, are found useful for mitosis detection. The ensemble of adaboosts performs better than the individual adaboost classifiers. Moreover, the ensemble of cascaded adaboosts was better than the ensemble of single adaboosts for mitosis detection.
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression , higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but
A Ruby API to query the Ensembl database for genomic features.
Strozzi, Francesco; Aerts, Jan
2011-04-01
The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api.
Kruskal-Wallis-based computationally efficient feature selection for face recognition.
Ali Khan, Sajid; Hussain, Ayyaz; Basit, Abdul; Akram, Sheeraz
2014-01-01
Face recognition in today's technological world, and face recognition applications attain much more importance. Most of the existing work used frontal face images to classify face image. However these techniques fail when applied on real world face images. The proposed technique effectively extracts the prominent facial features. Most of the features are redundant and do not contribute to representing face. In order to eliminate those redundant features, computationally efficient algorithm is used to select the more discriminative face features. Extracted features are then passed to classification step. In the classification step, different classifiers are ensemble to enhance the recognition accuracy rate as single classifier is unable to achieve the high accuracy. Experiments are performed on standard face database images and results are compared with existing techniques.
Evolutionary Ensemble for In Silico Prediction of Ames Test Mutagenicity
NASA Astrophysics Data System (ADS)
Chen, Huanhuan; Yao, Xin
Driven by new regulations and animal welfare, the need to develop in silico models has increased recently as alternative approaches to safety assessment of chemicals without animal testing. This paper describes a novel machine learning ensemble approach to building an in silico model for the prediction of the Ames test mutagenicity, one of a battery of the most commonly used experimental in vitro and in vivo genotoxicity tests for safety evaluation of chemicals. Evolutionary random neural ensemble with negative correlation learning (ERNE) [1] was developed based on neural networks and evolutionary algorithms. ERNE combines the method of bootstrap sampling on training data with the method of random subspace feature selection to ensure diversity in creating individuals within an initial ensemble. Furthermore, while evolving individuals within the ensemble, it makes use of the negative correlation learning, enabling individual NNs to be trained as accurate as possible while still manage to maintain them as diverse as possible. Therefore, the resulting individuals in the final ensemble are capable of cooperating collectively to achieve better generalization of prediction. The empirical experiment suggest that ERNE is an effective ensemble approach for predicting the Ames test mutagenicity of chemicals.
Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S
2007-09-01
The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. Number of regions of interests (ROIs) corresponding to C1-C4 have been defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, spatial gray level dependence matrix, gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set, and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
Ensemble Classifier Strategy Based on Transient Feature Fusion in Electronic Nose
NASA Astrophysics Data System (ADS)
Bagheri, Mohammad Ali; Montazer, Gholam Ali
2011-09-01
In this paper, we test the performance of several ensembles of classifiers and each base learner has been trained on different types of extracted features. Experimental results show the potential benefits introduced by the usage of simple ensemble classification systems for the integration of different types of transient features.
Minimal ensemble based on subset selection using ECG to diagnose categories of CAN.
Abawajy, Jemal; Kelarev, Andrei; Yi, Xun; Jelinek, Herbert F
2018-07-01
Early diagnosis of cardiac autonomic neuropathy (CAN) is critical for reversing or decreasing its progression and prevent complications. Diagnostic accuracy or precision is one of the core requirements of CAN detection. As the standard Ewing battery tests suffer from a number of shortcomings, research in automating and improving the early detection of CAN has recently received serious attention in identifying additional clinical variables and designing advanced ensembles of classifiers to improve the accuracy or precision of CAN diagnostics. Although large ensembles are commonly proposed for the automated diagnosis of CAN, large ensembles are characterized by slow processing speed and computational complexity. This paper applies ECG features and proposes a new ensemble-based approach for diagnosis of CAN progression. We introduce a Minimal Ensemble Based On Subset Selection (MEBOSS) for the diagnosis of all categories of CAN including early, definite and atypical CAN. MEBOSS is based on a novel multi-tier architecture applying classifier subset selection as well as the training subset selection during several steps of its operation. Our experiments determined the diagnostic accuracy or precision obtained in 5 × 2 cross-validation for various options employed in MEBOSS and other classification systems. The experiments demonstrate the operation of the MEBOSS procedure invoking the most effective classifiers available in the open source software environment SageMath. The results of our experiments show that for the large DiabHealth database of CAN related parameters MEBOSS outperformed other classification systems available in SageMath and achieved 94% to 97% precision in 5 × 2 cross-validation correctly distinguishing any two CAN categories to a maximum of five categorizations including control, early, definite, severe and atypical CAN. These results show that MEBOSS architecture is effective and can be recommended for practical implementations in systems for the diagnosis of CAN progression. Copyright © 2018 Elsevier B.V. All rights reserved.
An, Yi; Wang, Jiawei; Li, Chen; Leier, André; Marquez-Lago, Tatiana; Wilksch, Jonathan; Zhang, Yang; Webb, Geoffrey I; Song, Jiangning; Lithgow, Trevor
2018-01-01
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in terms of the used ML methods but also with respect to the used curated data sets, the features selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspiration for the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Distinct cognitive mechanisms involved in the processing of single objects and object ensembles
Cant, Jonathan S.; Sun, Sol Z.; Xu, Yaoda
2015-01-01
Behavioral research has demonstrated that the shape and texture of single objects can be processed independently. Similarly, neuroimaging results have shown that an object's shape and texture are processed in distinct brain regions with shape in the lateral occipital area and texture in parahippocampal cortex. Meanwhile, objects are not always seen in isolation and are often grouped together as an ensemble. We recently showed that the processing of ensembles also involves parahippocampal cortex and that the shape and texture of ensemble elements are processed together within this region. These neural data suggest that the independence seen between shape and texture in single-object perception would not be observed in object-ensemble perception. Here we tested this prediction by examining whether observers could attend to the shape of ensemble elements while ignoring changes in an unattended texture feature and vice versa. Across six behavioral experiments, we replicated previous findings of independence between shape and texture in single-object perception. In contrast, we observed that changes in an unattended ensemble feature negatively impacted the processing of an attended ensemble feature only when ensemble features were attended globally. When they were attended locally, thereby making ensemble processing similar to single-object processing, interference was abolished. Overall, these findings confirm previous neuroimaging results and suggest that distinct cognitive mechanisms may be involved in single-object and object-ensemble perception. Additionally, they show that the scope of visual attention plays a critical role in determining which type of object processing (ensemble or single object) is engaged by the visual system. PMID:26360156
Effects of ensemble and summary displays on interpretations of geospatial uncertainty data.
Padilla, Lace M; Ruginski, Ian T; Creem-Regehr, Sarah H
2017-01-01
Ensemble and summary displays are two widely used methods to represent visual-spatial uncertainty; however, there is disagreement about which is the most effective technique to communicate uncertainty to the general public. Visualization scientists create ensemble displays by plotting multiple data points on the same Cartesian coordinate plane. Despite their use in scientific practice, it is more common in public presentations to use visualizations of summary displays, which scientists create by plotting statistical parameters of the ensemble members. While prior work has demonstrated that viewers make different decisions when viewing summary and ensemble displays, it is unclear what components of the displays lead to diverging judgments. This study aims to compare the salience of visual features - or visual elements that attract bottom-up attention - as one possible source of diverging judgments made with ensemble and summary displays in the context of hurricane track forecasts. We report that salient visual features of both ensemble and summary displays influence participant judgment. Specifically, we find that salient features of summary displays of geospatial uncertainty can be misunderstood as displaying size information. Further, salient features of ensemble displays evoke judgments that are indicative of accurate interpretations of the underlying probability distribution of the ensemble data. However, when participants use ensemble displays to make point-based judgments, they may overweight individual ensemble members in their decision-making process. We propose that ensemble displays are a promising alternative to summary displays in a geospatial context but that decisions about visualization methods should be informed by the viewer's task.
Exponential Sensitivity and its Cost in Quantum Physics
Gilyén, András; Kiss, Tamás; Jex, Igor
2016-01-01
State selective protocols, like entanglement purification, lead to an essentially non-linear quantum evolution, unusual in naturally occurring quantum processes. Sensitivity to initial states in quantum systems, stemming from such non-linear dynamics, is a promising perspective for applications. Here we demonstrate that chaotic behaviour is a rather generic feature in state selective protocols: exponential sensitivity can exist for all initial states in an experimentally realisable optical scheme. Moreover, any complex rational polynomial map, including the example of the Mandelbrot set, can be directly realised. In state selective protocols, one needs an ensemble of initial states, the size of which decreases with each iteration. We prove that exponential sensitivity to initial states in any quantum system has to be related to downsizing the initial ensemble also exponentially. Our results show that magnifying initial differences of quantum states (a Schrödinger microscope) is possible; however, there is a strict bound on the number of copies needed. PMID:26861076
Exponential Sensitivity and its Cost in Quantum Physics.
Gilyén, András; Kiss, Tamás; Jex, Igor
2016-02-10
State selective protocols, like entanglement purification, lead to an essentially non-linear quantum evolution, unusual in naturally occurring quantum processes. Sensitivity to initial states in quantum systems, stemming from such non-linear dynamics, is a promising perspective for applications. Here we demonstrate that chaotic behaviour is a rather generic feature in state selective protocols: exponential sensitivity can exist for all initial states in an experimentally realisable optical scheme. Moreover, any complex rational polynomial map, including the example of the Mandelbrot set, can be directly realised. In state selective protocols, one needs an ensemble of initial states, the size of which decreases with each iteration. We prove that exponential sensitivity to initial states in any quantum system has to be related to downsizing the initial ensemble also exponentially. Our results show that magnifying initial differences of quantum states (a Schrödinger microscope) is possible; however, there is a strict bound on the number of copies needed.
Ensemble of sparse classifiers for high-dimensional biological data.
Kim, Sunghan; Scalzo, Fabien; Telesca, Donatello; Hu, Xiao
2015-01-01
Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques.
Dynamic adaptive learning for decision-making supporting systems
NASA Astrophysics Data System (ADS)
He, Haibo; Cao, Yuan; Chen, Sheng; Desai, Sachi; Hohil, Myron E.
2008-03-01
This paper proposes a novel adaptive learning method for data mining in support of decision-making systems. Due to the inherent characteristics of information ambiguity/uncertainty, high dimensionality and noisy in many homeland security and defense applications, such as surveillances, monitoring, net-centric battlefield, and others, it is critical to develop autonomous learning methods to efficiently learn useful information from raw data to help the decision making process. The proposed method is based on a dynamic learning principle in the feature spaces. Generally speaking, conventional approaches of learning from high dimensional data sets include various feature extraction (principal component analysis, wavelet transform, and others) and feature selection (embedded approach, wrapper approach, filter approach, and others) methods. However, very limited understandings of adaptive learning from different feature spaces have been achieved. We propose an integrative approach that takes advantages of feature selection and hypothesis ensemble techniques to achieve our goal. Based on the training data distributions, a feature score function is used to provide a measurement of the importance of different features for learning purpose. Then multiple hypotheses are iteratively developed in different feature spaces according to their learning capabilities. Unlike the pre-set iteration steps in many of the existing ensemble learning approaches, such as adaptive boosting (AdaBoost) method, the iterative learning process will automatically stop when the intelligent system can not provide a better understanding than a random guess in that particular subset of feature spaces. Finally, a voting algorithm is used to combine all the decisions from different hypotheses to provide the final prediction results. Simulation analyses of the proposed method on classification of different US military aircraft databases show the effectiveness of this method.
Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.
Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò
2018-05-15
Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors has been organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified in four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on MLNeCh dataset. Since the automatic classification of AD is based on the use of feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem, then the classification method is obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone method tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be publicly available 1 to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.
A Hybrid Classification System for Heart Disease Diagnosis Based on the RFRS Method.
Liu, Xiao; Wang, Xiaoli; Su, Qiang; Zhang, Mo; Zhu, Yanhong; Wang, Qiugen; Wang, Qian
2017-01-01
Heart disease is one of the most common diseases in the world. The objective of this study is to aid the diagnosis of heart disease using a hybrid classification system based on the ReliefF and Rough Set (RFRS) method. The proposed system contains two subsystems: the RFRS feature selection system and a classification system with an ensemble classifier. The first system includes three stages: (i) data discretization, (ii) feature extraction using the ReliefF algorithm, and (iii) feature reduction using the heuristic Rough Set reduction algorithm that we developed. In the second system, an ensemble classifier is proposed based on the C4.5 classifier. The Statlog (Heart) dataset, obtained from the UCI database, was used for experiments. A maximum classification accuracy of 92.59% was achieved according to a jackknife cross-validation scheme. The results demonstrate that the performance of the proposed system is superior to the performances of previously reported classification techniques.
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
SVM and SVM Ensembles in Breast Cancer Prediction.
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
SVM and SVM Ensembles in Breast Cancer Prediction
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
Hampson, Robert E.; Song, Dong; Chan, Rosa H.M.; Sweatt, Andrew J.; Riley, Mitchell R.; Goonawardena, Anushka V.; Marmarelis, Vasilis Z.; Gerhardt, Greg A.; Berger, Theodore W.; Deadwyler, Sam A.
2012-01-01
A major factor involved in providing closed loop feedback for control of neural function is to understand how neural ensembles encode online information critical to the final behavioral endpoint. This issue was directly assessed in rats performing a short-term delay memory task in which successful encoding of task information is dependent upon specific spatiotemporal firing patterns recorded from ensembles of CA3 and CA1 hippocampal neurons. Such patterns, extracted by a specially designed nonlinear multi-input multi-output (MIMO) nonlinear mathematical model, were used to predict successful performance online via a closed loop paradigm which regulated trial difficulty (time of retention) as a function of the “strength” of stimulus encoding. The significance of the MIMO model as a neural prosthesis has been demonstrated by substituting trains of electrical stimulation pulses to mimic these same ensemble firing patterns. This feature was used repeatedly to vary “normal” encoding as a means of understanding how neural ensembles can be “tuned” to mimic the inherent process of selecting codes of different strength and functional specificity. The capacity to enhance and tune hippocampal encoding via MIMO model detection and insertion of critical ensemble firing patterns shown here provides the basis for possible extension to other disrupted brain circuitry. PMID:22498704
Ensemble transcript interaction networks: a case study on Alzheimer's disease.
Armañanzas, Rubén; Larrañaga, Pedro; Bielza, Concha
2012-10-01
Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependences that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampus subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Random ensemble learning for EEG classification.
Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid
2018-01-01
Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.
Ensemble-type numerical uncertainty information from single model integrations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rauser, Florian, E-mail: florian.rauser@mpimet.mpg.de; Marotzke, Jochem; Korn, Peter
2015-07-01
We suggest an algorithm that quantifies the discretization error of time-dependent physical quantities of interest (goals) for numerical models of geophysical fluid dynamics. The goal discretization error is estimated using a sum of weighted local discretization errors. The key feature of our algorithm is that these local discretization errors are interpreted as realizations of a random process. The random process is determined by the model and the flow state. From a class of local error random processes we select a suitable specific random process by integrating the model over a short time interval at different resolutions. The weights of themore » influences of the local discretization errors on the goal are modeled as goal sensitivities, which are calculated via automatic differentiation. The integration of the weighted realizations of local error random processes yields a posterior ensemble of goal approximations from a single run of the numerical model. From the posterior ensemble we derive the uncertainty information of the goal discretization error. This algorithm bypasses the requirement of detailed knowledge about the models discretization to generate numerical error estimates. The algorithm is evaluated for the spherical shallow-water equations. For two standard test cases we successfully estimate the error of regional potential energy, track its evolution, and compare it to standard ensemble techniques. The posterior ensemble shares linear-error-growth properties with ensembles of multiple model integrations when comparably perturbed. The posterior ensemble numerical error estimates are of comparable size as those of a stochastic physics ensemble.« less
Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G
2018-05-15
Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact in improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI datasets that consist of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
Embedded feature ranking for ensemble MLP classifiers.
Windeatt, Terry; Duangsoithong, Rakkrit; Smith, Raymond
2011-06-01
A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.
Pairwise diversity ranking of polychotomous features for ensemble physiological signal classifiers.
Gupta, Lalit; Kota, Srinivas; Molfese, Dennis L; Vaidyanathan, Ravi
2013-06-01
It is well known that fusion classifiers for physiological signal classification with diverse components (classifiers or data sets) outperform those with less diverse components. Determining component diversity, therefore, is of the utmost importance in the design of fusion classifiers that are often employed in clinical diagnostic and numerous other pattern recognition problems. In this article, a new pairwise diversity-based ranking strategy is introduced to select a subset of ensemble components, which when combined will be more diverse than any other component subset of the same size. The strategy is unified in the sense that the components can be classifiers or data sets. Moreover, the classifiers and data sets can be polychotomous. Classifier-fusion and data-fusion systems are formulated based on the diversity-based selection strategy, and the application of the two fusion strategies are demonstrated through the classification of multichannel event-related potentials. It is observed that for both classifier and data fusion, the classification accuracy tends to increase/decrease when the diversity of the component ensemble increases/decreases. For the four sets of 14-channel event-related potentials considered, it is shown that data fusion outperforms classifier fusion. Furthermore, it is demonstrated that the combination of data components that yield the best performance, in a relative sense, can be determined through the diversity-based selection strategy.
Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.
Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria
2018-04-18
Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Cortical processing of dynamic sound envelope transitions.
Zhou, Yi; Wang, Xiaoqin
2010-12-08
Slow envelope fluctuations in the range of 2-20 Hz provide important segmental cues for processing communication sounds. For a successful segmentation, a neural processor must capture envelope features associated with the rise and fall of signal energy, a process that is often challenged by the interference of background noise. This study investigated the neural representations of slowly varying envelopes in quiet and in background noise in the primary auditory cortex (A1) of awake marmoset monkeys. We characterized envelope features based on the local average and rate of change of sound level in envelope waveforms and identified envelope features to which neurons were selective by reverse correlation. Our results showed that envelope feature selectivity of A1 neurons was correlated with the degree of nonmonotonicity in their static rate-level functions. Nonmonotonic neurons exhibited greater feature selectivity than monotonic neurons in quiet and in background noise. The diverse envelope feature selectivity decreased spike-timing correlation among A1 neurons in response to the same envelope waveforms. As a result, the variability, but not the average, of the ensemble responses of A1 neurons represented more faithfully the dynamic transitions in low-frequency sound envelopes both in quiet and in background noise.
Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.
Kilic, Niyazi; Hosgormez, Erkan
2016-03-01
Ensemble learning methods are one of the most powerful tools for the pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed including different physical parameters and they fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting and random subspace (RSM) were used. Instance based learning (IBk) and random forest (RF) classifiers applied to six feature set models. The patients were classified into three groups such as osteoporosis, osteopenia and control (healthy), using ensemble classifiers. Total classification accuracy and f-measure were also used to evaluate diagnostic performance of the proposed ensemble classification system. The classification accuracy has reached to 98.85 % by the combination of model 6 (five BMD + five T-score values) using RSM-RF classifier. The findings of this paper suggest that the patients will be able to be warned before a bone fracture occurred, by just examining some physical parameters that can easily be measured without invasive operations.
A hybrid cost-sensitive ensemble for imbalanced breast thermogram classification.
Krawczyk, Bartosz; Schaefer, Gerald; Woźniak, Michał
2015-11-01
Early recognition of breast cancer, the most commonly diagnosed form of cancer in women, is of crucial importance, given that it leads to significantly improved chances of survival. Medical thermography, which uses an infrared camera for thermal imaging, has been demonstrated as a particularly useful technique for early diagnosis, because it detects smaller tumors than the standard modality of mammography. In this paper, we analyse breast thermograms by extracting features describing bilateral symmetries between the two breast areas, and present a classification system for decision making. Clearly, the costs associated with missing a cancer case are much higher than those for mislabelling a benign case. At the same time, datasets contain significantly fewer malignant cases than benign ones. Standard classification approaches fail to consider either of these aspects. In this paper, we introduce a hybrid cost-sensitive classifier ensemble to address this challenging problem. Our approach entails a pool of cost-sensitive decision trees which assign a higher misclassification cost to the malignant class, thereby boosting its recognition rate. A genetic algorithm is employed for simultaneous feature selection and classifier fusion. As an optimisation criterion, we use a combination of misclassification cost and diversity to achieve both a high sensitivity and a heterogeneous ensemble. Furthermore, we prune our ensemble by discarding classifiers that contribute minimally to the decision making. For a challenging dataset of about 150 thermograms, our approach achieves an excellent sensitivity of 83.10%, while maintaining a high specificity of 89.44%. This not only signifies improved recognition of malignant cases, it also statistically outperforms other state-of-the-art algorithms designed for imbalanced classification, and hence provides an effective approach for analysing breast thermograms. Our proposed hybrid cost-sensitive ensemble can facilitate a highly accurate early diagnostic of breast cancer based on thermogram features. It overcomes the difficulties posed by the imbalanced distribution of patients in the two analysed groups. Copyright © 2015 Elsevier B.V. All rights reserved.
Hayat, Maqsood; Tahir, Muhammad
2015-08-01
Membrane protein is a central component of the cell that manages intra and extracellular processes. Membrane proteins execute a diversity of functions that are vital for the survival of organisms. The topology of transmembrane proteins describes the number of transmembrane (TM) helix segments and its orientation. However, owing to the lack of its recognized structures, the identification of TM helix and its topology through experimental methods is laborious with low throughput. In order to identify TM helix segments reliably, accurately, and effectively from topogenic sequences, we propose the PSOFuzzySVM-TMH model. In this model, evolutionary based information position specific scoring matrix and discrete based information 6-letter exchange group are used to formulate transmembrane protein sequences. The noisy and extraneous attributes are eradicated using an optimization selection technique, particle swarm optimization, from both feature spaces. Finally, the selected feature spaces are combined in order to form ensemble feature space. Fuzzy-support vector Machine is utilized as a classification algorithm. Two benchmark datasets, including low and high resolution datasets, are used. At various levels, the performance of the PSOFuzzySVM-TMH model is assessed through 10-fold cross validation test. The empirical results reveal that the proposed framework PSOFuzzySVM-TMH outperforms in terms of classification performance in the examined datasets. It is ascertained that the proposed model might be a useful and high throughput tool for academia and research community for further structure and functional studies on transmembrane proteins.
Pilkiw, Maryna; Insel, Nathan; Cui, Younghua; Finney, Caitlin; Morrissey, Mark D; Takehara-Nishiuchi, Kaori
2017-07-06
The lateral entorhinal cortex (LEC) is thought to bind sensory events with the environment where they took place. To compare the relative influence of transient events and temporally stable environmental stimuli on the firing of LEC cells, we recorded neuron spiking patterns in the region during blocks of a trace eyeblink conditioning paradigm performed in two environments and with different conditioning stimuli. Firing rates of some neurons were phasically selective for conditioned stimuli in a way that depended on which room the rat was in; nearly all neurons were tonically selective for environments in a way that depended on which stimuli had been presented in those environments. As rats moved from one environment to another, tonic neuron ensemble activity exhibited prospective information about the conditioned stimulus associated with the environment. Thus, the LEC formed phasic and tonic codes for event-environment associations, thereby accurately differentiating multiple experiences with overlapping features.
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.
Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil
2016-10-01
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP
Staras, Kevin
2016-01-01
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture. PMID:27760125
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging
NASA Astrophysics Data System (ADS)
Chen, Lei; Kamel, Mohamed S.
2016-01-01
In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
Programming in the Zone: Repertoire Selection for the Large Ensemble
ERIC Educational Resources Information Center
Hopkins, Michael
2013-01-01
One of the great challenges ensemble directors face is selecting high-quality repertoire that matches the musical and technical levels of their ensembles. Thoughtful repertoire selection can lead to increased student motivation as well as greater enthusiasm for the music program from parents, administrators, teachers, and community members. Common…
Computer Based Behavioral Biometric Authentication via Multi-Modal Fusion
2013-03-01
the decisions made by each individual modality. Fusion of features is the simple concatenation of feature vectors from multiple modalities to be...of Features BayesNet MDL 330 LibSVM PCA 80 J48 Wrapper Evaluator 11 3.5.3 Ensemble Based Decision Level Fusion. In ensemble learning multiple ...The high fusion percentages validate our hypothesis that by combining features from multiple modalities, classification accuracy can be improved. As
IMMAN: free software for information theory-based chemometric analysis.
Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo
2015-05-01
The features and theoretical background of a new and free computational program for chemometric analysis denominated IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon's entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty are incorporated to the IMMAN software ( http://mobiosd-hub.com/imman-soft/ ), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. Graphic representation for Shannon's distribution of MD calculating software.
Robust Feature Selection Technique using Rank Aggregation.
Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep
2014-01-01
Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers into dilemma as to which feature selection method and a classifiers to choose from a vast range of choices. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets with different dimensions including Arrythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor and a real world data set namely Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that compared to other techniques our algorithm improves the classification accuracy by approximately 3-4% (in data set with less than 500 features) and by more than 5% (in data set with more than 500 features), across a wide range of classifiers.
Kinetic rate constant prediction supports the conformational selection mechanism of protein binding.
Moal, Iain H; Bates, Paul A
2012-01-01
The prediction of protein-protein kinetic rate constants provides a fundamental test of our understanding of molecular recognition, and will play an important role in the modeling of complex biological systems. In this paper, a feature selection and regression algorithm is applied to mine a large set of molecular descriptors and construct simple models for association and dissociation rate constants using empirical data. Using separate test data for validation, the predicted rate constants can be combined to calculate binding affinity with accuracy matching that of state of the art empirical free energy functions. The models show that the rate of association is linearly related to the proportion of unbound proteins in the bound conformational ensemble relative to the unbound conformational ensemble, indicating that the binding partners must adopt a geometry near to that of the bound prior to binding. Mirroring the conformational selection and population shift mechanism of protein binding, the models provide a strong separate line of evidence for the preponderance of this mechanism in protein-protein binding, complementing structural and theoretical studies.
Jung, Wonmo; Bülthoff, Isabelle; Armann, Regine G M
2017-11-01
The brain can only attend to a fraction of all the information that is entering the visual system at any given moment. One way of overcoming the so-called bottleneck of selective attention (e.g., J. M. Wolfe, Võ, Evans, & Greene, 2011) is to make use of redundant visual information and extract summarized statistical information of the whole visual scene. Such ensemble representation occurs for low-level features of textures or simple objects, but it has also been reported for complex high-level properties. While the visual system has, for example, been shown to compute summary representations of facial expression, gender, or identity, it is less clear whether perceptual input from all parts of the visual field contributes equally to the ensemble percept. Here we extend the line of ensemble-representation research into the realm of race and look at the possibility that ensemble perception relies on weighting visual information differently depending on its origin from either the fovea or the visual periphery. We find that observers can judge the mean race of a set of faces, similar to judgments of mean emotion from faces and ensemble representations in low-level domains of visual processing. We also find that while peripheral faces seem to be taken into account for the ensemble percept, far more weight is given to stimuli presented foveally than peripherally. Whether this precision weighting of information stems from differences in the accuracy with which the visual system processes information across the visual field or from statistical inferences about the world needs to be determined by further research.
Ali, Safdar; Majid, Abdul; Khan, Asifullah
2014-04-01
Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered as one of the most promising in medical data classification and the performance of deceision tree classifier can be increased by the ensemble method as it is proven to be better than single classifiers. However, in a ensemble settings the performance depends on the selection of suitable base classifier. This research employed two prominent esemble s namely Adaboost and Bagging with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varries when different base classifiers are selected and even some places overfitting issue also been noted. The evidence shows that ensemble decision tree classfiers using Adaboost and Bagging improves the performance of selected medical data sets.
Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina
2015-09-07
Antifreeze proteins (AFPs) play a pivotal role in the antifreeze effect of overwintering organisms. They have a wide range of applications in numerous fields, such as improving the production of crops and the quality of frozen foods. Accurate identification of AFPs may provide important clues to decipher the underlying mechanisms of AFPs in ice-binding and to facilitate the selection of the most appropriate AFPs for several applications. Based on an ensemble learning technique, this study proposes an AFP identification system called AFP-Ensemble. In this system, random forest classifiers are trained by different training subsets and then aggregated into a consensus classifier by majority voting. The resulting predictor yields a sensitivity of 0.892, a specificity of 0.940, an accuracy of 0.938 and a balanced accuracy of 0.916 on an independent dataset, which are far better than the results obtained by previous methods. These results reveal that AFP-Ensemble is an effective and promising predictor for large-scale determination of AFPs. The detailed feature analysis in this study may give useful insights into the molecular mechanisms of AFP-ice interactions and provide guidance for the related experimental validation. A web server has been designed to implement the proposed method.
A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.
Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan
2014-07-18
Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/ DXECPPI/index.jsp.
A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles
NASA Astrophysics Data System (ADS)
Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.
2016-12-01
Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean as they subscribe to model democracy ("one model, one vote").Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus, wrong estimation of uncertainty and effective number of independent models.Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly.Using a model-as-truth experiment we demonstrate that those regional biases persist into the future and we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, where all the model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size.Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
Bayesian network ensemble as a multivariate strategy to predict radiation pneumonitis risk
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sangkyu, E-mail: sangkyu.lee@mail.mcgill.ca; Ybarra, Norma; Jeyaseelan, Krishinima
2015-05-15
Purpose: Prediction of radiation pneumonitis (RP) has been shown to be challenging due to the involvement of a variety of factors including dose–volume metrics and radiosensitivity biomarkers. Some of these factors are highly correlated and might affect prediction results when combined. Bayesian network (BN) provides a probabilistic framework to represent variable dependencies in a directed acyclic graph. The aim of this study is to integrate the BN framework and a systems’ biology approach to detect possible interactions among RP risk factors and exploit these relationships to enhance both the understanding and prediction of RP. Methods: The authors studied 54 nonsmall-cellmore » lung cancer patients who received curative 3D-conformal radiotherapy. Nineteen RP events were observed (common toxicity criteria for adverse events grade 2 or higher). Serum concentration of the following four candidate biomarkers were measured at baseline and midtreatment: alpha-2-macroglobulin, angiotensin converting enzyme (ACE), transforming growth factor, interleukin-6. Dose-volumetric and clinical parameters were also included as covariates. Feature selection was performed using a Markov blanket approach based on the Koller–Sahami filter. The Markov chain Monte Carlo technique estimated the posterior distribution of BN graphs built from the observed data of the selected variables and causality constraints. RP probability was estimated using a limited number of high posterior graphs (ensemble) and was averaged for the final RP estimate using Bayes’ rule. A resampling method based on bootstrapping was applied to model training and validation in order to control under- and overfit pitfalls. Results: RP prediction power of the BN ensemble approach reached its optimum at a size of 200. The optimized performance of the BN model recorded an area under the receiver operating characteristic curve (AUC) of 0.83, which was significantly higher than multivariate logistic regression (0.77), mean heart dose (0.69), and a pre-to-midtreatment change in ACE (0.66). When RP prediction was made only with pretreatment information, the AUC ranged from 0.76 to 0.81 depending on the ensemble size. Bootstrap validation of graph features in the ensemble quantified confidence of association between variables in the graphs where ten interactions were statistically significant. Conclusions: The presented BN methodology provides the flexibility to model hierarchical interactions between RP covariates, which is applied to probabilistic inference on RP. The authors’ preliminary results demonstrate that such framework combined with an ensemble method can possibly improve prediction of RP under real-life clinical circumstances such as missing data or treatment plan adaptation.« less
Pilkiw, Maryna; Insel, Nathan; Cui, Younghua; Finney, Caitlin; Morrissey, Mark D; Takehara-Nishiuchi, Kaori
2017-01-01
The lateral entorhinal cortex (LEC) is thought to bind sensory events with the environment where they took place. To compare the relative influence of transient events and temporally stable environmental stimuli on the firing of LEC cells, we recorded neuron spiking patterns in the region during blocks of a trace eyeblink conditioning paradigm performed in two environments and with different conditioning stimuli. Firing rates of some neurons were phasically selective for conditioned stimuli in a way that depended on which room the rat was in; nearly all neurons were tonically selective for environments in a way that depended on which stimuli had been presented in those environments. As rats moved from one environment to another, tonic neuron ensemble activity exhibited prospective information about the conditioned stimulus associated with the environment. Thus, the LEC formed phasic and tonic codes for event-environment associations, thereby accurately differentiating multiple experiences with overlapping features. DOI: http://dx.doi.org/10.7554/eLife.28611.001 PMID:28682237
A novel algorithm for reducing false arrhythmia alarms in intensive care units.
Srivastava, Chandan; Sharma, Sonal; Jalali, Ali
2016-08-01
Alarm fatigue in intensive care units (ICU) is one of the top healthcare issues in the US. False alarms in ICU will decrease the quality of care and staff response time over the alarms. Normally, false alarm will cause desensitization of the clinical staff which leads to warnings and misleading, if the triggered alarm is true. In this study, we have proposed a multi-model ensemble approach to reduce the false alarm rate in monitoring systems. We have used 750 patient records from PhysioNet database. At First arrhythmia based features from electrocardiogram (ECG), arterial blood pressure (ABP) and photoplethysmogram (PPG) features were extracted from the records. Next, the dataset has been separated into two subsets on the basis of available features information. The first dataset (DS1) is the combination of ECG physiological, ABP and PPG features. Their correlation coefficient and p-values criteria have been applied for relevant alarm-wise feature-set selection, and random forest classifier was used for model development and validation. The threshold based approach was used on second dataset (DS2) which is the combination of arrhythmia, ABP and PPG features. The developed ensemble model is able to achieve sensitivity 83.33-100 % (average 95.56 %) being true alarms and suppress false alarms rate 66.67-89% (average 77.25%). The predictability of classifier shows the advantage to deal with unbalanced set of information, therefore overall model performance has reached to 83.96% accuracy.
ERIC Educational Resources Information Center
Berndt, William; Berndt, Arnold
This catalogue lists over 350 phonograph records which feature solo and ensemble music by wind and percussion instruments. Instruments heard on the records include oboe/English horn, flute, clarinet, bassoon, trumpet/cornet, French horn, trombone, baritone, tuba, saxophone, percussion, woodwind ensembles, and brass ensembles. The catalogue is…
An ensemble model of QSAR tools for regulatory risk assessment.
Pradeep, Prachi; Povinelli, Richard J; White, Shannon; Merrill, Stephen J
2016-01-01
Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa ( κ ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.
Deep learning ensemble with asymptotic techniques for oscillometric blood pressure estimation.
Lee, Soojeong; Chang, Joon-Hyuk
2017-11-01
This paper proposes a deep learning based ensemble regression estimator with asymptotic techniques, and offers a method that can decrease uncertainty for oscillometric blood pressure (BP) measurements using the bootstrap and Monte-Carlo approach. While the former is used to estimate SBP and DBP, the latter attempts to determine confidence intervals (CIs) for SBP and DBP based on oscillometric BP measurements. This work originally employs deep belief networks (DBN)-deep neural networks (DNN) to effectively estimate BPs based on oscillometric measurements. However, there are some inherent problems with these methods. First, it is not easy to determine the best DBN-DNN estimator, and worthy information might be omitted when selecting one DBN-DNN estimator and discarding the others. Additionally, our input feature vectors, obtained from only five measurements per subject, represent a very small sample size; this is a critical weakness when using the DBN-DNN technique and can cause overfitting or underfitting, depending on the structure of the algorithm. To address these problems, an ensemble with an asymptotic approach (based on combining the bootstrap with the DBN-DNN technique) is utilized to generate the pseudo features needed to estimate the SBP and DBP. In the first stage, the bootstrap-aggregation technique is used to create ensemble parameters. Afterward, the AdaBoost approach is employed for the second-stage SBP and DBP estimation. We then use the bootstrap and Monte-Carlo techniques in order to determine the CIs based on the target BP estimated using the DBN-DNN ensemble regression estimator with the asymptotic technique in the third stage. The proposed method can mitigate the estimation uncertainty such as large the standard deviation of error (SDE) on comparing the proposed DBN-DNN ensemble regression estimator with the DBN-DNN single regression estimator, we identify that the SDEs of the SBP and DBP are reduced by 0.58 and 0.57 mmHg, respectively. These indicate that the proposed method actually enhances the performance by 9.18% and 10.88% compared with the DBN-DNN single estimator. The proposed methodology improves the accuracy of BP estimation and reduces the uncertainty for BP estimation. Copyright © 2017 Elsevier B.V. All rights reserved.
An ensemble model of QSAR tools for regulatory risk assessment
Pradeep, Prachi; Povinelli, Richard J.; White, Shannon; ...
2016-09-22
Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflictingmore » predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. In conclusion, this feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.« less
Suh, Jong Hwan
2016-01-01
In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the qualities of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of the state-of-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners-C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)-and four Random Subspace (RS) ensemble methods based on the four base learners. When South Korea's Web forum, Daum Agora, was selected as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM combining RS and SVM gives the best accuracy for classification if the test bed poster reputations are segmented strictly into Good and Bad classes by portfolio approach. Pairwise t tests on accuracy confirm two expectations coming from the literature reviews: first, the feature set adding content-specific features outperform the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways on defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.
Multimodel Ensemble Methods for Prediction of Wake-Vortex Transport and Decay Originating NASA
NASA Technical Reports Server (NTRS)
Korner, Stephan; Ahmad, Nashat N.; Holzapfel, Frank; VanValkenburg, Randal L.
2017-01-01
Several multimodel ensemble methods are selected and further developed to improve the deterministic and probabilistic prediction skills of individual wake-vortex transport and decay models. The different multimodel ensemble methods are introduced, and their suitability for wake applications is demonstrated. The selected methods include direct ensemble averaging, Bayesian model averaging, and Monte Carlo simulation. The different methodologies are evaluated employing data from wake-vortex field measurement campaigns conducted in the United States and Germany.
Strecker, Claas; Meyer, Bernd
2018-05-29
Protein flexibility poses a major challenge to docking of potential ligands in that the binding site can adopt different shapes. Docking algorithms usually keep the protein rigid and only allow the ligand to be treated as flexible. However, a wrong assessment of the shape of the binding pocket can prevent a ligand from adapting a correct pose. Ensemble docking is a simple yet promising method to solve this problem: Ligands are docked into multiple structures, and the results are subsequently merged. Selection of protein structures is a significant factor for this approach. In this work we perform a comprehensive and comparative study evaluating the impact of structure selection on ensemble docking. We perform ensemble docking with several crystal structures and with structures derived from molecular dynamics simulations of renin, an attractive target for antihypertensive drugs. Here, 500 ns of MD simulations revealed binding site shapes not found in any available crystal structure. We evaluate the importance of structure selection for ensemble docking by comparing binding pose prediction, ability to rank actives above nonactives (screening utility), and scoring accuracy. As a result, for ensemble definition k-means clustering appears to be better suited than hierarchical clustering with average linkage. The best performing ensemble consists of four crystal structures and is able to reproduce the native ligand poses better than any individual crystal structure. Moreover this ensemble outperforms 88% of all individual crystal structures in terms of screening utility as well as scoring accuracy. Similarly, ensembles of MD-derived structures perform on average better than 75% of any individual crystal structure in terms of scoring accuracy at all inspected ensembles sizes.
Pauci ex tanto numero: reducing redundancy in multi-model ensembles
NASA Astrophysics Data System (ADS)
Solazzo, E.; Riccio, A.; Kioutsioukis, I.; Galmarini, S.
2013-02-01
We explicitly address the fundamental issue of member diversity in multi-model ensembles. To date no attempts in this direction are documented within the air quality (AQ) community, although the extensive use of ensembles in this field. Common biases and redundancy are the two issues directly deriving from lack of independence, undermining the significance of a multi-model ensemble, and are the subject of this study. Shared biases among models will determine a biased ensemble, making therefore essential the errors of the ensemble members to be independent so that bias can cancel out. Redundancy derives from having too large a portion of common variance among the members of the ensemble, producing overconfidence in the predictions and underestimation of the uncertainty. The two issues of common biases and redundancy are analysed in detail using the AQMEII ensemble of AQ model results for four air pollutants in two European regions. We show that models share large portions of bias and variance, extending well beyond those induced by common inputs. We make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble with the advantage of being poorly correlated. Selecting the members for generating skilful, non-redundant ensembles from such subsets proved, however, non-trivial. We propose and discuss various methods of member selection and rate the ensemble performance they produce. In most cases, the full ensemble is outscored by the reduced ones. We conclude that, although independence of outputs may not always guarantee enhancement of scores (but this depends upon the skill being investigated) we discourage selecting the members of the ensemble simply on the basis of scores, that is, independence and skills need to be considered disjointly.
Analyzing the impact of changing size and composition of a crop model ensemble
NASA Astrophysics Data System (ADS)
Rodríguez, Alfredo
2017-04-01
The use of an ensemble of crop growth simulation models is a practice recently adopted in order to quantify aspects of uncertainties in model simulations. Yet, while the climate modelling community has extensively investigated the properties of model ensembles and their implications, this has hardly been investigated for crop model ensembles (Wallach et al., 2016). In their ensemble of 27 wheat models, Martre et al. (2015) found that the accuracy of the multi-model ensemble-average only increases up to an ensemble size of ca. 10, but does not improve when including more models in the analysis. However, even when this number of members is reached, questions about the impact of the addition or removal of a member to/from the ensemble arise. When selecting ensemble members, identifying members with poor performance or giving implausible results can make a large difference on the outcome. The objective of this study is to set up a methodology that defines indicators to show the effects of changing the ensemble composition and size on simulation results, when a selection procedure of ensemble members is applied. Ensemble mean or median, and variance are measures used to depict ensemble results among other indicators. We are utilizing simulations from an ensemble of wheat models that have been used to construct impact response surfaces (Pirttioja et al., 2015) (IRSs). These show the response of an impact variable (e.g., crop yield) to systematic changes in two explanatory variables (e.g., precipitation and temperature). Using these, we compare different sub-ensembles in terms of the mean, median and spread, and also by comparing IRSs. The methodology developed here allows comparing an ensemble before and after applying any procedure that changes the ensemble composition and size by measuring the impact of this decision on the ensemble central tendency measures. The methodology could also be further developed to compare the effect of changing ensemble composition and size on IRS features. References Martre, P., Wallach, D., Asseng, S., Ewert, F., Jones, J.W., Rötter, R.P., Boote, K.J., Ruane, A.C., Thorburn, P.J., Cammarano, D., Hatfield, J.L., Rosenzweig, C., Aggarwal, P.K., Angulo, C., Basso, B., Bertuzzi, P., Biernath, C., Brisson, N., Challinor, A.J., Doltra, J., Gayler, S., Goldberg, R., Grant, R.F., Heng, L., Hooker, J., Hunt, L.A., Ingwersen, J., Izaurralde, R.C., Kersebaum, K.C., Muller, C., Kumar, S.N., Nendel, C., O'Leary, G., Olesen, J.E., Osborne, T.M., Palosuo, T., Priesack, E., Ripoche, D., Semenov, M.A., Shcherbak, I., Steduto, P., Stockle, C.O., Stratonovitch, P., Streck, T., Supit, I., Tao, F.L., Travasso, M., Waha, K., White, J.W., Wolf, J., 2015. Multimodel ensembles of wheat growth: many models are better than one. Glob. Change Biol. 21, 911-925. Pirttioja N., Carter T., Fronzek S., Bindi M., Hoffmann H., Palosuo T., Ruiz-Ramos, M., Tao F., Trnka M., Acutis M., Asseng S., Baranowski P., Basso B., Bodin P., Buis S., Cammarano D., Deligios P., Destain M.-F., Doro L., Dumont B., Ewert F., Ferrise R., Francois L., Gaiser T., Hlavinka P., Jacquemin I., Kersebaum K.-C., Kollas C., Krzyszczak J., Lorite I. J., Minet J., Minguez M. I., Montesion M., Moriondo M., Müller C., Nendel C., Öztürk I., Perego A., Rodriguez, A., Ruane A.C., Ruget F., Sanna M., Semenov M., Slawinski C., Stratonovitch P., Supit I., Waha K., Wang E., Wu L., Zhao Z., Rötter R.P, 2015. A crop model ensemble analysis of temperature and precipitation effects on wheat yield across a European transect using impact response surfaces. Clim. Res., 65:87-105, doi:10.3354/cr01322 Wallach, D., Mearns, L.O. Ruane, A.C., Rötter, R.P., Asseng, S. (2016). Lessons from climate modeling on the design and use of ensembles for crop modeling. Climate Change (in press) doi:10.1007/s10584-016-1803-1.
Ensemble training to improve recognition using 2D ear
NASA Astrophysics Data System (ADS)
Middendorff, Christopher; Bowyer, Kevin W.
2009-05-01
The ear has gained popularity as a biometric feature due to the robustness of the shape over time and across emotional expression. Popular methods of ear biometrics analyze the ear as a whole, leaving these methods vulnerable to error due to occlusion. Many researchers explore ear recognition using an ensemble, but none present a method for designing the individual parts that comprise the ensemble. In this work, we introduce a method of modifying the ensemble shapes to improve performance. We determine how different properties of an ensemble training system can affect overall performance. We show that ensembles built from small parts will outperform ensembles built with larger parts, and that incorporating a large number of parts improves the performance of the ensemble.
Feature selection and classification of multiparametric medical images using bagging and SVM
NASA Astrophysics Data System (ADS)
Fan, Yong; Resnick, Susan M.; Davatzikos, Christos
2008-03-01
This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
NASA Astrophysics Data System (ADS)
Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien
2012-02-01
It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD) and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus making shape analysis a potentially more accurate biomarker. This study studies the hippocampi from 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1 weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper based feature selection method was used as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data indicating that the feature selection is sensitive to the data used, however feature ensemble methods may overcome this.
Mining SNPs from EST sequences using filters and ensemble classifiers.
Wang, J; Zou, Q; Guo, M Z
2010-05-04
Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in public-available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will be useful to avoid and save the cost of biological analyses.
Ensemble approach for differentiation of malignant melanoma
NASA Astrophysics Data System (ADS)
Rastgoo, Mojdeh; Morel, Olivier; Marzani, Franck; Garcia, Rafael
2015-04-01
Melanoma is the deadliest type of skin cancer, yet it is the most treatable kind depending on its early diagnosis. The early prognosis of melanoma is a challenging task for both clinicians and dermatologists. Due to the importance of early diagnosis and in order to assist the dermatologists, we propose an automated framework based on ensemble learning methods and dermoscopy images to differentiate melanoma from dysplastic and benign lesions. The evaluation of our framework on the recent and public dermoscopy benchmark (PH2 dataset) indicates the potential of proposed method. Our evaluation, using only global features, revealed that ensembles such as random forest perform better than single learner. Using random forest ensemble and combination of color and texture features, our framework achieved the highest sensitivity of 94% and specificity of 92%.
Effective Feature Selection for Classification of Promoter Sequences.
K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish
2016-01-01
Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
NASA Astrophysics Data System (ADS)
Fayaz, S. M.; Rajanikant, G. K.
2014-07-01
Programmed cell death has been a fascinating area of research since it throws new challenges and questions in spite of the tremendous ongoing research in this field. Recently, necroptosis, a programmed form of necrotic cell death, has been implicated in many diseases including neurological disorders. Receptor interacting serine/threonine protein kinase 1 (RIPK1) is an important regulatory protein involved in the necroptosis and inhibition of this protein is essential to stop necroptotic process and eventually cell death. Current structure-based virtual screening methods involve a wide range of strategies and recently, considering the multiple protein structures for pharmacophore extraction has been emphasized as a way to improve the outcome. However, using the pharmacophoric information completely during docking is very important. Further, in such methods, using the appropriate protein structures for docking is desirable. If not, potential compound hits, obtained through pharmacophore-based screening, may not have correct ranks and scores after docking. Therefore, a comprehensive integration of different ensemble methods is essential, which may provide better virtual screening results. In this study, dual ensemble screening, a novel computational strategy was used to identify diverse and potent inhibitors against RIPK1. All the pharmacophore features present in the binding site were captured using both the apo and holo protein structures and an ensemble pharmacophore was built by combining these features. This ensemble pharmacophore was employed in pharmacophore-based screening of ZINC database. The compound hits, thus obtained, were subjected to ensemble docking. The leads acquired through docking were further validated through feature evaluation and molecular dynamics simulation.
Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang
2016-01-01
For ensemble learning, how to select and combine the candidate classifiers are two key issues which influence the performance of the ensemble system dramatically. Random vector functional link networks (RVFL) without direct input-to-output links is one of suitable base-classifiers for ensemble systems because of its fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. As for using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum norm least-square method and then further optimized by ARPSO. Finally, a few redundant RVFL is pruned, and thus the more compact ensemble of RVFL is obtained. Moreover, in this paper, theoretical analysis and justification on how to prune the base classifiers on classification problem is presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single optimization methods. Experiment results on function approximation and classification problems verify that the proposed method could improve its convergence accuracy as well as reduce the complexity of the ensemble system. PMID:27835638
Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang
2016-01-01
For ensemble learning, how to select and combine the candidate classifiers are two key issues which influence the performance of the ensemble system dramatically. Random vector functional link networks (RVFL) without direct input-to-output links is one of suitable base-classifiers for ensemble systems because of its fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. As for using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum norm least-square method and then further optimized by ARPSO. Finally, a few redundant RVFL is pruned, and thus the more compact ensemble of RVFL is obtained. Moreover, in this paper, theoretical analysis and justification on how to prune the base classifiers on classification problem is presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single optimization methods. Experiment results on function approximation and classification problems verify that the proposed method could improve its convergence accuracy as well as reduce the complexity of the ensemble system.
Pauci ex tanto numero: reduce redundancy in multi-model ensembles
NASA Astrophysics Data System (ADS)
Solazzo, E.; Riccio, A.; Kioutsioukis, I.; Galmarini, S.
2013-08-01
We explicitly address the fundamental issue of member diversity in multi-model ensembles. To date, no attempts in this direction have been documented within the air quality (AQ) community despite the extensive use of ensembles in this field. Common biases and redundancy are the two issues directly deriving from lack of independence, undermining the significance of a multi-model ensemble, and are the subject of this study. Shared, dependant biases among models do not cancel out but will instead determine a biased ensemble. Redundancy derives from having too large a portion of common variance among the members of the ensemble, producing overconfidence in the predictions and underestimation of the uncertainty. The two issues of common biases and redundancy are analysed in detail using the AQMEII ensemble of AQ model results for four air pollutants in two European regions. We show that models share large portions of bias and variance, extending well beyond those induced by common inputs. We make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble with the advantage of being poorly correlated. Selecting the members for generating skilful, non-redundant ensembles from such subsets proved, however, non-trivial. We propose and discuss various methods of member selection and rate the ensemble performance they produce. In most cases, the full ensemble is outscored by the reduced ones. We conclude that, although independence of outputs may not always guarantee enhancement of scores (but this depends upon the skill being investigated), we discourage selecting the members of the ensemble simply on the basis of scores; that is, independence and skills need to be considered disjointly.
On the structure and phase transitions of power-law Poissonian ensembles
NASA Astrophysics Data System (ADS)
Eliazar, Iddo; Oshanin, Gleb
2012-10-01
Power-law Poissonian ensembles are Poisson processes that are defined on the positive half-line, and that are governed by power-law intensities. Power-law Poissonian ensembles are stochastic objects of fundamental significance; they uniquely display an array of fractal features and they uniquely generate a span of important applications. In this paper we apply three different methods—oligarchic analysis, Lorenzian analysis and heterogeneity analysis—to explore power-law Poissonian ensembles. The amalgamation of these analyses, combined with the topology of power-law Poissonian ensembles, establishes a detailed and multi-faceted picture of the statistical structure and the statistical phase transitions of these elemental ensembles.
Amezquita-Sanchez, Juan P; Adeli, Anahita; Adeli, Hojjat
2016-05-15
Mild cognitive impairment (MCI) is a cognitive disorder characterized by memory impairment, greater than expected by age. A new methodology is presented to identify MCI patients during a working memory task using MEG signals. The methodology consists of four steps: In step 1, the complete ensemble empirical mode decomposition (CEEMD) is used to decompose the MEG signal into a set of adaptive sub-bands according to its contained frequency information. In step 2, a nonlinear dynamics measure based on permutation entropy (PE) analysis is employed to analyze the sub-bands and detect features to be used for MCI detection. In step 3, an analysis of variation (ANOVA) is used for feature selection. In step 4, the enhanced probabilistic neural network (EPNN) classifier is applied to the selected features to distinguish between MCI and healthy patients. The usefulness and effectiveness of the proposed methodology are validated using the sensed MEG data obtained experimentally from 18 MCI and 19 control patients. Copyright © 2016 Elsevier B.V. All rights reserved.
Automated detection of pulmonary nodules in CT images with support vector machines
NASA Astrophysics Data System (ADS)
Liu, Lu; Liu, Wanyu; Sun, Xiaoming
2008-10-01
Many methods have been proposed to avoid radiologists fail to diagnose small pulmonary nodules. Recently, support vector machines (SVMs) had received an increasing attention for pattern recognition. In this paper, we present a computerized system aimed at pulmonary nodules detection; it identifies the lung field, extracts a set of candidate regions with a high sensitivity ratio and then classifies candidates by the use of SVMs. The Computer Aided Diagnosis (CAD) system presented in this paper supports the diagnosis of pulmonary nodules from Computed Tomography (CT) images as inflammation, tuberculoma, granuloma..sclerosing hemangioma, and malignant tumor. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of SVMs classifiers. The achieved classification performance was 100%, 92.75% and 90.23% in the training, validation and testing set, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
NASA Astrophysics Data System (ADS)
Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher
2012-10-01
Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics was used as feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as useŕs and produceŕs accuracy.
Salmon, Loïc; Giambaşu, George M; Nikolova, Evgenia N; Petzold, Katja; Bhattacharya, Akash; Case, David A; Al-Hashimi, Hashim M
2015-10-14
Approaches that combine experimental data and computational molecular dynamics (MD) to determine atomic resolution ensembles of biomolecules require the measurement of abundant experimental data. NMR residual dipolar couplings (RDCs) carry rich dynamics information, however, difficulties in modulating overall alignment of nucleic acids have limited the ability to fully extract this information. We present a strategy for modulating RNA alignment that is based on introducing variable dynamic kinks in terminal helices. With this strategy, we measured seven sets of RDCs in a cUUCGg apical loop and used this rich data set to test the accuracy of an 0.8 μs MD simulation computed using the Amber ff10 force field as well as to determine an atomic resolution ensemble. The MD-generated ensemble quantitatively reproduces the measured RDCs, but selection of a sub-ensemble was required to satisfy the RDCs within error. The largest discrepancies between the RDC-selected and MD-generated ensembles are observed for the most flexible loop residues and backbone angles connecting the loop to the helix, with the RDC-selected ensemble resulting in more uniform dynamics. Comparison of the RDC-selected ensemble with NMR spin relaxation data suggests that the dynamics occurs on the ps-ns time scales as verified by measurements of R(1ρ) relaxation-dispersion data. The RDC-satisfying ensemble samples many conformations adopted by the hairpin in crystal structures indicating that intrinsic plasticity may play important roles in conformational adaptation. The approach presented here can be applied to test nucleic acid force fields and to characterize dynamics in diverse RNA motifs at atomic resolution.
NASA Astrophysics Data System (ADS)
Yang, Wei; Zhang, Su; Li, Wenying; Chen, Yaqing; Lu, Hongtao; Chen, Wufan; Chen, Yazhu
2010-04-01
Various computerized features extracted from breast ultrasound images are useful in assessing the malignancy of breast tumors. However, the underlying relationship between the computerized features and tumor malignancy may not be linear in nature. We use the decision tree ensemble trained by the cost-sensitive boosting algorithm to approximate the target function for malignancy assessment and to reflect this relationship qualitatively. Partial dependence plots are employed to explore and visualize the effect of features on the output of the decision tree ensemble. In the experiments, 31 image features are extracted to quantify the sonographic characteristics of breast tumors. Patient age is used as an external feature because of its high clinical importance. The area under the receiver-operating characteristic curve of the tree ensembles can reach 0.95 with sensitivity of 0.95 (61/64) at the associated specificity 0.74 (77/104). The partial dependence plots of the four most important features are demonstrated to show the influence of the features on malignancy, and they are in accord with the empirical observations. The results can provide visual and qualitative references on the computerized image features for physicians, and can be useful for enhancing the interpretability of computer-aided diagnosis systems for breast ultrasound.
Using ensemble of classifiers for predicting HIV protease cleavage sites in proteins.
Nanni, Loris; Lumini, Alessandra
2009-03-01
The focus of this work is the use of ensembles of classifiers for predicting HIV protease cleavage sites in proteins. Due to the complex relationships in the biological data, several recent works show that often ensembles of learning algorithms outperform stand-alone methods. We show that the fusion of approaches based on different encoding models can be useful for improving the performance of this classification problem. In particular, in this work four different feature encodings for peptides are described and tested. An extensive evaluation on a large dataset according to a blind testing protocol is reported which demonstrates how different feature extraction methods and classifiers can be combined for obtaining a robust and reliable system. The comparison with other stand-alone approaches allows quantifying the performance improvement obtained by the ensembles proposed in this work.
Amozegar, M; Khorasani, K
2016-04-01
In this paper, a new approach for Fault Detection and Isolation (FDI) of gas turbine engines is proposed by developing an ensemble of dynamic neural network identifiers. For health monitoring of the gas turbine engine, its dynamics is first identified by constructing three separate or individual dynamic neural network architectures. Specifically, a dynamic multi-layer perceptron (MLP), a dynamic radial-basis function (RBF) neural network, and a dynamic support vector machine (SVM) are trained to individually identify and represent the gas turbine engine dynamics. Next, three ensemble-based techniques are developed to represent the gas turbine engine dynamics, namely, two heterogeneous ensemble models and one homogeneous ensemble model. It is first shown that all ensemble approaches do significantly improve the overall performance and accuracy of the developed system identification scheme when compared to each of the stand-alone solutions. The best selected stand-alone model (i.e., the dynamic RBF network) and the best selected ensemble architecture (i.e., the heterogeneous ensemble) in terms of their performances in achieving an accurate system identification are then selected for solving the FDI task. The required residual signals are generated by using both a single model-based solution and an ensemble-based solution under various gas turbine engine health conditions. Our extensive simulation studies demonstrate that the fault detection and isolation task achieved by using the residuals that are obtained from the dynamic ensemble scheme results in a significantly more accurate and reliable performance as illustrated through detailed quantitative confusion matrix analysis and comparative studies. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sultan, Mohammad M; Kiss, Gert; Shukla, Diwakar; Pande, Vijay S
2014-12-09
Given the large number of crystal structures and NMR ensembles that have been solved to date, classical molecular dynamics (MD) simulations have become powerful tools in the atomistic study of the kinetics and thermodynamics of biomolecular systems on ever increasing time scales. By virtue of the high-dimensional conformational state space that is explored, the interpretation of large-scale simulations faces difficulties not unlike those in the big data community. We address this challenge by introducing a method called clustering based feature selection (CB-FS) that employs a posterior analysis approach. It combines supervised machine learning (SML) and feature selection with Markov state models to automatically identify the relevant degrees of freedom that separate conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in the functional transitions of two exemplary cell-signaling proteins central to human disease states.
Identifying the optimal segmentors for mass classification in mammograms
NASA Astrophysics Data System (ADS)
Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.
2015-03-01
In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we used various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. Then after shape features are computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification rather than the ones that produced high precision segmentation. To measure the segmentors' contribution, we examined weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The result showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.
Multi-Conformer Ensemble Docking to Difficult Protein Targets
Ellingson, Sally R.; Miao, Yinglong; Baudry, Jerome; ...
2014-09-08
We investigate large-scale ensemble docking using five proteins from the Directory of Useful Decoys (DUD, dud.docking.org) for which docking to crystal structures has proven difficult. Molecular dynamics trajectories are produced for each protein and an ensemble of representative conformational structures extracted from the trajectories. Docking calculations are performed on these selected simulation structures and ensemble-based enrichment factors compared with those obtained using docking in crystal structures of the same protein targets or random selection of compounds. We also found simulation-derived snapshots with improved enrichment factors that increased the chemical diversity of docking hits for four of the five selected proteins.more » A combination of all the docking results obtained from molecular dynamics simulation followed by selection of top-ranking compounds appears to be an effective strategy for increasing the number and diversity of hits when using docking to screen large libraries of chemicals against difficult protein targets.« less
The emergence of collective phenomena in systems with random interactions
NASA Astrophysics Data System (ADS)
Abramkina, Volha
Emergent phenomena are one of the most profound topics in modern science, addressing the ways that collectivities and complex patterns appear due to multiplicity of components and simple interactions. Ensembles of random Hamiltonians allow one to explore emergent phenomena in a statistical way. In this work we adopt a shell model approach with a two-body interaction Hamiltonian. The sets of the two-body interaction strengths are selected at random, resulting in the two-body random ensemble (TBRE). Symmetries such as angular momentum, isospin, and parity entangled with complex many-body dynamics result in surprising order discovered in the spectrum of low-lying excitations. The statistical patterns exhibited in the TBRE are remarkably similar to those observed in real nuclei. Signs of almost every collective feature seen in nuclei, namely, pairing superconductivity, deformation, and vibration, have been observed in random ensembles [3, 4, 5, 6]. In what follows a systematic investigation of nuclear shape collectivities in random ensembles is conducted. The development of the mean field, its geometry, multipole collectivities and their dependence on the underlying two-body interaction are explored. Apart from the role of static symmetries such as SU(2) angular momentum and isospin groups, the emergence of dynamical symmetries including the seniority SU(2), rotational symmetry, as well as the Elliot SU(3) is shown to be an important precursor for the existence of geometric collectivities.
Haberman, Jason; Brady, Timothy F; Alvarez, George A
2015-04-01
Ensemble perception, including the ability to "see the average" from a group of items, operates in numerous feature domains (size, orientation, speed, facial expression, etc.). Although the ubiquity of ensemble representations is well established, the large-scale cognitive architecture of this process remains poorly defined. We address this using an individual differences approach. In a series of experiments, observers saw groups of objects and reported either a single item from the group or the average of the entire group. High-level ensemble representations (e.g., average facial expression) showed complete independence from low-level ensemble representations (e.g., average orientation). In contrast, low-level ensemble representations (e.g., orientation and color) were correlated with each other, but not with high-level ensemble representations (e.g., facial expression and person identity). These results suggest that there is not a single domain-general ensemble mechanism, and that the relationship among various ensemble representations depends on how proximal they are in representational space. (c) 2015 APA, all rights reserved).
Constructing better classifier ensemble based on weighted accuracy and diversity measure.
Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S
2014-01-01
A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases.
Constructing Better Classifier Ensemble Based on Weighted Accuracy and Diversity Measure
Chao, Lidia S.
2014-01-01
A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases. PMID:24672402
NASA Astrophysics Data System (ADS)
Millar, R.; Ingram, W.; Allen, M. R.; Lowe, J.
2013-12-01
Temperature and precipitation patterns are the climate variables with the greatest impacts on both natural and human systems. Due to the small spatial scales and the many interactions involved in the global hydrological cycle, in general circulation models (GCMs) representations of precipitation changes are subject to considerable uncertainty. Quantifying and understanding the causes of uncertainty (and identifying robust features of predictions) in both global and local precipitation change is an essential challenge of climate science. We have used the huge distributed computing capacity of the climateprediction.net citizen science project to examine parametric uncertainty in an ensemble of 20,000 perturbed-physics versions of the HadCM3 general circulation model. The ensemble has been selected to have a control climate in top-of-atmosphere energy balance [Yamazaki et al. 2013, J.G.R.]. We force this ensemble with several idealised climate-forcing scenarios including carbon dioxide step and transient profiles, solar radiation management geoengineering experiments with stratospheric aerosols, and short-lived climate forcing agents. We will present the results from several of these forcing scenarios under GCM parametric uncertainty. We examine the global mean precipitation energy budget to understand the robustness of a simple non-linear global precipitation model [Good et al. 2012, Clim. Dyn.] as a better explanation of precipitation changes in transient climate projections under GCM parametric uncertainty than a simple linear tropospheric energy balance model. We will also present work investigating robust conclusions about precipitation changes in a balanced ensemble of idealised solar radiation management scenarios [Kravitz et al. 2011, Atmos. Sci. Let.].
NASA Astrophysics Data System (ADS)
Durner, Maximilian; Márton, Zoltán.; Hillenbrand, Ulrich; Ali, Haider; Kleinsteuber, Martin
2017-03-01
In this work, a new ensemble method for the task of category recognition in different environments is presented. The focus is on service robotic perception in an open environment, where the robot's task is to recognize previously unseen objects of predefined categories, based on training on a public dataset. We propose an ensemble learning approach to be able to flexibly combine complementary sources of information (different state-of-the-art descriptors computed on color and depth images), based on a Markov Random Field (MRF). By exploiting its specific characteristics, the MRF ensemble method can also be executed as a Dynamic Classifier Selection (DCS) system. In the experiments, the committee- and topology-dependent performance boost of our ensemble is shown. Despite reduced computational costs and using less information, our strategy performs on the same level as common ensemble approaches. Finally, the impact of large differences between datasets is analyzed.
Ali, Safdar; Majid, Abdul; Javed, Syed Gibran; Sattar, Mohsin
2016-06-01
Early prediction of breast cancer is important for effective treatment and survival. We developed an effective Cost-Sensitive Classifier with GentleBoost Ensemble (Can-CSC-GBE) for the classification of breast cancer using protein amino acid features. In this work, first, discriminant information of the protein sequences related to breast tissue is extracted. Then, the physicochemical properties hydrophobicity and hydrophilicity of amino acids are employed to generate molecule descriptors in different feature spaces. For comparison, we obtained results by combining Cost-Sensitive learning with conventional ensemble of AdaBoostM1 and Bagging. The proposed Can-CSC-GBE system has effectively reduced the misclassification costs and thereby improved the overall classification performance. Our novel approach has highlighted promising results as compared to the state-of-the-art ensemble approaches. Copyright © 2016 Elsevier Ltd. All rights reserved.
Perceived Average Orientation Reflects Effective Gist of the Surface.
Cha, Oakyoon; Chong, Sang Chul
2018-03-01
The human ability to represent ensemble visual information, such as average orientation and size, has been suggested as the foundation of gist perception. To effectively summarize different groups of objects into the gist of a scene, observers should form ensembles separately for different groups, even when objects have similar visual features across groups. We hypothesized that the visual system utilizes perceptual groups characterized by spatial configuration and represents separate ensembles for different groups. Therefore, participants could not integrate ensembles of different perceptual groups on a task basis. We asked participants to determine the average orientation of visual elements comprising a surface with a contour situated inside. Although participants were asked to estimate the average orientation of all the elements, they ignored orientation signals embedded in the contour. This constraint may help the visual system to keep the visual features of occluding objects separate from those of the occluded objects.
NASA Astrophysics Data System (ADS)
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.
S K, Somasundaram; P, Alli
2017-11-09
The main complication of diabetes is Diabetic retinopathy (DR), retinal vascular disease and it leads to the blindness. Regular screening for early DR disease detection is considered as an intensive labor and resource oriented task. Therefore, automatic detection of DR diseases is performed only by using the computational technique is the great solution. An automatic method is more reliable to determine the presence of an abnormality in Fundus images (FI) but, the classification process is poorly performed. Recently, few research works have been designed for analyzing texture discrimination capacity in FI to distinguish the healthy images. However, the feature extraction (FE) process was not performed well, due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection using Machine Learning and Ensemble Classification method, called, Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed. The ML-BEC method comprises of two stages. The first stage in ML-BEC method comprises extraction of the candidate objects from Retinal Images (RI). The candidate objects or the features for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying Machine Learning technique called, t-distributed Stochastic Neighbor Embedding (t-SNE). Besides, t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence among two distributions regarding the locations of the points on the map. The second stage comprises of application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection of DR screening system using Bagging Ensemble Classifier (BEC) is investigated. With the help of voting the process in ML-BEC, bagging minimizes the error due to variance of the base classifier. With the publicly available retinal image databases, our classifier is trained with 25% of RI. Results show that the ensemble classifier can achieve better classification accuracy (CA) than single classification models. Empirical experiments suggest that the machine learning-based ensemble classifier is efficient for further reducing DR classification time (CT).
Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
2010-09-15
In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
Advanced ensemble modelling of flexible macromolecules using X-ray solution scattering.
Tria, Giancarlo; Mertens, Haydyn D T; Kachala, Michael; Svergun, Dmitri I
2015-03-01
Dynamic ensembles of macromolecules mediate essential processes in biology. Understanding the mechanisms driving the function and molecular interactions of 'unstructured' and flexible molecules requires alternative approaches to those traditionally employed in structural biology. Small-angle X-ray scattering (SAXS) is an established method for structural characterization of biological macromolecules in solution, and is directly applicable to the study of flexible systems such as intrinsically disordered proteins and multi-domain proteins with unstructured regions. The Ensemble Optimization Method (EOM) [Bernadó et al. (2007 ▶). J. Am. Chem. Soc. 129, 5656-5664] was the first approach introducing the concept of ensemble fitting of the SAXS data from flexible systems. In this approach, a large pool of macromolecules covering the available conformational space is generated and a sub-ensemble of conformers coexisting in solution is selected guided by the fit to the experimental SAXS data. This paper presents a series of new developments and advancements to the method, including significantly enhanced functionality and also quantitative metrics for the characterization of the results. Building on the original concept of ensemble optimization, the algorithms for pool generation have been redesigned to allow for the construction of partially or completely symmetric oligomeric models, and the selection procedure was improved to refine the size of the ensemble. Quantitative measures of the flexibility of the system studied, based on the characteristic integral parameters of the selected ensemble, are introduced. These improvements are implemented in the new EOM version 2.0, and the capabilities as well as inherent limitations of the ensemble approach in SAXS, and of EOM 2.0 in particular, are discussed.
Protein binding hot spots prediction from sequence only by a new ensemble learning method.
Hu, Shan-Shan; Chen, Peng; Wang, Bing; Li, Jinyan
2017-10-01
Hot spots are interfacial core areas of binding proteins, which have been applied as targets in drug design. Experimental methods are costly in both time and expense to locate hot spot areas. Recently, in-silicon computational methods have been widely used for hot spot prediction through sequence or structure characterization. As the structural information of proteins is not always solved, and thus hot spot identification from amino acid sequences only is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers involving the IBk (Instance-based k means) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. Then top-performance classifiers are selected to form an ensemble by a majority voting technique. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm .
Revealing the distinct folding phases of an RNA three-helix junction.
Plumridge, Alex; Katz, Andrea M; Calvey, George D; Elber, Ron; Kirmizialtin, Serdal; Pollack, Lois
2018-05-14
Remarkable new insight has emerged into the biological role of RNA in cells. RNA folding and dynamics enable many of these newly discovered functions, calling for an understanding of RNA self-assembly and conformational dynamics. Because RNAs pass through multiple structures as they fold, an ensemble perspective is required to visualize the flow through fleetingly populated sets of states. Here, we combine microfluidic mixing technology and small angle X-ray scattering (SAXS) to measure the Mg-induced folding of a small RNA domain, the tP5abc three helix junction. Our measurements are interpreted using ensemble optimization to select atomically detailed structures that recapitulate each experimental curve. Structural ensembles, derived at key stages in both time-resolved studies and equilibrium titrations, reproduce the features of known intermediates, and more importantly, offer a powerful new structural perspective on the time-progression of folding. Distinct collapse phases along the pathway appear to be orchestrated by specific interactions with Mg ions. These key interactions subsequently direct motions of the backbone that position the partners of tertiary contacts for later bonding, and demonstrate a remarkable synergy between Mg and RNA across numerous time-scales.
ERIC Educational Resources Information Center
Morford, James B.
2007-01-01
Music teachers are often influenced by pedagogical practices in the collegiate ensembles in which they performed. Opportunities to participate in collegiate world music ensembles have increased in recent decades; West African ensembles and steel bands represent the second and third most common of these in the United States. The absence of…
Monthly ENSO Forecast Skill and Lagged Ensemble Size
DelSole, T.; Tippett, M.K.; Pegion, K.
2018-01-01
Abstract The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real‐time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real‐time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8–10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles are found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities. PMID:29937973
Monthly ENSO Forecast Skill and Lagged Ensemble Size
NASA Astrophysics Data System (ADS)
Trenary, L.; DelSole, T.; Tippett, M. K.; Pegion, K.
2018-04-01
The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real-time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real-time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8-10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles are found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities.
Computer aided diagnosis based on medical image processing and artificial intelligence methods
NASA Astrophysics Data System (ADS)
Stoitsis, John; Valavanis, Ioannis; Mougiakakou, Stavroula G.; Golemati, Spyretta; Nikita, Alexandra; Nikita, Konstantina S.
2006-12-01
Advances in imaging technology and computer science have greatly enhanced interpretation of medical images, and contributed to early diagnosis. The typical architecture of a Computer Aided Diagnosis (CAD) system includes image pre-processing, definition of region(s) of interest, features extraction and selection, and classification. In this paper, the principles of CAD systems design and development are demonstrated by means of two examples. The first one focuses on the differentiation between symptomatic and asymptomatic carotid atheromatous plaques. For each plaque, a vector of texture and motion features was estimated, which was then reduced to the most robust ones by means of ANalysis of VAriance (ANOVA). Using fuzzy c-means, the features were then clustered into two classes. Clustering performances of 74%, 79%, and 84% were achieved for texture only, motion only, and combinations of texture and motion features, respectively. The second CAD system presented in this paper supports the diagnosis of focal liver lesions and is able to characterize liver tissue from Computed Tomography (CT) images as normal, hepatic cyst, hemangioma, and hepatocellular carcinoma. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of neural network classifiers. The achieved classification performance was 100%, 93.75% and 90.63% in the training, validation and testing set, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
Exploiting ensemble learning for automatic cataract detection and grading.
Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing
2016-02-01
Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar
2017-06-22
DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features, Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features e.g. Shift, Slide and Opening cannot be evaluated using DNAshape. Here, we report a novel method DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.
Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho
2017-01-01
Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs; known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, the feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Device and Method for Gathering Ensemble Data Sets
NASA Technical Reports Server (NTRS)
Racette, Paul E. (Inventor)
2014-01-01
An ensemble detector uses calibrated noise references to produce ensemble sets of data from which properties of non-stationary processes may be extracted. The ensemble detector comprising: a receiver; a switching device coupled to the receiver, the switching device configured to selectively connect each of a plurality of reference noise signals to the receiver; and a gain modulation circuit coupled to the receiver and configured to vary a gain of the receiver based on a forcing signal; whereby the switching device selectively connects each of the plurality of reference noise signals to the receiver to produce an output signal derived from the plurality of reference noise signals and the forcing signal.
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.
Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc
2018-01-01
In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we - a team of visualization scientists and meteorologists-deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
Monte Carlo replica-exchange based ensemble docking of protein conformations.
Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin
2017-05-01
A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in translation and orientation of binding partners, possible conformational changes upon binding are included based on Monte Carlo selection of protein conformations stored as ordered pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure with a range spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling is performed to select appropriate partner conformations on the fly the approach is not limited by the number of conformations in the ensemble compared to ensemble docking of each conformer pair in ensemble cross docking. Although only a fraction of generated conformers was in closer agreement with the bound structure the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Jones, Sara K.
2018-01-01
The purpose of this comparative case study was to examine the motivation for participation in traditional and non-traditional vocal ensembles by students who are not pursuing a career in music and the perceived benefits of this participation. Participants were selected from a traditional mixed choral ensemble and a student-run a cappella ensemble.…
Benkert, Pascal; Schwede, Torsten; Tosatto, Silvio Ce
2009-05-20
The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function called QMEANclust achieves a correlation coefficient of predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and perform significantly better in selecting good models from the ensemble of server models than any other groups participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins each with 300 alternatives models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
Ensemble data assimilation in the Red Sea: sensitivity to ensemble selection and atmospheric forcing
NASA Astrophysics Data System (ADS)
Toye, Habib; Zhan, Peng; Gopalakrishnan, Ganesh; Kartadikaria, Aditya R.; Huang, Huang; Knio, Omar; Hoteit, Ibrahim
2017-07-01
We present our efforts to build an ensemble data assimilation and forecasting system for the Red Sea. The system consists of the high-resolution Massachusetts Institute of Technology general circulation model (MITgcm) to simulate ocean circulation and of the Data Research Testbed (DART) for ensemble data assimilation. DART has been configured to integrate all members of an ensemble adjustment Kalman filter (EAKF) in parallel, based on which we adapted the ensemble operations in DART to use an invariant ensemble, i.e., an ensemble Optimal Interpolation (EnOI) algorithm. This approach requires only single forward model integration in the forecast step and therefore saves substantial computational cost. To deal with the strong seasonal variability of the Red Sea, the EnOI ensemble is then seasonally selected from a climatology of long-term model outputs. Observations of remote sensing sea surface height (SSH) and sea surface temperature (SST) are assimilated every 3 days. Real-time atmospheric fields from the National Center for Environmental Prediction (NCEP) and the European Center for Medium-Range Weather Forecasts (ECMWF) are used as forcing in different assimilation experiments. We investigate the behaviors of the EAKF and (seasonal-) EnOI and compare their performances for assimilating and forecasting the circulation of the Red Sea. We further assess the sensitivity of the assimilation system to various filtering parameters (ensemble size, inflation) and atmospheric forcing.
NASA Astrophysics Data System (ADS)
Federico, S.; Avolio, E.; Bellecci, C.; Colacino, M.; Walko, R. L.
2006-03-01
This paper reports preliminary results for a Limited area model Ensemble Prediction System (LEPS), based on RAMS (Regional Atmospheric Modelling System), for eight case studies of moderate-intense precipitation over Calabria, the southernmost tip of the Italian peninsula. LEPS aims to transfer the benefits of a probabilistic forecast from global to regional scales in countries where local orographic forcing is a key factor to force convection. To accomplish this task and to limit computational time in an operational implementation of LEPS, we perform a cluster analysis of ECMWF-EPS runs. Starting from the 51 members that form the ECMWF-EPS we generate five clusters. For each cluster a representative member is selected and used to provide initial and dynamic boundary conditions to RAMS, whose integrations generate LEPS. RAMS runs have 12-km horizontal resolution. To analyze the impact of enhanced horizontal resolution on quantitative precipitation forecasts, LEPS forecasts are compared to a full Brute Force (BF) ensemble. This ensemble is based on RAMS, has 36 km horizontal resolution and is generated by 51 members, nested in each ECMWF-EPS member. LEPS and BF results are compared subjectively and by objective scores. Subjective analysis is based on precipitation and probability maps of case studies whereas objective analysis is made by deterministic and probabilistic scores. Scores and maps are calculated by comparing ensemble precipitation forecasts against reports from the Calabria regional raingauge network. Results show that LEPS provided better rainfall predictions than BF for all case studies selected. This strongly suggests the importance of the enhanced horizontal resolution, compared to ensemble population, for Calabria for these cases. To further explore the impact of local physiographic features on QPF (Quantitative Precipitation Forecasting), LEPS results are also compared with a 6-km horizontal resolution deterministic forecast. Due to local and mesoscale forcing, the high resolution forecast (Hi-Res) has better performance compared to the ensemble mean for rainfall thresholds larger than 10mm but it tends to overestimate precipitation for lower amounts. This yields larger false alarms that have a detrimental effect on objective scores for lower thresholds. To exploit the advantages of a probabilistic forecast compared to a deterministic one, the relation between the ECMWF-EPS 700 hPa geopotential height spread and LEPS performance is analyzed. Results are promising even if additional studies are required.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Statistical Analysis of Protein Ensembles
NASA Astrophysics Data System (ADS)
Máté, Gabriell; Heermann, Dieter
2014-04-01
As 3D protein-configuration data is piling up, there is an ever-increasing need for well-defined, mathematically rigorous analysis approaches, especially that the vast majority of the currently available methods rely heavily on heuristics. We propose an analysis framework which stems from topology, the field of mathematics which studies properties preserved under continuous deformations. First, we calculate a barcode representation of the molecules employing computational topology algorithms. Bars in this barcode represent different topological features. Molecules are compared through their barcodes by statistically determining the difference in the set of their topological features. As a proof-of-principle application, we analyze a dataset compiled of ensembles of different proteins, obtained from the Ensemble Protein Database. We demonstrate that our approach correctly detects the different protein groupings.
Akbar, Shahid; Hayat, Maqsood; Iqbal, Muhammad; Jan, Mian Ahmad
2017-06-01
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies such as, chemotherapy and radiation, are highly expensive, susceptible to errors and ineffective techniques. These conventional techniques induce severe side-effects on human cells. Due to perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for identification of anticancer peptides. In this paper, evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated, using three different discrete feature representation methods, i.e., amphiphilic Pseudo amino acid composition, g-Gap dipeptide composition, and Reduce amino acid alphabet composition. The performance of the extracted feature spaces are investigated separately and then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined together, using optimized genetic algorithm and simple majority technique in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms than individual classifiers as well as simple majority voting base ensemble. The performance of genetic algorithm-based ensemble classification is highly reported on hybrid feature space, with an accuracy of 96.45%. In comparison to the existing techniques, 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, it is observed that 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers. Copyright © 2017 Elsevier B.V. All rights reserved.
Using histograms to introduce randomization in the generation of ensembles of decision trees
Kamath, Chandrika; Cantu-Paz, Erick; Littau, David
2005-02-22
A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.
High northern latitude temperature extremes, 1400-1999
NASA Astrophysics Data System (ADS)
Tingley, M. P.; Huybers, P.; Hughen, K. A.
2009-12-01
There is often an interest in determining which interval features the most extreme value of a reconstructed climate field, such as the warmest year or decade in a temperature reconstruction. Previous approaches to this type of question have not fully accounted for the spatial and temporal covariance in the climate field when assessing the significance of extreme values. Here we present results from applying BARSAT, a new, Bayesian approach to reconstructing climate fields, to a 600 year multiproxy temperature data set that covers land areas between 45N and 85N. The end result of the analysis is an ensemble of spatially and temporally complete realizations of the temperature field, each of which is consistent with the observations and the estimated values of the parameters that define the assumed spatial and temporal covariance functions. In terms of the spatial average temperature, 1990-1999 was the warmest decade in the 1400-1999 interval in each of 2000 ensemble members, while 1995 was the warmest year in 98% of the ensemble members. A similar analysis at each node of a regular 5 degree grid gives insight into the spatial distribution of warm temperatures, and reveals that 1995 was anomalously warm in Eurasia, whereas 1998 featured extreme warmth in North America. In 70% of the ensemble members, 1601 featured the coldest spatial average, indicating that the eruption of Huaynaputina in Peru in 1600 (with a volcanic explosivity index of 6) had a major cooling impact on the high northern latitudes. Repeating this analysis at each node reveals the varying impacts of major volcanic eruptions on the distribution of extreme cooling. Finally, we use the ensemble to investigate extremes in the time evolution of centennial temperature trends, and find that in more than half the ensemble members, the greatest rate of change in the spatial mean time series was a cooling centered at 1600. The largest rate of centennial scale warming, however, occurred in the 20th Century in more than 98% of the ensemble members.
Complete analysis of ensemble inequivalence in the Blume-Emery-Griffiths model
NASA Astrophysics Data System (ADS)
Hovhannisyan, V. V.; Ananikian, N. S.; Campa, A.; Ruffo, S.
2017-12-01
We study inequivalence of canonical and microcanonical ensembles in the mean-field Blume-Emery-Griffiths model. This generalizes previous results obtained for the Blume-Capel model. The phase diagram strongly depends on the value of the biquadratic exchange interaction K , the additional feature present in the Blume-Emery-Griffiths model. At small values of K , as for the Blume-Capel model, lines of first- and second-order phase transitions between a ferromagnetic and a paramagnetic phase are present, separated by a tricritical point whose location is different in the two ensembles. At higher values of K the phase diagram changes substantially, with the appearance of a triple point in the canonical ensemble, which does not find any correspondence in the microcanonical ensemble. Moreover, one of the first-order lines that starts from the triple point ends in a critical point, whose position in the phase diagram is different in the two ensembles. This line separates two paramagnetic phases characterized by a different value of the quadrupole moment. These features were not previously studied for other models and substantially enrich the landscape of ensemble inequivalence, identifying new aspects that had been discussed in a classification of phase transitions based on singularity theory. Finally, we discuss ergodicity breaking, which is highlighted by the presence of gaps in the accessible values of magnetization at low energies: it also displays new interesting patterns that are not present in the Blume-Capel model.
ERIC Educational Resources Information Center
Berndt, Arnold, Comp.
This catalogue lists phonograph records which feature solo and ensemble music by wind and percussion instruments. It supplements the "1978 Catalogue of Wind and Percussion Solos and Ensembles" (ED 171 614). Instruments played on the records include oboe/English horn, flute, clarinet, bassoon, saxophone, trumpet/cornet, French horn, woodwind…
NASA Astrophysics Data System (ADS)
Chen, Dongyue; Lin, Jianhui; Li, Yanping
2018-06-01
Complementary ensemble empirical mode decomposition (CEEMD) has been developed for the mode-mixing problem in Empirical Mode Decomposition (EMD) method. Compared to the ensemble empirical mode decomposition (EEMD), the CEEMD method reduces residue noise in the signal reconstruction. Both CEEMD and EEMD need enough ensemble number to reduce the residue noise, and hence it would be too much computation cost. Moreover, the selection of intrinsic mode functions (IMFs) for further analysis usually depends on experience. A modified CEEMD method and IMFs evaluation index are proposed with the aim of reducing the computational cost and select IMFs automatically. A simulated signal and in-service high-speed train gearbox vibration signals are employed to validate the proposed method in this paper. The results demonstrate that the modified CEEMD can decompose the signal efficiently with less computation cost, and the IMFs evaluation index can select the meaningful IMFs automatically.
Yao, Dongren; Calhoun, Vince D; Fu, Zening; Du, Yuhui; Sui, Jing
2018-05-15
Discriminating Alzheimer's disease (AD) from its prodromal form, mild cognitive impairment (MCI), is a significant clinical problem that may facilitate early diagnosis and intervention, in which a more challenging issue is to classify MCI subtypes, i.e., those who eventually convert to AD (cMCI) versus those who do not (MCI). To solve this difficult 4-way classification problem (AD, MCI, cMCI and healthy controls), a competition was hosted by Kaggle to invite the scientific community to apply their machine learning approaches on pre-processed sets of T1-weighted magnetic resonance images (MRI) data and the demographic information from the international Alzheimer's disease neuroimaging initiative (ADNI) database. This paper summarizes our competition results. We first proposed a hierarchical process by turning the 4-way classification into five binary classification problems. A new feature selection technology based on relative importance was also proposed, aiming to identify a more informative and concise subset from 426 sMRI morphometric and 3 demographic features, to ensure each binary classifier to achieve its highest accuracy. As a result, about 2% of the original features were selected to build a new feature space, which can achieve the final four-way classification with a 54.38% accuracy on testing data through hierarchical grouping, higher than several alternative methods in comparison. More importantly, the selected discriminative features such as hippocampal volume, parahippocampal surface area, and medial orbitofrontal thickness, etc. as well as the MMSE score, are reasonable and consistent with those reported in AD/MCI deficits. In summary, the proposed method provides a new framework for multi-way classification using hierarchical grouping and precise feature selection. Copyright © 2018 Elsevier B.V. All rights reserved.
Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation
NASA Astrophysics Data System (ADS)
Vašát, Radim; Kodešová, Radka; Borůvka, Luboš
2017-07-01
A myriad of signal pre-processing strategies and multivariate calibration techniques has been explored in attempt to improve the spectroscopic prediction of soil organic carbon (SOC) over the last few decades. Therefore, to come up with a novel, more powerful, and accurate predictive approach to beat the rank becomes a challenging task. However, there may be a way, so that combine several individual predictions into a single final one (according to ensemble learning theory). As this approach performs best when combining in nature different predictive algorithms that are calibrated with structurally different predictor variables, we tested predictors of two different kinds: 1) reflectance values (or transforms) at each wavelength and 2) absorption feature parameters. Consequently we applied four different calibration techniques, two per each type of predictors: a) partial least squares regression and support vector machines for type 1, and b) multiple linear regression and random forest for type 2. The weights to be assigned to individual predictions within the ensemble model (constructed as a weighted average) were determined by an automated procedure that ensured the best solution among all possible was selected. The approach was tested at soil samples taken from surface horizon of four sites differing in the prevailing soil units. By employing the ensemble predictive model the prediction accuracy of SOC improved at all four sites. The coefficient of determination in cross-validation (R2cv) increased from 0.849, 0.611, 0.811 and 0.644 (the best individual predictions) to 0.864, 0.650, 0.824 and 0.698 for Site 1, 2, 3 and 4, respectively. Generally, the ensemble model affected the final prediction so that the maximal deviations of predicted vs. observed values of the individual predictions were reduced, and thus the correlation cloud became thinner as desired.
AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density.
Zhao, X G; Dai, W; Li, Y; Tian, L
2011-11-01
The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, the AUC-based ensemble methods are rather scant, largely due to the fact that the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm identifying optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. We have proposed a novel AUC-based statistical ensemble methods for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function by a convex surrogate loss function, whose minimizer can be efficiently identified. With the established framework, the lasso and other regularization techniques enable feature selections. Extensive simulations have demonstrated the superiority of the new methods to the existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores to differentiate elderly women with low bone mineral density (BMD) and those with normal BMD. The AUCs of the resulting scores in the independent test dataset has been satisfactory. Aiming for directly maximizing AUC, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful under the high-dimensional settings. lutian@stanford.edu. Supplementary data are available at Bioinformatics online.
Mazurowski, Maciej A; Zurada, Jacek M; Tourassi, Georgia D
2009-07-01
Ensemble classifiers have been shown efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: Random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC = 0.905 +/- 0.024) in performance as compared to the original IT-CAD system (AUC = 0.865 +/- 0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
Conductor gestures influence evaluations of ensemble performance
Morrison, Steven J.; Price, Harry E.; Smedley, Eric M.; Meals, Cory D.
2014-01-01
Previous research has found that listener evaluations of ensemble performances vary depending on the expressivity of the conductor’s gestures, even when performances are otherwise identical. It was the purpose of the present study to test whether this effect of visual information was evident in the evaluation of specific aspects of ensemble performance: articulation and dynamics. We constructed a set of 32 music performances that combined auditory and visual information and were designed to feature a high degree of contrast along one of two target characteristics: articulation and dynamics. We paired each of four music excerpts recorded by a chamber ensemble in both a high- and low-contrast condition with video of four conductors demonstrating high- and low-contrast gesture specifically appropriate to either articulation or dynamics. Using one of two equivalent test forms, college music majors and non-majors (N = 285) viewed sixteen 30 s performances and evaluated the quality of the ensemble’s articulation, dynamics, technique, and tempo along with overall expressivity. Results showed significantly higher evaluations for performances featuring high rather than low conducting expressivity regardless of the ensemble’s performance quality. Evaluations for both articulation and dynamics were strongly and positively correlated with evaluations of overall ensemble expressivity. PMID:25104944
NASA Astrophysics Data System (ADS)
Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu; Ohtomo, Kuni
2016-03-01
The purpose of this study is to evaluate the feasibility of a novel feature generation, which is based on multiple deep neural networks (DNNs) with boosting, for computer-assisted detection (CADe). It is hard and time-consuming to optimize the hyperparameters for DNNs such as stacked denoising autoencoder (SdA). The proposed method allows using SdA based features without the burden of the hyperparameter setting. The proposed method was evaluated by an application for detecting cerebral aneurysms on magnetic resonance angiogram (MRA). A baseline CADe process included four components; scaling, candidate area limitation, candidate detection, and candidate classification. Proposed feature generation method was applied to extract the optimal features for candidate classification. Proposed method only required setting range of the hyperparameters for SdA. The optimal feature set was selected from a large quantity of SdA based features by multiple SdAs, each of which was trained using different hyperparameter set. The feature selection was operated through ada-boost ensemble learning method. Training of the baseline CADe process and proposed feature generation were operated with 200 MRA cases, and the evaluation was performed with 100 MRA cases. Proposed method successfully provided SdA based features just setting the range of some hyperparameters for SdA. The CADe process by using both previous voxel features and SdA based features had the best performance with 0.838 of an area under ROC curve and 0.312 of ANODE score. The results showed that proposed method was effective in the application for detecting cerebral aneurysms on MRA.
Phase-selective entrainment of nonlinear oscillator ensembles
Zlotnik, Anatoly V.; Nagao, Raphael; Kiss, Istvan Z.; ...
2016-03-18
The ability to organize and finely manipulate the hierarchy and timing of dynamic processes is important for understanding and influencing brain functions, sleep and metabolic cycles, and many other natural phenomena. However, establishing spatiotemporal structures in biological oscillator ensembles is a challenging task that requires controlling large collections of complex nonlinear dynamical units. In this report, we present a method to design entrainment signals that create stable phase patterns in ensembles of heterogeneous nonlinear oscillators without using state feedback information. We demonstrate the approach using experiments with electrochemical reactions on multielectrode arrays, in which we selectively assign ensemble subgroups intomore » spatiotemporal patterns with multiple phase clusters. As a result, the experimentally confirmed mechanism elucidates the connection between the phases and natural frequencies of a collection of dynamical elements, the spatial and temporal information that is encoded within this ensemble, and how external signals can be used to retrieve this information.« less
Phase-selective entrainment of nonlinear oscillator ensembles
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zlotnik, Anatoly V.; Nagao, Raphael; Kiss, Istvan Z.
The ability to organize and finely manipulate the hierarchy and timing of dynamic processes is important for understanding and influencing brain functions, sleep and metabolic cycles, and many other natural phenomena. However, establishing spatiotemporal structures in biological oscillator ensembles is a challenging task that requires controlling large collections of complex nonlinear dynamical units. In this report, we present a method to design entrainment signals that create stable phase patterns in ensembles of heterogeneous nonlinear oscillators without using state feedback information. We demonstrate the approach using experiments with electrochemical reactions on multielectrode arrays, in which we selectively assign ensemble subgroups intomore » spatiotemporal patterns with multiple phase clusters. As a result, the experimentally confirmed mechanism elucidates the connection between the phases and natural frequencies of a collection of dynamical elements, the spatial and temporal information that is encoded within this ensemble, and how external signals can be used to retrieve this information.« less
Phase-selective entrainment of nonlinear oscillator ensembles
NASA Astrophysics Data System (ADS)
Zlotnik, Anatoly; Nagao, Raphael; Kiss, István Z.; Li-Shin, Jr.
2016-03-01
The ability to organize and finely manipulate the hierarchy and timing of dynamic processes is important for understanding and influencing brain functions, sleep and metabolic cycles, and many other natural phenomena. However, establishing spatiotemporal structures in biological oscillator ensembles is a challenging task that requires controlling large collections of complex nonlinear dynamical units. In this report, we present a method to design entrainment signals that create stable phase patterns in ensembles of heterogeneous nonlinear oscillators without using state feedback information. We demonstrate the approach using experiments with electrochemical reactions on multielectrode arrays, in which we selectively assign ensemble subgroups into spatiotemporal patterns with multiple phase clusters. The experimentally confirmed mechanism elucidates the connection between the phases and natural frequencies of a collection of dynamical elements, the spatial and temporal information that is encoded within this ensemble, and how external signals can be used to retrieve this information.
Prediction of Weather Impacted Airport Capacity using Ensemble Learning
NASA Technical Reports Server (NTRS)
Wang, Yao Xun
2011-01-01
Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecast at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rates (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecast, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management to meet the weather impacted airport capacity in order to reduce costs and increase safety.
Majid, Abdul; Ali, Safdar
2015-01-01
We developed genetic programming (GP)-based evolutionary ensemble system for the early diagnosis, prognosis and prediction of human breast cancer. This system has effectively exploited the diversity in feature and decision spaces. First, individual learners are trained in different feature spaces using physicochemical properties of protein amino acids. Their predictions are then stacked to develop the best solution during GP evolution process. Finally, results for HBC-Evo system are obtained with optimal threshold, which is computed using particle swarm optimization. Our novel approach has demonstrated promising results compared to state of the art approaches.
Improving database enrichment through ensemble docking
NASA Astrophysics Data System (ADS)
Rao, Shashidhar; Sanschagrin, Paul C.; Greenwood, Jeremy R.; Repasky, Matthew P.; Sherman, Woody; Farid, Ramy
2008-09-01
While it may seem intuitive that using an ensemble of multiple conformations of a receptor in structure-based virtual screening experiments would necessarily yield improved enrichment of actives relative to using just a single receptor, it turns out that at least in the p38 MAP kinase model system studied here, a very large majority of all possible ensembles do not yield improved enrichment of actives. However, there are combinations of receptor structures that do lead to improved enrichment results. We present here a method to select the ensembles that produce the best enrichments that does not rely on knowledge of active compounds or sophisticated analyses of the 3D receptor structures. In the system studied here, the small fraction of ensembles of up to 3 receptors that do yield good enrichments of actives were identified by selecting ensembles that have the best mean GlideScore for the top 1% of the docked ligands in a database screen of actives and drug-like "decoy" ligands. Ensembles of two receptors identified using this mean GlideScore metric generally outperform single receptors, while ensembles of three receptors identified using this metric consistently give optimal enrichment factors in which, for example, 40% of the known actives outrank all the other ligands in the database.
Spam comments prediction using stacking with ensemble learning
NASA Astrophysics Data System (ADS)
Mehmood, Arif; On, Byung-Won; Lee, Ingyu; Ashraf, Imran; Choi, Gyu Sang
2018-01-01
Illusive comments of product or services are misleading for people in decision making. The current methodologies to predict deceptive comments are concerned for feature designing with single training model. Indigenous features have ability to show some linguistic phenomena but are hard to reveal the latent semantic meaning of the comments. We propose a prediction model on general features of documents using stacking with ensemble learning. Term Frequency/Inverse Document Frequency (TF/IDF) features are inputs to stacking of Random Forest and Gradient Boosted Trees and the outputs of the base learners are encapsulated with decision tree to make final training of the model. The results exhibits that our approach gives the accuracy of 92.19% which outperform the state-of-the-art method.
WONKA: objective novel complex analysis for ensembles of protein-ligand structures.
Bradley, A R; Wall, I D; von Delft, F; Green, D V S; Deane, C M; Marsden, B D
2015-10-01
WONKA is a tool for the systematic analysis of an ensemble of protein-ligand structures. It makes the identification of conserved and unusual features within such an ensemble straightforward. WONKA uses an intuitive workflow to process structural co-ordinates. Ligand and protein features are summarised and then presented within an interactive web application. WONKA's power in consolidating and summarising large amounts of data is described through the analysis of three bromodomain datasets. Furthermore, and in contrast to many current methods, WONKA relates analysis to individual ligands, from which we find unusual and erroneous binding modes. Finally the use of WONKA as an annotation tool to share observations about structures is demonstrated. WONKA is freely available to download and install locally or can be used online at http://wonka.sgc.ox.ac.uk.
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.
Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia
2016-08-24
The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. However, selection of the best quality decoys is challenging as the end users can handle only a few ones. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning based scoring functions to predict the quality of proteins structures, i.e., the similarity between the predicted structure and the experimental one without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.
Whitaker, Leslie R; Warren, Brandon L; Venniro, Marco; Harte, Tyler C; McPherson, Kylie B; Beidel, Jennifer; Bossert, Jennifer M; Shaham, Yavin; Bonci, Antonello; Hope, Bruce T
2017-09-06
Learned associations between environmental stimuli and rewards drive goal-directed learning and motivated behavior. These memories are thought to be encoded by alterations within specific patterns of sparsely distributed neurons called neuronal ensembles that are activated selectively by reward-predictive stimuli. Here, we use the Fos promoter to identify strongly activated neuronal ensembles in rat prelimbic cortex (PLC) and assess altered intrinsic excitability after 10 d of operant food self-administration training (1 h/d). First, we used the Daun02 inactivation procedure in male FosLacZ-transgenic rats to ablate selectively Fos-expressing PLC neurons that were active during operant food self-administration. Selective ablation of these neurons decreased food seeking. We then used male FosGFP-transgenic rats to assess selective alterations of intrinsic excitability in Fos-expressing neuronal ensembles (FosGFP + ) that were activated during food self-administration and compared these with alterations in less activated non-ensemble neurons (FosGFP - ). Using whole-cell recordings of layer V pyramidal neurons in an ex vivo brain slice preparation, we found that operant self-administration increased excitability of FosGFP + neurons and decreased excitability of FosGFP - neurons. Increased excitability of FosGFP + neurons was driven by increased steady-state input resistance. Decreased excitability of FosGFP - neurons was driven by increased contribution of small-conductance calcium-activated potassium (SK) channels. Injections of the specific SK channel antagonist apamin into PLC increased Fos expression but had no effect on food seeking. Overall, operant learning increased intrinsic excitability of PLC Fos-expressing neuronal ensembles that play a role in food seeking but decreased intrinsic excitability of Fos - non-ensembles. SIGNIFICANCE STATEMENT Prefrontal cortex activity plays a critical role in operant learning, but the underlying cellular mechanisms are unknown. Using the chemogenetic Daun02 inactivation procedure, we found that a small number of strongly activated Fos-expressing neuronal ensembles in rat PLC play an important role in learned operant food seeking. Using GFP expression to identify Fos-expressing layer V pyramidal neurons in prelimbic cortex (PLC) of FosGFP-transgenic rats, we found that operant food self-administration led to increased intrinsic excitability in the behaviorally relevant Fos-expressing neuronal ensembles, but decreased intrinsic excitability in Fos - neurons using distinct cellular mechanisms. Copyright © 2017 the authors 0270-6474/17/378845-12$15.00/0.
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai
2016-12-01
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
ERIC Educational Resources Information Center
McKee, Lori L.; Heydon, Rachel M.
2015-01-01
This exploratory case study considered the opportunities for print literacy learning within multimodal ensembles that featured art, singing and digital media within the context of an intergenerational programme that brought together 13 kindergarten children (4 and 5 years) with seven elder companions. Study questions concerned how reading and…
Conservation of Mass and Preservation of Positivity with Ensemble-Type Kalman Filter Algorithms
NASA Technical Reports Server (NTRS)
Janjic, Tijana; Mclaughlin, Dennis; Cohn, Stephen E.; Verlaan, Martin
2014-01-01
This paper considers the incorporation of constraints to enforce physically based conservation laws in the ensemble Kalman filter. In particular, constraints are used to ensure that the ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. In certain situations filtering algorithms such as the ensemble Kalman filter (EnKF) and ensemble transform Kalman filter (ETKF) yield updated ensembles that conserve mass but are negative, even though the actual states must be nonnegative. In such situations if negative values are set to zero, or a log transform is introduced, the total mass will not be conserved. In this study, mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate non-negativity constraints. Simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. In two examples, an update that includes a non-negativity constraint is able to properly describe the transport of a sharp feature (e.g., a triangle or cone). A number of implementation questions still need to be addressed, particularly the need to develop a computationally efficient quadratic programming update for large ensemble.
Understanding the Structural Ensembles of a Highly Extended Disordered Protein†
Daughdrill, Gary W.; Kashtanov, Stepan; Stancik, Amber; Hill, Shannon E.; Helms, Gregory; Muschol, Martin
2013-01-01
Developing a comprehensive description of the equilibrium structural ensembles for intrinsically disordered proteins (IDPs) is essential to understanding their function. The p53 transactivation domain (p53TAD) is an IDP that interacts with multiple protein partners and contains numerous phosphorylation sites. Multiple techniques were used to investigate the equilibrium structural ensemble of p53TAD in its native and chemically unfolded states. The results from these experiments show that the native state of p53TAD has dimensions similar to a classical random coil while the chemically unfolded state is more extended. To investigate the molecular properties responsible for this behavior, a novel algorithm that generates diverse and unbiased structural ensembles of IDPs was developed. This algorithm was used to generate a large pool of plausible p53TAD structures that were reweighted to identify a subset of structures with the best fit to small angle X-ray scattering data. High weight structures in the native state ensemble show features that are localized to protein binding sites and regions with high proline content. The features localized to the protein binding sites are mostly eliminated in the chemically unfolded ensemble; while, the regions with high proline content remain relatively unaffected. Data from NMR experiments support these results, showing that residues from the protein binding sites experience larger environmental changes upon unfolding by urea than regions with high proline content. This behavior is consistent with the urea-induced exposure of nonpolar and aromatic side-chains in the protein binding sites that are partially excluded from solvent in the native state ensemble. PMID:21979461
ERIC Educational Resources Information Center
Mages, Wendy K.
2013-01-01
This research analyzes the techniques, strategies, and philosophical foundations that contributed to the quality and maintenance of a strong theatre-in-education ensemble. This study details how the company selected ensemble members and describes the work environment the company developed to promote collaboration and encourage actor-teacher…
Collaborative Composing in High School String Chamber Music Ensembles
ERIC Educational Resources Information Center
Hopkins, Michael T.
2015-01-01
The purpose of this study was to examine collaborative composing in high school string chamber music ensembles. Research questions included the following: (a) How do high school string instrumentalists in chamber music ensembles use verbal and musical forms of communication to collaboratively compose a piece of music? (b) How do selected variables…
Selecting climate simulations for impact studies based on multivariate patterns of climate change.
Mendlik, Thomas; Gobiet, Andreas
In climate change impact research it is crucial to carefully select the meteorological input for impact models. We present a method for model selection that enables the user to shrink the ensemble to a few representative members, conserving the model spread and accounting for model similarity. This is done in three steps: First, using principal component analysis for a multitude of meteorological parameters, to find common patterns of climate change within the multi-model ensemble. Second, detecting model similarities with regard to these multivariate patterns using cluster analysis. And third, sampling models from each cluster, to generate a subset of representative simulations. We present an application based on the ENSEMBLES regional multi-model ensemble with the aim to provide input for a variety of climate impact studies. We find that the two most dominant patterns of climate change relate to temperature and humidity patterns. The ensemble can be reduced from 25 to 5 simulations while still maintaining its essential characteristics. Having such a representative subset of simulations reduces computational costs for climate impact modeling and enhances the quality of the ensemble at the same time, as it prevents double-counting of dependent simulations that would lead to biased statistics. The online version of this article (doi:10.1007/s10584-015-1582-0) contains supplementary material, which is available to authorized users.
Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.
2009-01-01
Ensemble classifiers have been shown efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: Random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC=0.905±0.024) in performance as compared to the original IT-CAD system (AUC=0.865±0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters. PMID:19673196
Papiotis, Panos; Marchini, Marco; Perez-Carrillo, Alfonso; Maestre, Esteban
2014-01-01
In a musical ensemble such as a string quartet, the musicians interact and influence each other's actions in several aspects of the performance simultaneously in order to achieve a common aesthetic goal. In this article, we present and evaluate a computational approach for measuring the degree to which these interactions exist in a given performance. We recorded a number of string quartet exercises under two experimental conditions (solo and ensemble), acquiring both audio and bowing motion data. Numerical features in the form of time series were extracted from the data as performance descriptors representative of four distinct dimensions of the performance: Intonation, Dynamics, Timbre, and Tempo. Four different interdependence estimation methods (two linear and two nonlinear) were applied to the extracted features in order to assess the overall level of interdependence between the four musicians. The obtained results suggest that it is possible to correctly discriminate between the two experimental conditions by quantifying interdependence between the musicians in each of the studied performance dimensions; the nonlinear methods appear to perform best for most of the numerical features tested. Moreover, by using the solo recordings as a reference to which the ensemble recordings are contrasted, it is feasible to compare the amount of interdependence that is established between the musicians in a given performance dimension across all exercises, and relate the results to the underlying goal of the exercise. We discuss our findings in the context of ensemble performance research, the current limitations of our approach, and the ways in which it can be expanded and consolidated. PMID:25228894
Blend in Singing Ensemble Performance: Vibrato Production in a Vocal Quartet.
Daffern, Helena
2017-05-01
"Blend" is a defining characteristic of good vocal ensemble performance. To achieve this, directors often consider vibrato as a feature to be controlled and consequently restrict its use. Analysis of individual voices in ensemble situations presents several challenges, including the isolation of voices for analysis from recordings. This study considers vibrato production as a feature that contributes to blend through an ecological study of a vocal quartet. A vocal ensemble was recorded using head-worn microphones and electrolaryngograph electrodes to enable fundamental frequency analysis of the individual voices. The same four-part material was recorded over several weeks of rehearsal to allow analysis of conscious and subconscious changes to vibrato production over time. Alongside the recording of their rehearsal discussions, singers were also asked for opinions on vibrato production in connection with blend. The results indicate that vibrato is adjusted to some extent by individual singers to improve blend, with some instances of synchrony between voice parts. Some conscious alterations to vibrato were made to improve blend; however, these are not always evident in the data, suggesting that singers' own perceptions of their performance may be influenced by other factors. These findings indicate a need for further studies of vibrato as a feature of blend, particularly in terms of the synergies between expectation and actual production, and potential synchronicity between singers; increased understanding of vibrato in an ensemble setting will lead to more efficient rehearsal techniques and vocal training, and could prevent vocal misuse leading to pathology in the future. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Zerbino, Daniel R.; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P.; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul
2016-01-01
New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907
NASA Astrophysics Data System (ADS)
Qian, Y.; Wang, L.; Leung, L. R.; Lin, G.; Lu, J.; Gao, Y.; Zhang, Y.
2017-12-01
Projecting precipitation changes is challenging because of incomplete understanding of the climate system and biases and uncertainty in climate models. In East Asia where summer precipitation is dominantly influenced by the monsoon circulation and the global models from Coupled Model Intercomparison Project Phase 5 (CMIP5), however, give various projection of precipitation change for 21th century. It is critical for community to know which models' projection are more reliable in response to natural and anthropogenic forcings. In this study we defined multiple-dimensional metrics, measuring the model performance in simulating the present-day of large-scale circulation, regional precipitation and relationship between them. The large-scale circulation features examined in this study include the lower tropospheric southwesterly winds, the western North Pacific subtropical high, the South China Sea Subtropical High, and the East Asian westerly jet in the upper troposphere. Each of these circulation features transport moisture to East Asia, enhancing the moist static energy and strengthening the Meiyu moisture front that is the primary mechanism for precipitation generation in eastern China. Based on these metrics, 30 models in CMIP5 ensemble are classified into three groups. Models in the top performing group projected regional precipitation patterns that are more similar to each other than the bottom or middle performing group and consistently projected statistically significant increasing trends in two of the large-scale circulation indices and precipitation. In contrast, models in the bottom or middle performing group projected small drying or no trends in precipitation. We also find the models that only reasonably reproduce the observed precipitation climatology does not guarantee more reliable projection of future precipitation because good simulation skill could be achieved through compensating errors from multiple sources. Herein the potential for more robust projections of precipitation changes at regional scale is demonstrated through the use of discriminating metric to subsample the multi-model ensemble. The results from this study provides insights for how to select models from CMIP ensemble to project regional climate and hydrological cycle changes.
Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data
NASA Technical Reports Server (NTRS)
Rebbapragada, Umaa; Wagstaff, Kiri L.
2011-01-01
This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multiview learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.
Recognition of emotions using multimodal physiological signals and an ensemble deep learning model.
Yin, Zhong; Zhao, Mengyuan; Wang, Yongxiong; Yang, Jingdong; Zhang, Jianhua
2017-03-01
Using deep-learning methodologies to analyze multimodal physiological signals becomes increasingly attractive for recognizing human emotions. However, the conventional deep emotion classifiers may suffer from the drawback of the lack of the expertise for determining model structure and the oversimplification of combining multimodal feature abstractions. In this study, a multiple-fusion-layer based ensemble classifier of stacked autoencoder (MESAE) is proposed for recognizing emotions, in which the deep structure is identified based on a physiological-data-driven approach. Each SAE consists of three hidden layers to filter the unwanted noise in the physiological features and derives the stable feature representations. An additional deep model is used to achieve the SAE ensembles. The physiological features are split into several subsets according to different feature extraction approaches with each subset separately encoded by a SAE. The derived SAE abstractions are combined according to the physiological modality to create six sets of encodings, which are then fed to a three-layer, adjacent-graph-based network for feature fusion. The fused features are used to recognize binary arousal or valence states. DEAP multimodal database was employed to validate the performance of the MESAE. By comparing with the best existing emotion classifier, the mean of classification rate and F-score improves by 5.26%. The superiority of the MESAE against the state-of-the-art shallow and deep emotion classifiers has been demonstrated under different sizes of the available physiological instances. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Bianconi, Ginestra
2009-03-01
In this paper we generalize the concept of random networks to describe network ensembles with nontrivial features by a statistical mechanics approach. This framework is able to describe undirected and directed network ensembles as well as weighted network ensembles. These networks might have nontrivial community structure or, in the case of networks embedded in a given space, they might have a link probability with a nontrivial dependence on the distance between the nodes. These ensembles are characterized by their entropy, which evaluates the cardinality of networks in the ensemble. In particular, in this paper we define and evaluate the structural entropy, i.e., the entropy of the ensembles of undirected uncorrelated simple networks with given degree sequence. We stress the apparent paradox that scale-free degree distributions are characterized by having small structural entropy while they are so widely encountered in natural, social, and technological complex systems. We propose a solution to the paradox by proving that scale-free degree distributions are the most likely degree distribution with the corresponding value of the structural entropy. Finally, the general framework we present in this paper is able to describe microcanonical ensembles of networks as well as canonical or hidden-variable network ensembles with significant implications for the formulation of network-constructing algorithms.
Preferences of and Attitudes toward Treble Choral Ensembles
ERIC Educational Resources Information Center
Wilson, Jill M.
2012-01-01
In choral ensembles, a pursuit where females far outnumber males, concern exists that females are being devalued. Attitudes of female choral singers may be negatively affected by the gender imbalance that exists in mixed choirs and by the placement of the mixed choir as the most select ensemble in a program. The purpose of this research was to…
Village Building Identification Based on Ensemble Convolutional Neural Networks
Guo, Zhiling; Chen, Qi; Xu, Yongwei; Shibasaki, Ryosuke; Shao, Xiaowei
2017-01-01
In this study, we present the Ensemble Convolutional Neural Network (ECNN), an elaborate CNN frame formulated based on ensembling state-of-the-art CNN models, to identify village buildings from open high-resolution remote sensing (HRRS) images. First, to optimize and mine the capability of CNN for village mapping and to ensure compatibility with our classification targets, a few state-of-the-art models were carefully optimized and enhanced based on a series of rigorous analyses and evaluations. Second, rather than directly implementing building identification by using these models, we exploited most of their advantages by ensembling their feature extractor parts into a stronger model called ECNN based on the multiscale feature learning method. Finally, the generated ECNN was applied to a pixel-level classification frame to implement object identification. The proposed method can serve as a viable tool for village building identification with high accuracy and efficiency. The experimental results obtained from the test area in Savannakhet province, Laos, prove that the proposed ECNN model significantly outperforms existing methods, improving overall accuracy from 96.64% to 99.26%, and kappa from 0.57 to 0.86. PMID:29084154
An ensemble predictive modeling framework for breast cancer classification.
Nagarajan, Radhakrishnan; Upreti, Meenakshi
2017-12-01
Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process often resulting in sparse high-dimensional projection of the samples often comparable to that of the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
Yang, Shan; Al-Hashimi, Hashim M.
2016-01-01
A growing number of studies employ time-averaged experimental data to determine dynamic ensembles of biomolecules. While it is well known that different ensembles can satisfy experimental data to within error, the extent and nature of these degeneracies, and their impact on the accuracy of the ensemble determination remains poorly understood. Here, we use simulations and a recently introduced metric for assessing ensemble similarity to explore degeneracies in determining ensembles using NMR residual dipolar couplings (RDCs) with specific application to A-form helices in RNA. Various target ensembles were constructed representing different domain-domain orientational distributions that are confined to a topologically restricted (<10%) conformational space. Five independent sets of ensemble averaged RDCs were then computed for each target ensemble and a ‘sample and select’ scheme used to identify degenerate ensembles that satisfy RDCs to within experimental uncertainty. We find that ensembles with different ensemble sizes and that can differ significantly from the target ensemble (by as much as ΣΩ ~ 0.4 where ΣΩ varies between 0 and 1 for maximum and minimum ensemble similarity, respectively) can satisfy the ensemble averaged RDCs. These deviations increase with the number of unique conformers and breadth of the target distribution, and result in significant uncertainty in determining conformational entropy (as large as 5 kcal/mol at T = 298 K). Nevertheless, the RDC-degenerate ensembles are biased towards populated regions of the target ensemble, and capture other essential features of the distribution, including the shape. Our results identify ensemble size as a major source of uncertainty in determining ensembles and suggest that NMR interactions such as RDCs and spin relaxation, on their own, do not carry the necessary information needed to determine conformational entropy at a useful level of precision. The framework introduced here provides a general approach for exploring degeneracies in ensemble determination for different types of experimental data. PMID:26131693
Johnson, David K.; Karanicolas, John
2015-01-01
Small-molecules that inhibit interactions between specific pairs of proteins have long represented a promising avenue for therapeutic intervention in a variety of settings. Structural studies have shown that in many cases, the inhibitor-bound protein adopts a conformation that is distinct from its unbound and its protein-bound conformations. This plasticity of the protein surface presents a major challenge in predicting which members of a protein family will be inhibited by a given ligand. Here, we use biased simulations of Bcl-2-family proteins to generate ensembles of low-energy conformations that contain surface pockets suitable for small molecule binding. We find that the resulting conformational ensembles include surface pockets that mimic those observed in inhibitor-bound crystal structures. Next, we find that the ensembles generated using different members of this protein family are overlapping but distinct, and that the activity of a given compound against a particular family member (ligand selectivity) can be predicted from whether the corresponding ensemble samples a complementary surface pocket. Finally, we find that each ensemble includes certain surface pockets that are not shared by any other family member: while no inhibitors have yet been identified to take advantage of these pockets, we expect that chemical scaffolds complementing these “distinct” pockets will prove highly selective for their targets. The opportunity to achieve target selectivity within a protein family by exploiting differences in surface fluctuations represents a new paradigm that may facilitate design of family-selective small-molecule inhibitors of protein-protein interactions. PMID:25706586
Computational selection of antibody-drug conjugate targets for breast cancer
Fauteux, François; Hill, Jennifer J.; Jaramillo, Maria L.; Pan, Youlian; Phan, Sieu; Famili, Fazel; O'Connor-McCourt, Maureen
2016-01-01
The selection of therapeutic targets is a critical aspect of antibody-drug conjugate research and development. In this study, we applied computational methods to select candidate targets overexpressed in three major breast cancer subtypes as compared with a range of vital organs and tissues. Microarray data corresponding to over 8,000 tissue samples were collected from the public domain. Breast cancer samples were classified into molecular subtypes using an iterative ensemble approach combining six classification algorithms and three feature selection techniques, including a novel kernel density-based method. This feature selection method was used in conjunction with differential expression and subcellular localization information to assemble a primary list of targets. A total of 50 cell membrane targets were identified, including one target for which an antibody-drug conjugate is in clinical use, and six targets for which antibody-drug conjugates are in clinical trials for the treatment of breast cancer and other solid tumors. In addition, 50 extracellular proteins were identified as potential targets for non-internalizing strategies and alternative modalities. Candidate targets linked with the epithelial-to-mesenchymal transition were identified by analyzing differential gene expression in epithelial and mesenchymal tumor-derived cell lines. Overall, these results show that mining human gene expression data has the power to select and prioritize breast cancer antibody-drug conjugate targets, and the potential to lead to new and more effective cancer therapeutics. PMID:26700623
ERIC Educational Resources Information Center
Bergee, Martin J.; Westfall, Claude R.
2005-01-01
This is the third study in a line of inquiry whose purpose has been to develop a theoretical model of selected extra musical variables' influence on solo and small-ensemble festival ratings. Authors of the second of these (Bergee & McWhirter, 2005) had used binomial logistic regression as the basis for their model-formulation strategy. Their…
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-01-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices. PMID:25177107
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-11-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices.
Williams, W Jon; Coca, Aitor; Roberge, Raymond; Shepherd, Angie; Powell, Jeffrey; Shaffer, Ronald E
2011-01-01
This study investigated the physiological responses to wearing a standard firefighter ensemble (SE) and a prototype ensemble (PE) modified from the SE that contained additional features, such as magnetic ring enclosures at the glove-sleeve interface, integrated boot-pant interface, integrated hood-SCBA facepiece interface, and a novel hose arrangement that rerouted self-contained breathing apparatus (SCBA) exhaust gases back into the upper portion of the jacket. Although the features of the PE increased the level of encapsulation of the wearer that could lead to increased physiological stress compared with the SE, it was hypothesized that the rerouted exhaust gases provided by the PE hose assembly would (1) provide convective cooling to the upper torso, (2) reduce the thermal stress experienced by the wearer, and (3) reduce the overall physiological stress imposed by the PE such that it would be either less or not significantly different from the SE. Ten subjects (seven male, three female) performed treadmill exercise in an environmental chamber (22°C, 50% RH) at 50% [image omitted]O(2max) while wearing either the SE with an SCBA or the PE with an SCBA either with or without the hose attached (designated PEWH and PENH, respectively). Heart rate (HR), rectal and intestinal temperatures (T(re), T(in)), sweat loss, and endurance time were measured. All subjects completed at least 20 min of treadmill exercise during the testing. At the end of exercise, there was no difference in T(re) (p = 0.45) or T(in) (p = 0.42), HR, or total sweat loss between the SE and either PEWH or PENH (p = 0.59). However, T(sk) was greater in PEWH and PENH compared with SE (p < 0.05). Total endurance time in SE was greater than in either PEWH or PENH (p < 0.05). Thus, it was concluded that the rerouting of exhaust gases to the jacket did not provide significant convective cooling or reduce thermal stress compared with the SE under the mild conditions selected, and the data did not support the hypotheses of the present study.
An Intelligent Ensemble Neural Network Model for Wind Speed Prediction in Renewable Energy Systems.
Ranganayaki, V; Deepa, S N
2016-01-01
Various criteria are proposed to select the number of hidden neurons in artificial neural network (ANN) models and based on the criterion evolved an intelligent ensemble neural network model is proposed to predict wind speed in renewable energy applications. The intelligent ensemble neural model based wind speed forecasting is designed by averaging the forecasted values from multiple neural network models which includes multilayer perceptron (MLP), multilayer adaptive linear neuron (Madaline), back propagation neural network (BPN), and probabilistic neural network (PNN) so as to obtain better accuracy in wind speed prediction with minimum error. The random selection of hidden neurons numbers in artificial neural network results in overfitting or underfitting problem. This paper aims to avoid the occurrence of overfitting and underfitting problems. The selection of number of hidden neurons is done in this paper employing 102 criteria; these evolved criteria are verified by the computed various error values. The proposed criteria for fixing hidden neurons are validated employing the convergence theorem. The proposed intelligent ensemble neural model is applied for wind speed prediction application considering the real time wind data collected from the nearby locations. The obtained simulation results substantiate that the proposed ensemble model reduces the error value to minimum and enhances the accuracy. The computed results prove the effectiveness of the proposed ensemble neural network (ENN) model with respect to the considered error factors in comparison with that of the earlier models available in the literature.
An Intelligent Ensemble Neural Network Model for Wind Speed Prediction in Renewable Energy Systems
Ranganayaki, V.; Deepa, S. N.
2016-01-01
Various criteria are proposed to select the number of hidden neurons in artificial neural network (ANN) models and based on the criterion evolved an intelligent ensemble neural network model is proposed to predict wind speed in renewable energy applications. The intelligent ensemble neural model based wind speed forecasting is designed by averaging the forecasted values from multiple neural network models which includes multilayer perceptron (MLP), multilayer adaptive linear neuron (Madaline), back propagation neural network (BPN), and probabilistic neural network (PNN) so as to obtain better accuracy in wind speed prediction with minimum error. The random selection of hidden neurons numbers in artificial neural network results in overfitting or underfitting problem. This paper aims to avoid the occurrence of overfitting and underfitting problems. The selection of number of hidden neurons is done in this paper employing 102 criteria; these evolved criteria are verified by the computed various error values. The proposed criteria for fixing hidden neurons are validated employing the convergence theorem. The proposed intelligent ensemble neural model is applied for wind speed prediction application considering the real time wind data collected from the nearby locations. The obtained simulation results substantiate that the proposed ensemble model reduces the error value to minimum and enhances the accuracy. The computed results prove the effectiveness of the proposed ensemble neural network (ENN) model with respect to the considered error factors in comparison with that of the earlier models available in the literature. PMID:27034973
Jung, Kwan Ho; Lee, Keun-Hyeung
2015-09-15
A peptide-based ensemble for the detection of cyanide ions in 100% aqueous solutions was designed on the basis of the copper binding motif. 7-Nitro-2,1,3-benzoxadiazole-labeled tripeptide (NBD-SSH, NBD-SerSerHis) formed the ensemble with Cu(2+), leading to a change in the color of the solution from yellow to orange and a complete decrease of fluorescence emission. The ensemble (NBD-SSH-Cu(2+)) sensitively and selectively detected a low concentration of cyanide ions in 100% aqueous solutions by a colorimetric change as well as a fluorescent change. The addition of cyanide ions instantly removed Cu(2+) from the ensemble (NBD-SSH-Cu(2+)) in 100% aqueous solutions, resulting in a color change of the solution from orange to yellow and a "turn-on" fluorescent response. The detection limits for cyanide ions were lower than the maximum allowable level of cyanide ions in drinking water set by the World Health Organization. The peptide-based ensemble system is expected to be a potential and practical way for the detection of submicromolar concentrations of cyanide ions in 100% aqueous solutions.
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for Sea Surface Temperature SST in the Mediterranean Sea can be improved by a statistical post-processing technique, called super-ensemble. This technique consists in a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable to prevent overfitting problems, even if best performances are achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performances depend on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: 15 days training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least square algorithm being filtered a posteriori.
Hierarchical ensemble of global and local classifiers for face recognition.
Su, Yu; Shan, Shiguang; Chen, Xilin; Gao, Wen
2009-08-01
In the literature of psychophysics and neurophysiology, many studies have shown that both global and local features are crucial for face representation and recognition. This paper proposes a novel face recognition method which exploits both global and local discriminative features. In this method, global features are extracted from the whole face images by keeping the low-frequency coefficients of Fourier transform, which we believe encodes the holistic facial information, such as facial contour. For local feature extraction, Gabor wavelets are exploited considering their biological relevance. After that, Fisher's linear discriminant (FLD) is separately applied to the global Fourier features and each local patch of Gabor features. Thus, multiple FLD classifiers are obtained, each embodying different facial evidences for face recognition. Finally, all these classifiers are combined to form a hierarchical ensemble classifier. We evaluate the proposed method using two large-scale face databases: FERET and FRGC version 2.0. Experiments show that the results of our method are impressively better than the best known results with the same evaluation protocol.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2016-12-02
In the postgenomic era, the number of unreviewed protein sequences is remarkably larger and grows tremendously faster than that of reviewed ones. However, existing methods for protein subchloroplast localization often ignore the information from these unlabeled proteins. This paper proposes a multi-label predictor based on ensemble linear neighborhood propagation (LNP), namely, LNP-Chlo, which leverages hybrid sequence-based feature information from both labeled and unlabeled proteins for predicting localization of both single- and multi-label chloroplast proteins. Experimental results on a stringent benchmark dataset and a novel independent dataset suggest that LNP-Chlo performs at least 6% (absolute) better than state-of-the-art predictors. This paper also demonstrates that ensemble LNP significantly outperforms LNP based on individual features. For readers' convenience, the online Web server LNP-Chlo is freely available at http://bioinfo.eie.polyu.edu.hk/LNPChloServer/ .
Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian; Huliraj, N; Revadi, S S
2017-06-08
Auscultation is a medical procedure used for the initial diagnosis and assessment of lung and heart diseases. From this perspective, we propose assessing the performance of the extreme learning machine (ELM) classifiers for the diagnosis of pulmonary pathology using breath sounds. Energy and entropy features were extracted from the breath sound using the wavelet packet transform. The statistical significance of the extracted features was evaluated by one-way analysis of variance (ANOVA). The extracted features were inputted into the ELM classifier. The maximum classification accuracies obtained for the conventional validation (CV) of the energy and entropy features were 97.36% and 98.37%, respectively, whereas the accuracies obtained for the cross validation (CRV) of the energy and entropy features were 96.80% and 97.91%, respectively. In addition, maximum classification accuracies of 98.25% and 99.25% were obtained for the CV and CRV of the ensemble features, respectively. The results indicate that the classification accuracy obtained with the ensemble features was higher than those obtained with the energy and entropy features.
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.
Hart, Emma; Sim, Kevin
2016-01-01
We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
Randhawa, Vinay; Kumar Singh, Anil; Acharya, Vishal
2015-12-01
Systems-biology inspired identification of drug targets and machine learning-based screening of small molecules which modulate their activity have the potential to revolutionize modern drug discovery by complementing conventional methods. To utilize the effectiveness of such pipelines, we first analyzed the dysregulated gene pairs between control and tumor samples and then implemented an ensemble-based feature selection approach to prioritize targets in oral squamous cell carcinoma (OSCC) for therapeutic exploration. Based on the structural information of known inhibitors of CXCR4-one of the best targets identified in this study-a feature selection was implemented for the identification of optimal structural features (molecular descriptor) based on which a classification model was generated. Furthermore, the CXCR4-centered descriptor-based classification model was finally utilized to screen a repository of plant derived small-molecules to obtain potential inhibitors. The application of our methodology may assist effective selection of the best targets which may have previously been overlooked, that in turn will lead to the development of new oral cancer medications. The small molecules identified in this study can be ideal candidates for trials as potential novel anti-oral cancer agents. Importantly, distinct steps of this whole study may provide reference for the analysis of other complex human diseases.
NASA Astrophysics Data System (ADS)
Brochero, Darwin; Hajji, Islem; Pina, Jasson; Plana, Queralt; Sylvain, Jean-Daniel; Vergeynst, Jenna; Anctil, Francois
2015-04-01
Theories about generalization error with ensembles are mainly based on the diversity concept, which promotes resorting to many members of different properties to support mutually agreeable decisions. Kuncheva (2004) proposed the Multi Level Diversity Model (MLDM) to promote diversity in model ensembles, combining different data subsets, input subsets, models, parameters, and including a combiner level in order to optimize the final ensemble. This work tests the hypothesis about the minimisation of the generalization error with ensembles of Neural Network (NN) structures. We used the MLDM to evaluate two different scenarios: (i) ensembles from a same NN architecture, and (ii) a super-ensemble built by a combination of sub-ensembles of many NN architectures. The time series used correspond to the 12 basins of the MOdel Parameter Estimation eXperiment (MOPEX) project that were used by Duan et al. (2006) and Vos (2013) as benchmark. Six architectures are evaluated: FeedForward NN (FFNN) trained with the Levenberg Marquardt algorithm (Hagan et al., 1996), FFNN trained with SCE (Duan et al., 1993), Recurrent NN trained with a complex method (Weins et al., 2008), Dynamic NARX NN (Leontaritis and Billings, 1985), Echo State Network (ESN), and leak integrator neuron (L-ESN) (Lukosevicius and Jaeger, 2009). Each architecture performs separately an Input Variable Selection (IVS) according to a forward stepwise selection (Anctil et al., 2009) using mean square error as objective function. Post-processing by Predictor Stepwise Selection (PSS) of the super-ensemble has been done following the method proposed by Brochero et al. (2011). IVS results showed that the lagged stream flow, lagged precipitation, and Standardized Precipitation Index (SPI) (McKee et al., 1993) were the most relevant variables. They were respectively selected as one of the firsts three selected variables in 66, 45, and 28 of the 72 scenarios. A relationship between aridity index (Arora, 2002) and NN performance showed that wet basins are more easily modelled than dry basins. Nash-Sutcliffe (NS) Efficiency criterion was used to evaluate the performance of the models. Test results showed that in 9 of the 12 basins, the mean sub-ensembles performance was better than the one presented by Vos (2013). Furthermore, in 55 of 72 cases (6 NN structures x 12 basins) the mean sub-ensemble performance was better than the best individual performance, and in 10 basins the performance of the mean super-ensemble was better than the best individual super-ensemble member. As well, it was identified that members of ESN and L-ESN sub-ensembles have very similar and good performance values. Regarding the mean super-ensemble performance, we obtained an average gain in performance of 17%, and found that PSS preserves sub-ensemble members from different NN structures, indicating the pertinence of diversity in the super-ensemble. Moreover, it was demonstrated that around 100 predictors from the different structures are enough to optimize the super-ensemble. Although sub-ensembles of FFNN-SCE showed unstable performances, FFNN-SCE members were picked-up several times in the final predictor selection. References Anctil, F., M. Filion, and J. Tournebize (2009). "A neural network experiment on the simulation of daily nitrate-nitrogen and suspended sediment fluxes from a small agricultural catchment". In: Ecol. Model. 220.6, pp. 879-887. Arora, V. K. (2002). "The use of the aridity index to assess climate change effect on annual runoff". In: J. Hydrol. 265.164, pp. 164 -177 . Brochero, D., F. Anctil, and C. Gagn'e (2011). "Simplifying a hydrological ensemble prediction system with a backward greedy selection of members Part 1: Optimization criteria". In: Hydrol. Earth Syst. Sci. 15.11, pp. 3307-3325. Duan, Q., J. Schaake, V. Andr'eassian, S. Franks, G. Goteti, H. Gupta, Y. Gusev, F. Habets, A. Hall, L. Hay, T. Hogue, M. Huang, G. Leavesley, X. Liang, O. Nasonova, J. Noilhan, L. Oudin, S. Sorooshian, T. Wagener, and E. Wood (2006). "Model Parameter Estimation Experiment (MOPEX): An overview of science strategy and major results from the second and third workshops". In: J. Hydrol. 320.12, pp. 3-17. Duan, Q., V. Gupta, and S. Sorooshian (1993). "Shuffled complex evolution approach for effective and efficient global minimization". In: J. Optimiz. Theory App. 76.3, pp. 501-521. Hagan, M. T., H. B. Demuth, and M. Beale (1996). Neural network design . 1st ed. PWS Publishing Co., p. 730. Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms . Wiley-Interscience, p. 350. Leontaritis, I. and S. Billings (1985). "Input-output parametric models for non-linear systems Part I: deterministic non-linear systems". In: International Journal of Control 41.2, pp. 303-328. Lukosevicius, M. and H. Jaeger (2009). "Reservoir computing approaches to recurrent neural network training". In: Computer Science Review 3.3, pp. 127-149. McKee, T., N. Doesken, and J. Kleist (1993). The Relationship of Drought Frequency and Duration to Time Scales . In: Eighth Conference on Applied Climatology. Vos, N. J. de (2013). "Echo state networks as an alternative to traditional artificial neural networks in rainfall-runoff modelling". In: Hydrol. Earth Syst. Sci. 17.1, pp. 253-267. Weins, T., R. Burton, G. Schoenau, and D. Bitner (2008). Recursive Generalized Neural Networks (RGNN) for the Modeling of a Load Sensing Pump. In: ASME Joint Conference on Fluid Power, Transmission and Control.
Fault Detection of Bearing Systems through EEMD and Optimization Algorithm
Lee, Dong-Han; Ahn, Jong-Hyo; Koh, Bong-Hwan
2017-01-01
This study proposes a fault detection and diagnosis method for bearing systems using ensemble empirical mode decomposition (EEMD) based feature extraction, in conjunction with particle swarm optimization (PSO), principal component analysis (PCA), and Isomap. First, a mathematical model is assumed to generate vibration signals from damaged bearing components, such as the inner-race, outer-race, and rolling elements. The process of decomposing vibration signals into intrinsic mode functions (IMFs) and extracting statistical features is introduced to develop a damage-sensitive parameter vector. Finally, PCA and Isomap algorithm are used to classify and visualize this parameter vector, to separate damage characteristics from healthy bearing components. Moreover, the PSO-based optimization algorithm improves the classification performance by selecting proper weightings for the parameter vector, to maximize the visualization effect of separating and grouping of parameter vectors in three-dimensional space. PMID:29143772
Tuyisenge, Viateur; Trebaul, Lena; Bhattacharjee, Manik; Chanteloup-Forêt, Blandine; Saubat-Guigui, Carole; Mîndruţă, Ioana; Rheims, Sylvain; Maillard, Louis; Kahane, Philippe; Taussig, Delphine; David, Olivier
2018-03-01
Intracranial electroencephalographic (iEEG) recordings contain "bad channels", which show non-neuronal signals. Here, we developed a new method that automatically detects iEEG bad channels using machine learning of seven signal features. The features quantified signals' variance, spatial-temporal correlation and nonlinear properties. Because the number of bad channels is usually much lower than the number of good channels, we implemented an ensemble bagging classifier known to be optimal in terms of stability and predictive accuracy for datasets with imbalanced class distributions. This method was applied on stereo-electroencephalographic (SEEG) signals recording during low frequency stimulations performed in 206 patients from 5 clinical centers. We found that the classification accuracy was extremely good: It increased with the number of subjects used to train the classifier and reached a plateau at 99.77% for 110 subjects. The classification performance was thus not impacted by the multicentric nature of data. The proposed method to automatically detect bad channels demonstrated convincing results and can be envisaged to be used on larger datasets for automatic quality control of iEEG data. This is the first method proposed to classify bad channels in iEEG and should allow to improve the data selection when reviewing iEEG signals. Copyright © 2017 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
Ensemble Kalman Filter Data Assimilation in a Solar Dynamo Model
NASA Astrophysics Data System (ADS)
Dikpati, M.
2017-12-01
Despite great advancement in solar dynamo models since the first model by Parker in 1955, there remain many challenges in the quest to build a dynamo-based prediction scheme that can accurately predict the solar cycle features. One of these challenges is to implement modern data assimilation techniques, which have been used in the oceanic and atmospheric prediction models. Development of data assimilation in solar models are in the early stages. Recently, observing system simulation experiments (OSSE's) have been performed using Ensemble Kalman Filter data assimilation, in the framework of Data Assimilation Research Testbed of NCAR (NCAR-DART), for estimating parameters in a solar dynamo model. I will demonstrate how the selection of ensemble size, number of observations, amount of error in observations and the choice of assimilation interval play important role in parameter estimation. I will also show how the results of parameter reconstruction improve when accuracy in low-latitude observations is increased, despite large error in polar region data. I will then describe how implementation of data assimilation in a solar dynamo model can bring more accuracy in the prediction of polar fields in North and South hemispheres during the declining phase of cycle 24. Recent evidence indicates that the strength of the Sun's polar field during the cycle minima might be a reliable predictor for the next sunspot cycle's amplitude; therefore it is crucial to accurately predict the polar field strength and pattern.
NASA Technical Reports Server (NTRS)
Abeles, F. J.
1980-01-01
Each of the subsystems comprising the protective ensemble for firefighters is described. These include: (1) the garment system which includes turnout gear, helmets, faceshields, coats, pants, gloves, and boots; (2) the self-contained breathing system; (3) the lighting system; and (4) the communication system. The design selection rationale is discussed and the drawings used to fabricate the prototype ensemble are provided. The specifications presented were developed using the requirements and test method of the protective ensemble standard. Approximate retail prices are listed.
The Ensembl Web Site: Mechanics of a Genome Browser
Stalker, James; Gibbins, Brian; Meidl, Patrick; Smith, James; Spooner, William; Hotz, Hans-Rudolf; Cox, Antony V.
2004-01-01
The Ensembl Web site (http://www.ensembl.org/) is the principal user interface to the data of the Ensembl project, and currently serves >500,000 pages (∼2.5 million hits) per week, providing access to >80 GB (gigabyte) of data to users in more than 80 countries. Built atop an open-source platform comprising Apache/mod_perl and the MySQL relational database management system, it is modular, extensible, and freely available. It is being actively reused and extended in several different projects, and has been downloaded and installed in companies and academic institutions worldwide. Here, we describe some of the technical features of the site, with particular reference to its dynamic configuration that enables it to handle disparate data from multiple species. PMID:15123591
The Ensembl Web site: mechanics of a genome browser.
Stalker, James; Gibbins, Brian; Meidl, Patrick; Smith, James; Spooner, William; Hotz, Hans-Rudolf; Cox, Antony V
2004-05-01
The Ensembl Web site (http://www.ensembl.org/) is the principal user interface to the data of the Ensembl project, and currently serves >500,000 pages (approximately 2.5 million hits) per week, providing access to >80 GB (gigabyte) of data to users in more than 80 countries. Built atop an open-source platform comprising Apache/mod_perl and the MySQL relational database management system, it is modular, extensible, and freely available. It is being actively reused and extended in several different projects, and has been downloaded and installed in companies and academic institutions worldwide. Here, we describe some of the technical features of the site, with particular reference to its dynamic configuration that enables it to handle disparate data from multiple species.
A Random Forest-based ensemble method for activity recognition.
Feng, Zengtao; Mo, Lingfei; Li, Meng
2015-01-01
This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People), which is a standard, publicly available database, was utilized to train and test. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while the training is faster than others. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
NASA Astrophysics Data System (ADS)
Tamkin, G.; Schnase, J. L.; Duffy, D.; Li, J.; Strong, S.; Thompson, J. H.
2017-12-01
NASA's efforts to advance climate analytics-as-a-service are making new capabilities available to the research community: (1) A full-featured Reanalysis Ensemble Service (RES) comprising monthly means data from multiple reanalysis data sets, accessible through an enhanced set of extraction, analytic, arithmetic, and intercomparison operations. The operations are made accessible through NASA's climate data analytics Web services and our client-side Climate Data Services Python library, CDSlib; (2) A cloud-based, high-performance Virtual Real-Time Analytics Testbed supporting a select set of climate variables. This near real-time capability enables advanced technologies like Spark and Hadoop-based MapReduce analytics over native NetCDF files; and (3) A WPS-compliant Web service interface to our climate data analytics service that will enable greater interoperability with next-generation systems such as ESGF. The Reanalysis Ensemble Service includes the following: - New API that supports full temporal, spatial, and grid-based resolution services with sample queries - A Docker-ready RES application to deploy across platforms - Extended capabilities that enable single- and multiple reanalysis area average, vertical average, re-gridding, standard deviation, and ensemble averages - Convenient, one-stop shopping for commonly used data products from multiple reanalyses including basic sub-setting and arithmetic operations (e.g., avg, sum, max, min, var, count, anomaly) - Full support for the MERRA-2 reanalysis dataset in addition to, ECMWF ERA-Interim, NCEP CFSR, JMA JRA-55 and NOAA/ESRL 20CR… - A Jupyter notebook-based distribution mechanism designed for client use cases that combines CDSlib documentation with interactive scenarios and personalized project management - Supporting analytic services for NASA GMAO Forward Processing datasets - Basic uncertainty quantification services that combine heterogeneous ensemble products with comparative observational products (e.g., reanalysis, observational, visualization) - The ability to compute and visualize multiple reanalysis for ease of inter-comparisons - Automated tools to retrieve and prepare data collections for analytic processing
NASA Technical Reports Server (NTRS)
Williamson, Rebecca; Carbo, Jorge; Luna, Bernadette; Webbon, Bruce W.
1998-01-01
Wearing impermeable garments for hazardous materials clean up can often present a health and safety problem for the wearer. Even short duration clean up activities can produce heat stress injuries in hazardous materials workers. It was hypothesized that an internal cooling system might increase worker productivity and decrease likelihood of heat stress injuries in typical HazMat operations. Two HazMat protective ensembles were compared during treadmill exercise. The different ensembles were created using two different suits: a Trelleborg VPS suit representative of current HazMat suits and a prototype suit developed by NASA engineers. The two life support systems used were a current technology Interspiro Spirolite breathing apparatus and a liquid air breathing system that also provided convective cooling. Twelve local members of a HazMat team served as test subjects. They were fully instrumented to allow a complete physiological comparison of their thermal responses to the different ensembles. Results showed that cooling from the liquid air system significantly decreased thermal stress. The results of the subjective evaluations of new design features in the prototype suit were also highly favorable. Incorporation of these new design features could lead to significant operational advantages in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rios Velazquez, E; Parmar, C; Narayan, V
Purpose: To compare the complementary value of quantitative radiomic features to that of radiologist-annotated semantic features in predicting EGFR mutations in lung adenocarcinomas. Methods: Pre-operative CT images of 258 lung adenocarcinoma patients were available. Tumors were segmented using the sing-click ensemble segmentation algorithm. A set of radiomic features was extracted using 3D-Slicer. Test-retest reproducibility and unsupervised dimensionality reduction were applied to select a subset of reproducible and independent radiomic features. Twenty semantic annotations were scored by an expert radiologist, describing the tumor, surrounding tissue and associated findings. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative radiomic and semantic featuresmore » in 172 patients (training-set, temporal split). Radiomic, semantic and combined radiomic-semantic logistic regression models to predict EGFR mutations were evaluated in and independent validation dataset of 86 patients using the area under the receiver operating curve (AUC). Results: EGFR mutations were found in 77/172 (45%) and 39/86 (45%) of the training and validation sets, respectively. Univariate AUCs showed a similar range for both feature types: radiomics median AUC = 0.57 (range: 0.50 – 0.62); semantic median AUC = 0.53 (range: 0.50 – 0.64, Wilcoxon p = 0.55). After MRMR feature selection, the best-performing radiomic, semantic, and radiomic-semantic logistic regression models, for EGFR mutations, showed a validation AUC of 0.56 (p = 0.29), 0.63 (p = 0.063) and 0.67 (p = 0.004), respectively. Conclusion: Quantitative volumetric and textural Radiomic features complement the qualitative and semi-quantitative radiologist annotations. The prognostic value of informative qualitative semantic features such as cavitation and lobulation is increased with the addition of quantitative textural features from the tumor region.« less
Law, Andrew J.; Rivlis, Gil
2014-01-01
Pioneering studies demonstrated that novel degrees of freedom could be controlled individually by directly encoding the firing rate of single motor cortex neurons, without regard to each neuron's role in controlling movement of the native limb. In contrast, recent brain-computer interface work has emphasized decoding outputs from large ensembles that include substantially more neurons than the number of degrees of freedom being controlled. To bridge the gap between direct encoding by single neurons and decoding output from large ensembles, we studied monkeys controlling one degree of freedom by comodulating up to four arbitrarily selected motor cortex neurons. Performance typically exceeded random quite early in single sessions and then continued to improve to different degrees in different sessions. We therefore examined factors that might affect performance. Performance improved with larger ensembles. In contrast, other factors that might have reflected preexisting synaptic architecture—such as the similarity of preferred directions—had little if any effect on performance. Patterns of comodulation among ensemble neurons became more consistent across trials as performance improved over single sessions. Compared with the ensemble neurons, other simultaneously recorded neurons showed less modulation. Patterns of voluntarily comodulated firing among small numbers of arbitrarily selected primary motor cortex (M1) neurons thus can be found and improved rapidly, with little constraint based on the normal relationships of the individual neurons to native limb movement. This rapid flexibility in relationships among M1 neurons may in part underlie our ability to learn new movements and improve motor skill. PMID:24920030
Mesoscale Predictability and Error Growth in Short Range Ensemble Forecasts
NASA Astrophysics Data System (ADS)
Gingrich, Mark
Although it was originally suggested that small-scale, unresolved errors corrupt forecasts at all scales through an inverse error cascade, some authors have proposed that those mesoscale circulations resulting from stationary forcing on the larger scale may inherit the predictability of the large-scale motions. Further, the relative contributions of large- and small-scale uncertainties in producing error growth in the mesoscales remain largely unknown. Here, 100 member ensemble forecasts are initialized from an ensemble Kalman filter (EnKF) to simulate two winter storms impacting the East Coast of the United States in 2010. Four verification metrics are considered: the local snow water equivalence, total liquid water, and 850 hPa temperatures representing mesoscale features; and the sea level pressure field representing a synoptic feature. It is found that while the predictability of the mesoscale features can be tied to the synoptic forecast, significant uncertainty existed on the synoptic scale at lead times as short as 18 hours. Therefore, mesoscale details remained uncertain in both storms due to uncertainties at the large scale. Additionally, the ensemble perturbation kinetic energy did not show an appreciable upscale propagation of error for either case. Instead, the initial condition perturbations from the cycling EnKF were maximized at large scales and immediately amplified at all scales without requiring initial upscale propagation. This suggests that relatively small errors in the synoptic-scale initialization may have more importance in limiting predictability than errors in the unresolved, small-scale initial conditions.
Clustering molecular dynamics trajectories for optimizing docking experiments.
De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D; Norberto de Souza, Osmar; Barros, Rodrigo C
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threats the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. Particularly, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments on a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.
Biomedical named entity extraction: some issues of corpus compatibilities.
Ekbal, Asif; Saha, Sriparna; Sikdar, Utpal Kumar
2013-01-01
Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identification of certain entities from text and their classification into some predefined categories. In the biomedical community, there is yet no general consensus regarding named entity (NE) annotation; thus, it is very difficult to compare the existing systems due to corpus incompatibilities. Due to this problem we can not also exploit the advantages of using different corpora together. In our present work we address the issues of corpus compatibilities, and use a single objective optimization (SOO) based classifier ensemble technique that uses the search capability of genetic algorithm (GA) for NE extraction in biomedicine. We hypothesize that the reliability of predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon the various representations of the set of features and/or feature templates. It is to be noted that we tried to extract the features without using any deep domain knowledge and/or resources. In order to assess the challenges of corpus compatibilities, we experiment with the different benchmark datasets and their various combinations. Comparison results with the existing approaches prove the efficacy of the used technique. GA based ensemble achieves around 2% performance improvements over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulties of the task. In summary, our used ensemble based approach attains the state-of-the-art performance levels for entity extraction in three different kinds of biomedical datasets. The possible reasons behind the better performance in our used approach are the (i). use of variety and rich features as described in Subsection "Features for named entity extraction"; (ii) use of GA based classifier ensemble technique to combine the outputs of multiple classifiers.
A new approach to human microRNA target prediction using ensemble pruning and rotation forest.
Mousavi, Reza; Eftekhari, Mahdi; Haghighi, Mehdi Ghezelbash
2015-12-01
MicroRNAs (miRNAs) are small non-coding RNAs that have important functions in gene regulation. Since finding miRNA target experimentally is costly and needs spending much time, the use of machine learning methods is a growing research area for miRNA target prediction. In this paper, a new approach is proposed by using two popular ensemble strategies, i.e. Ensemble Pruning and Rotation Forest (EP-RTF), to predict human miRNA target. For EP, the approach utilizes Genetic Algorithm (GA). In other words, a subset of classifiers from the heterogeneous ensemble is first selected by GA. Next, the selected classifiers are trained based on the RTF method and then are combined using weighted majority voting. In addition to seeking a better subset of classifiers, the parameter of RTF is also optimized by GA. Findings of the present study confirm that the newly developed EP-RTF outperforms (in terms of classification accuracy, sensitivity, and specificity) the previously applied methods over four datasets in the field of human miRNA target. Diversity-error diagrams reveal that the proposed ensemble approach constructs individual classifiers which are more accurate and usually diverse than the other ensemble approaches. Given these experimental results, we highly recommend EP-RTF for improving the performance of miRNA target prediction.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) includes sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, boosting methods. A new ensemble method comprising of the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100% and sensitivity, specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
Albrecht, Markus
2007-12-01
This review gives an introduction into supramolecular chemistry describing in the first part general principles, focusing on terms like noncovalent interaction, molecular recognition, self-assembly, and supramolecular function. In the second part those will be illustrated by simple examples from our laboratories. Supramolecular chemistry is the science that bridges the gap between the world of molecules and nanotechnology. In supramolecular chemistry noncovalent interactions occur between molecular building blocks, which by molecular recognition and self-assembly form (functional) supramolecular entities. It is also termed the "chemistry of the noncovalent bond." Molecular recognition is based on geometrical complementarity based on the "key-and-lock" principle with nonshape-dependent effects, e.g., solvatization, being also highly influential. Self-assembly leads to the formation of well-defined aggregates. Hereby the overall structure of the target ensemble is controlled by the symmetry features of the certain building blocks. Finally, the aggregates can possess special properties or supramolecular functions, which are only found in the ensemble but not in the participating molecules. This review gives an introduction on supramolecular chemistry and illustrates the fundamental principles by recent examples from our group.
APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main proteins isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melangaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Evaluation of GCMs in the context of regional predictive climate impact studies.
NASA Astrophysics Data System (ADS)
Kokorev, Vasily; Anisimov, Oleg
2016-04-01
Significant improvements in the structure, complexity, and general performance of earth system models (ESMs) have been made in the recent decade. Despite these efforts, the range of uncertainty in predicting regional climate impacts remains large. The problem is two-fold. Firstly, there is an intrinsic conflict between the local and regional scales of climate impacts and adaptation strategies, on one hand, and larger scales, at which ESMs demonstrate better performance, on the other. Secondly, there is a growing understanding that majority of the impacts involve thresholds, and are thus driven by extreme climate events, whereas accent in climate projections is conventionally made on gradual changes in means. In this study we assess the uncertainty in projecting extreme climatic events within a region-specific and process-oriented context by examining the skills and ranking of ESMs. We developed a synthetic regionalization of Northern Eurasia that accounts for the spatial features of modern climatic changes and major environmental and socio-economical impacts. Elements of such fragmentation could be considered as natural focus regions that bridge the gap between the spatial scales adopted in climate-impacts studies and patterns of climate change simulated by ESMs. In each focus region we selected several target meteorological variables that govern the key regional impacts, and examined the ability of the models to replicate their seasonal and annual means and trends by testing them against observations. We performed a similar evaluation with regard to extremes and statistics of the target variables. And lastly, we used the results of these analyses to select sets of models that demonstrate the best performance at selected focus regions with regard to selected sets of target meteorological parameters. Ultimately, we ranked the models according to their skills, identified top-end models that "better than average" reproduce the behavior of climatic parameters, and eliminated the outliers. Since the criteria of selecting the "best" models are somewhat loose, we constructed several regional ensembles consisting of different number of high-ranked models and compared results from these optimized ensembles with observations and with the ensemble of all models. We tested our approach in specific regional application of the terrestrial Russian Arctic, considering permafrost and Artic biomes as key regional climate-dependent systems, and temperature and precipitation characteristics governing their state as target meteorological parameters. Results of this case study are deposited on the web portal www.permafrost.su/gcms
Yun, Su-Won; Park, Shin-Ae; Kim, Tae-June; Kim, Jun-Hyuk; Pak, Gi-Woong; Kim, Yong-Tae
2017-02-08
A simple, inexpensive approach is proposed for enhancing the durability of automotive proton exchange membrane fuel cells by selective promotion of the hydrogen oxidation reaction (HOR) and suppression of the oxygen reduction reaction (ORR) at the anode in startup/shutdown events. Dodecanethiol forms a self-assembled monolayer (SAM) on the surface of Pt particles, thus decreasing the number of Pt ensemble sites. Interestingly, by controlling the dodecanethiol concentration during SAM formation, the number of ensemble sites can be precisely optimized such that it is sufficient for the HOR but insufficient for the ORR. Thus, a Pt surface with an SAM of dodecanethiol clearly effects HOR-selective electrocatalysis. Clear HOR selectivity is demonstrated in unit cell tests with the actual membrane electrode assembly, as well as in an electrochemical three-electrode setup with a thin-film rotating disk electrode configuration. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Lawi, Armin; Adhitya, Yudhi
2018-03-01
The objective of this research is to determine the quality of cocoa beans through morphology of their digital images. Samples of cocoa beans were scattered on a bright white paper under a controlled lighting condition. A compact digital camera was used to capture the images. The images were then processed to extract their morphological parameters. Classification process begins with an analysis of cocoa beans image based on morphological feature extraction. Parameters for extraction of morphological or physical feature parameters, i.e., Area, Perimeter, Major Axis Length, Minor Axis Length, Aspect Ratio, Circularity, Roundness, Ferret Diameter. The cocoa beans are classified into 4 groups, i.e.: Normal Beans, Broken Beans, Fractured Beans, and Skin Damaged Beans. The model of classification used in this paper is the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM), a proposed improvement model of SVM using ensemble method in which the separate hyperplanes are obtained by least square approach and the multiclass procedure uses One-Against- All method. The result of our proposed model showed that the classification with morphological feature input parameters were accurately as 99.705% for the four classes, respectively.
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite the significant success, one limitation to most of the existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, yet these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially, in the case when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
NASA Astrophysics Data System (ADS)
Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang
2016-07-01
This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
Residue-level global and local ensemble-ensemble comparisons of protein domains.
Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew
2015-09-01
Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. © 2015 The Protein Society.
Residue-level global and local ensemble-ensemble comparisons of protein domains
Clark, Sarah A; Tronrud, Dale E; Andrew Karplus, P
2015-01-01
Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a “consistency check” of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. PMID:26032515
Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan
2018-05-01
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.
Combining Feature Extraction Methods to Assist the Diagnosis of Alzheimer's Disease.
Segovia, F; Górriz, J M; Ramírez, J; Phillips, C
2016-01-01
Neuroimaging data as (18)F-FDG PET is widely used to assist the diagnosis of Alzheimer's disease (AD). Looking for regions with hypoperfusion/ hypometabolism, clinicians may predict or corroborate the diagnosis of the patients. Modern computer aided diagnosis (CAD) systems based on the statistical analysis of whole neuroimages are more accurate than classical systems based on quantifying the uptake of some predefined regions of interests (ROIs). In addition, these new systems allow determining new ROIs and take advantage of the huge amount of information comprised in neuroimaging data. A major branch of modern CAD systems for AD is based on multivariate techniques, which analyse a neuroimage as a whole, considering not only the voxel intensities but also the relations among them. In order to deal with the vast dimensionality of the data, a number of feature extraction methods have been successfully applied. In this work, we propose a CAD system based on the combination of several feature extraction techniques. First, some commonly used feature extraction methods based on the analysis of the variance (as principal component analysis), on the factorization of the data (as non-negative matrix factorization) and on classical magnitudes (as Haralick features) were simultaneously applied to the original data. These feature sets were then combined by means of two different combination approaches: i) using a single classifier and a multiple kernel learning approach and ii) using an ensemble of classifier and selecting the final decision by majority voting. The proposed approach was evaluated using a labelled neuroimaging database along with a cross validation scheme. As conclusion, the proposed CAD system performed better than approaches using only one feature extraction technique. We also provide a fair comparison (using the same database) of the selected feature extraction methods.
Fluctuation instability of the Dirac Sea in quark models of strong interactions
NASA Astrophysics Data System (ADS)
Zinovjev, G. M.; Molodtsov, S. V.
2016-03-01
A number of exactly integrable (quark) models of quantum field theory that feature an infinite correlation length are considered. An instability of the standard vacuum quark ensemble, a Dirac sea (in spacetimes of dimension higher than three), is highlighted. It is due to a strong ground-state degeneracy, which, in turn, stems from a special character of the energy distribution. In the case where the momentumcutoff parameter tends to infinity, this distribution becomes infinitely narrow and leads to large (unlimited) fluctuations. A comparison of the results for various vacuum ensembles, including a Dirac sea, a neutral ensemble, a color superconductor, and a Bardeen-Cooper-Schrieffer (BCS) state, was performed. In the presence of color quark interaction, a BCS state is unambiguously chosen as the ground state of the quark ensemble.
Quantifying selective alignment of ensemble nitrogen-vacancy centers in (111) diamond
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tahara, Kosuke; Ozawa, Hayato; Iwasaki, Takayuki
2015-11-09
Selective alignment of nitrogen-vacancy (NV) centers in diamond is an important technique towards its applications. Quantification of the alignment ratio is necessary to design the optimized diamond samples. However, this is not a straightforward problem for dense ensemble of the NV centers. We estimate the alignment ratio of ensemble NV centers along the [111] direction in (111) diamond by optically detected magnetic resonance measurements. Diamond films deposited by N{sub 2} doped chemical vapor deposition have NV center densities over 1 × 10{sup 15 }cm{sup −3} and alignment ratios over 75%. Although spin coherence time (T{sub 2}) is limited to a few μs bymore » electron spins of nitrogen impurities, the combination of the selective alignment and the high density can be a possible way to optimize NV-containing diamond samples for the sensing applications.« less
Murrell, Daniel S; Cortes-Ciriano, Isidro; van Westen, Gerard J P; Stott, Ian P; Bender, Andreas; Malliavin, Thérèse E; Glen, Robert C
2015-01-01
In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process. camb is an R package that provides an environment for the rapid generation of quantitative Structure-Property and Structure-Activity models for small molecules (including QSAR, QSPR, QSAM, PCM) and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one-dimensional and 14 fingerprint type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), as well as techniques to create model ensembles using techniques from the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2). Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed model generation, in order to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included which demonstrate camb's application.Graphical abstractFrom compounds and data to models: a complete model building workflow in one package.
Curve Boxplot: Generalization of Boxplot for Ensembles of Curves.
Mirzargar, Mahsa; Whitaker, Ross T; Kirby, Robert M
2014-12-01
In simulation science, computational scientists often study the behavior of their simulations by repeated solutions with variations in parameters and/or boundary values or initial conditions. Through such simulation ensembles, one can try to understand or quantify the variability or uncertainty in a solution as a function of the various inputs or model assumptions. In response to a growing interest in simulation ensembles, the visualization community has developed a suite of methods for allowing users to observe and understand the properties of these ensembles in an efficient and effective manner. An important aspect of visualizing simulations is the analysis of derived features, often represented as points, surfaces, or curves. In this paper, we present a novel, nonparametric method for summarizing ensembles of 2D and 3D curves. We propose an extension of a method from descriptive statistics, data depth, to curves. We also demonstrate a set of rendering and visualization strategies for showing rank statistics of an ensemble of curves, which is a generalization of traditional whisker plots or boxplots to multidimensional curves. Results are presented for applications in neuroimaging, hurricane forecasting and fluid dynamics.
NASA Technical Reports Server (NTRS)
Steiner, B.; Kuriyama, M.; Dobbyn, R. C.; Laor, U.; Larson, D.; Brown, M.
1988-01-01
Novel, streak-like disruption features restricted to the plane of diffraction have recently been observed in images obtained by synchrotron radiation diffraction from undoped, semi-insulating gallium arsenide crystals. These features were identified as ensembles of very thin platelets or interfaces lying in (110) planes, and a structural model consisting of antiphase domain boundaries was proposed. We report here the other principal features observed in high resolution monochromatic synchrotron radiation diffraction images: (quasi) cellular structure; linear, very low-angle subgrain boundaries in (110) directions, and surface stripes in a (110) direction. In addition, we report systematic differences in the acceptance angle for images involving various diffraction vectors. When these observations are considered together, a unifying picture emerges. The presence of ensembles of thin (110) antiphase platelet regions or boundaries is generally consistent not only with the streak-like diffraction features but with the other features reported here as well. For the formation of such regions we propose two mechanisms, operating in parallel, that appear to be consistent with the various defect features observed by a variety of techniques.
NASA Technical Reports Server (NTRS)
Steiner, B.; Kuriyama, M.; Dobbyn, R. C.; Laor, U.; Larson, D.
1989-01-01
Novel, streak-like disruption features restricted to the plane of diffraction have recently been observed in images obtained by synchrotron radiation diffraction from undoped, semi-insulating gallium arsenide crystals. These features were identified as ensembles of very thin platelets or interfaces lying in (110) planes, and a structural model consisting of antiphase domain boundaries was proposed. We report here the other principal features observed in high resolution monochromatic synchrotron radiation diffraction images: (quasi) cellular structure; linear, very low-angle subgrain boundaries in (110) directions, and surface stripes in a (110) direction. In addition, we report systematic differences in the acceptance angle for images involving various diffraction vectors. When these observations are considered together, a unifying picture emerges. The presence of ensembles of thin (110) antiphase platelet regions or boundaries is generally consistent not only with the streak-like diffraction features but with the other features reported here as well. For the formation of such regions we propose two mechanisms, operating in parallel, that appear to be consistent with the various defect features observed by a variety of techniques.
Analog-Based Postprocessing of Navigation-Related Hydrological Ensemble Forecasts
NASA Astrophysics Data System (ADS)
Hemri, S.; Klein, B.
2017-11-01
Inland waterway transport benefits from probabilistic forecasts of water levels as they allow to optimize the ship load and, hence, to minimize the transport costs. Probabilistic state-of-the-art hydrologic ensemble forecasts inherit biases and dispersion errors from the atmospheric ensemble forecasts they are driven with. The use of statistical postprocessing techniques like ensemble model output statistics (EMOS) allows for a reduction of these systematic errors by fitting a statistical model based on training data. In this study, training periods for EMOS are selected based on forecast analogs, i.e., historical forecasts that are similar to the forecast to be verified. Due to the strong autocorrelation of water levels, forecast analogs have to be selected based on entire forecast hydrographs in order to guarantee similar hydrograph shapes. Custom-tailored measures of similarity for forecast hydrographs comprise hydrological series distance (SD), the hydrological matching algorithm (HMA), and dynamic time warping (DTW). Verification against observations reveals that EMOS forecasts for water level at three gauges along the river Rhine with training periods selected based on SD, HMA, and DTW compare favorably with reference EMOS forecasts, which are based on either seasonal training periods or on training periods obtained by dividing the hydrological forecast trajectories into runoff regimes.
Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal
2013-01-01
We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal
2013-01-01
We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql –h database.nencki-genomics.org –u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456
Climate Model Ensemble Methodology: Rationale and Challenges
NASA Astrophysics Data System (ADS)
Vezer, M. A.; Myrvold, W.
2012-12-01
A tractable model of the Earth's atmosphere, or, indeed, any large, complex system, is inevitably unrealistic in a variety of ways. This will have an effect on the model's output. Nonetheless, we want to be able to rely on certain features of the model's output in studies aiming to detect, attribute, and project climate change. For this, we need assurance that these features reflect the target system, and are not artifacts of the unrealistic assumptions that go into the model. One technique for overcoming these limitations is to study ensembles of models which employ different simplifying assumptions and different methods of modelling. One then either takes as reliable certain outputs on which models in the ensemble agree, or takes the average of these outputs as the best estimate. Since the Intergovernmental Panel on Climate Change's Fourth Assessment Report (IPCC AR4) modellers have aimed to improve ensemble analysis by developing techniques to account for dependencies among models, and to ascribe unequal weights to models according to their performance. The goal of this paper is to present as clearly and cogently as possible the rationale for climate model ensemble methodology, the motivation of modellers to account for model dependencies, and their efforts to ascribe unequal weights to models. The method of our analysis is as follows. We will consider a simpler, well-understood case of taking the mean of a number of measurements of some quantity. Contrary to what is sometimes said, it is not a requirement of this practice that the errors of the component measurements be independent; one must, however, compensate for any lack of independence. We will also extend the usual accounts to include cases of unknown systematic error. We draw parallels between this simpler illustration and the more complex example of climate model ensembles, detailing how ensembles can provide more useful information than any of their constituent models. This account emphasizes the epistemic importance of considering degrees of model dependence, and the practice of ascribing unequal weights to models of unequal skill.
Ebadi, Ashkan; Dalboni da Rocha, Josué L.; Nagaraju, Dushyanth B.; Tovar-Moll, Fernanda; Bramati, Ivanei; Coutinho, Gabriel; Sitaram, Ranganatha; Rashidi, Parisa
2017-01-01
The human brain is a complex network of interacting regions. The gray matter regions of brain are interconnected by white matter tracts, together forming one integrative complex network. In this article, we report our investigation about the potential of applying brain connectivity patterns as an aid in diagnosing Alzheimer's disease and Mild Cognitive Impairment (MCI). We performed pattern analysis of graph theoretical measures derived from Diffusion Tensor Imaging (DTI) data representing structural brain networks of 45 subjects, consisting of 15 patients of Alzheimer's disease (AD), 15 patients of MCI, and 15 healthy subjects (CT). We considered pair-wise class combinations of subjects, defining three separate classification tasks, i.e., AD-CT, AD-MCI, and CT-MCI, and used an ensemble classification module to perform the classification tasks. Our ensemble framework with feature selection shows a promising performance with classification accuracy of 83.3% for AD vs. MCI, 80% for AD vs. CT, and 70% for MCI vs. CT. Moreover, our findings suggest that AD can be related to graph measures abnormalities at Brodmann areas in the sensorimotor cortex and piriform cortex. In this way, node redundancy coefficient and load centrality in the primary motor cortex were recognized as good indicators of AD in contrast to MCI. In general, load centrality, betweenness centrality, and closeness centrality were found to be the most relevant network measures, as they were the top identified features at different nodes. The present study can be regarded as a “proof of concept” about a procedure for the classification of MRI markers between AD dementia, MCI, and normal old individuals, due to the small and not well-defined groups of AD and MCI patients. Future studies with larger samples of subjects and more sophisticated patient exclusion criteria are necessary toward the development of a more precise technique for clinical diagnosis. PMID:28293162
Noise-Resilient Quantum Computing with a Nitrogen-Vacancy Center and Nuclear Spins.
Casanova, J; Wang, Z-Y; Plenio, M B
2016-09-23
Selective control of qubits in a quantum register for the purposes of quantum information processing represents a critical challenge for dense spin ensembles in solid-state systems. Here we present a protocol that achieves a complete set of selective electron-nuclear gates and single nuclear rotations in such an ensemble in diamond facilitated by a nearby nitrogen-vacancy (NV) center. The protocol suppresses internuclear interactions as well as unwanted coupling between the NV center and other spins of the ensemble to achieve quantum gate fidelities well exceeding 99%. Notably, our method can be applied to weakly coupled, distant spins representing a scalable procedure that exploits the exceptional properties of nuclear spins in diamond as robust quantum memories.
Bioactive focus in conformational ensembles: a pluralistic approach
NASA Astrophysics Data System (ADS)
Habgood, Matthew
2017-12-01
Computational generation of conformational ensembles is key to contemporary drug design. Selecting the members of the ensemble that will approximate the conformation most likely to bind to a desired target (the bioactive conformation) is difficult, given that the potential energy usually used to generate and rank the ensemble is a notoriously poor discriminator between bioactive and non-bioactive conformations. In this study an approach to generating a focused ensemble is proposed in which each conformation is assigned multiple rankings based not just on potential energy but also on solvation energy, hydrophobic or hydrophilic interaction energy, radius of gyration, and on a statistical potential derived from Cambridge Structural Database data. The best ranked structures derived from each system are then assembled into a new ensemble that is shown to be better focused on bioactive conformations. This pluralistic approach is tested on ensembles generated by the Molecular Operating Environment's Low Mode Molecular Dynamics module, and by the Cambridge Crystallographic Data Centre's conformation generator software.
EnsembleGraph: Interactive Visual Analysis of Spatial-Temporal Behavior for Ensemble Simulation Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Qingya; Guo, Hanqi; Che, Limei
We present a novel visualization framework—EnsembleGraph— for analyzing ensemble simulation data, in order to help scientists understand behavior similarities between ensemble members over space and time. A graph-based representation is used to visualize individual spatiotemporal regions with similar behaviors, which are extracted by hierarchical clustering algorithms. A user interface with multiple-linked views is provided, which enables users to explore, locate, and compare regions that have similar behaviors between and then users can investigate and analyze the selected regions in detail. The driving application of this paper is the studies on regional emission influences over tropospheric ozone, which is based onmore » ensemble simulations conducted with different anthropogenic emission absences using the MOZART-4 (model of ozone and related tracers, version 4) model. We demonstrate the effectiveness of our method by visualizing the MOZART-4 ensemble simulation data and evaluating the relative regional emission influences on tropospheric ozone concentrations. Positive feedbacks from domain experts and two case studies prove efficiency of our method.« less
Abuassba, Adnan O M; Zhang, Dezheng; Luo, Xiong; Shaheryar, Ahmad; Ali, Hazrat
2017-01-01
Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L 2 -norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
Abuassba, Adnan O. M.; Ali, Hazrat
2017-01-01
Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets. PMID:28546808
Nanni, Loris; Lumini, Alessandra
2009-01-01
The focuses of this work are: to propose a novel method for building an ensemble of classifiers for peptide classification based on substitution matrices; to show the importance to select a proper set of the parameters of the classifiers that build the ensemble of learning systems. The HIV-1 protease cleavage site prediction problem is here studied. The results obtained by a blind testing protocol are reported, the comparison with other state-of-the-art approaches, based on ensemble of classifiers, allows to quantify the performance improvement obtained by the systems proposed in this paper. The simulation based on experimentally determined protease cleavage data has demonstrated the success of these new ensemble algorithms. Particularly interesting it is to note that also if the HIV-1 protease cleavage site prediction problem is considered linearly separable we obtain the best performance using an ensemble of non-linear classifiers.
Information-Theoretic Uncertainty of SCFG-Modeled Folding Space of The Non-coding RNA
Manzourolajdad, Amirhossein; Wang, Yingfeng; Shaw, Timothy I.; Malmberg, Russell L.
2012-01-01
RNA secondary structure ensembles define probability distributions for alternative equilibrium secondary structures of an RNA sequence. Shannon’s Entropy is a measure for the amount of diversity present in any ensemble. In this work, Shannon’s entropy of the SCFG ensemble on an RNA sequence is derived and implemented in polynomial time for both structurally ambiguous and unambiguous grammars. Micro RNA sequences generally have low folding entropy, as previously discovered. Surprisingly, signs of significantly high folding entropy were observed in certain ncRNA families. More effective models coupled with targeted randomization tests can lead to a better insight into folding features of these families. PMID:23160142
Hassan, Ahnaf Rashik; Bhuiyan, Mohammed Imamul Hassan
2017-03-01
Automatic sleep staging is essential for alleviating the burden of the physicians of analyzing a large volume of data by visual inspection. It is also a precondition for making an automated sleep monitoring system feasible. Further, computerized sleep scoring will expedite large-scale data analysis in sleep research. Nevertheless, most of the existing works on sleep staging are either multichannel or multiple physiological signal based which are uncomfortable for the user and hinder the feasibility of an in-home sleep monitoring device. So, a successful and reliable computer-assisted sleep staging scheme is yet to emerge. In this work, we propose a single channel EEG based algorithm for computerized sleep scoring. In the proposed algorithm, we decompose EEG signal segments using Ensemble Empirical Mode Decomposition (EEMD) and extract various statistical moment based features. The effectiveness of EEMD and statistical features are investigated. Statistical analysis is performed for feature selection. A newly proposed classification technique, namely - Random under sampling boosting (RUSBoost) is introduced for sleep stage classification. This is the first implementation of EEMD in conjunction with RUSBoost to the best of the authors' knowledge. The proposed feature extraction scheme's performance is investigated for various choices of classification models. The algorithmic performance of our scheme is evaluated against contemporary works in the literature. The performance of the proposed method is comparable or better than that of the state-of-the-art ones. The proposed algorithm gives 88.07%, 83.49%, 92.66%, 94.23%, and 98.15% for 6-state to 2-state classification of sleep stages on Sleep-EDF database. Our experimental outcomes reveal that RUSBoost outperforms other classification models for the feature extraction framework presented in this work. Besides, the algorithm proposed in this work demonstrates high detection accuracy for the sleep states S1 and REM. Statistical moment based features in the EEMD domain distinguish the sleep states successfully and efficaciously. The automated sleep scoring scheme propounded herein can eradicate the onus of the clinicians, contribute to the device implementation of a sleep monitoring system, and benefit sleep research. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Universal LD50 predictions using deep learning
NICEATM Predictive Models for Acute Oral Systemic Toxicity LD50 entry Risa R. Sayre (sayre.risa@epa.gov) & Christopher M. Grulke Our approach uses an ensemble of multilayer perceptron regressions to predict rat acute oral LD50 values from chemical features. Features were genera...
Inhomogeneous ensembles of radical pairs in chemical compasses
Procopio, Maria; Ritz, Thorsten
2016-01-01
The biophysical basis for the ability of animals to detect the geomagnetic field and to use it for finding directions remains a mystery of sensory biology. One much debated hypothesis suggests that an ensemble of specialized light-induced radical pair reactions can provide the primary signal for a magnetic compass sensor. The question arises what features of such a radical pair ensemble could be optimized by evolution so as to improve the detection of the direction of weak magnetic fields. Here, we focus on the overlooked aspect of the noise arising from inhomogeneity of copies of biomolecules in a realistic biological environment. Such inhomogeneity leads to variations of the radical pair parameters, thereby deteriorating the signal arising from an ensemble and providing a source of noise. We investigate the effect of variations in hyperfine interactions between different copies of simple radical pairs on the directional response of a compass system. We find that the choice of radical pair parameters greatly influences how strongly the directional response of an ensemble is affected by inhomogeneity. PMID:27804956
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
Ensemble of classifiers in general, aims to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of ensemble of classifiers generate self-rated confidence scores which estimate the reliability of each of its prediction and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. In existing works, although ensemble of classifiers has been widely used in computational intelligence, the effect of all factors of unreliability on the confidence of classification is highly overlooked. With relevance to NDE, classification results are affected by inherent ambiguity of classifica-tion, non-discriminative features, inadequate training samples and noise due to measurement. In this paper, we extend the existing ensemble classification by maximizing confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in classification performance of defect and non-defect indications.
NASA Astrophysics Data System (ADS)
Gutiérrez, J. M.; Primo, C.; Rodríguez, M. A.; Fernández, J.
2008-02-01
We present a novel approach to characterize and graphically represent the spatiotemporal evolution of ensembles using a simple diagram. To this aim we analyze the fluctuations obtained as differences between each member of the ensemble and the control. The lognormal character of these fluctuations suggests a characterization in terms of the first two moments of the logarithmic transformed values. On one hand, the mean is associated with the exponential growth in time. On the other hand, the variance accounts for the spatial correlation and localization of fluctuations. In this paper we introduce the MVL (Mean-Variance of Logarithms) diagram to intuitively represent the interplay and evolution of these two quantities. We show that this diagram uncovers useful information about the spatiotemporal dynamics of the ensemble. Some universal features of the diagram are also described, associated either with the nonlinear system or with the ensemble method and illustrated using both toy models and numerical weather prediction systems.
Inhomogeneous ensembles of radical pairs in chemical compasses
NASA Astrophysics Data System (ADS)
Procopio, Maria; Ritz, Thorsten
2016-11-01
The biophysical basis for the ability of animals to detect the geomagnetic field and to use it for finding directions remains a mystery of sensory biology. One much debated hypothesis suggests that an ensemble of specialized light-induced radical pair reactions can provide the primary signal for a magnetic compass sensor. The question arises what features of such a radical pair ensemble could be optimized by evolution so as to improve the detection of the direction of weak magnetic fields. Here, we focus on the overlooked aspect of the noise arising from inhomogeneity of copies of biomolecules in a realistic biological environment. Such inhomogeneity leads to variations of the radical pair parameters, thereby deteriorating the signal arising from an ensemble and providing a source of noise. We investigate the effect of variations in hyperfine interactions between different copies of simple radical pairs on the directional response of a compass system. We find that the choice of radical pair parameters greatly influences how strongly the directional response of an ensemble is affected by inhomogeneity.
NASA Astrophysics Data System (ADS)
Seiller, G.; Anctil, F.; Roy, R.
2017-09-01
This paper outlines the design and experimentation of an Empirical Multistructure Framework (EMF) for lumped conceptual hydrological modeling. This concept is inspired from modular frameworks, empirical model development, and multimodel applications, and encompasses the overproduce and select paradigm. The EMF concept aims to reduce subjectivity in conceptual hydrological modeling practice and includes model selection in the optimisation steps, reducing initial assumptions on the prior perception of the dominant rainfall-runoff transformation processes. EMF generates thousands of new modeling options from, for now, twelve parent models that share their functional components and parameters. Optimisation resorts to ensemble calibration, ranking and selection of individual child time series based on optimal bias and reliability trade-offs, as well as accuracy and sharpness improvement of the ensemble. Results on 37 snow-dominated Canadian catchments and 20 climatically-diversified American catchments reveal the excellent potential of the EMF in generating new individual model alternatives, with high respective performance values, that may be pooled efficiently into ensembles of seven to sixty constitutive members, with low bias and high accuracy, sharpness, and reliability. A group of 1446 new models is highlighted to offer good potential on other catchments or applications, based on their individual and collective interests. An analysis of the preferred functional components reveals the importance of the production and total flow elements. Overall, results from this research confirm the added value of ensemble and flexible approaches for hydrological applications, especially in uncertain contexts, and open up new modeling possibilities.
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.
Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott
2009-03-01
Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
NASA Astrophysics Data System (ADS)
Li, Ji; Ren, Fuji
Weblogs have greatly changed the communication ways of mankind. Affective analysis of blog posts is found valuable for many applications such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text based on single-label classification can not satisfy higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. An ensemble algorithm RAKEL is used to recognize dominant emotions from the writer's perspective. Our emotion feature using detailed intensity representation for word emotions outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuations, disjunctive connectives, modification relations and negation into features. It achieves 13.51% and 12.49% increases for Micro-averaged F1 and Macro-averaged F1 respectively compared to the traditional lexicon-based feature. Result shows that multiple-dimension emotion representation with grammatical features can efficiently classify sentence emotion in a multi-label problem.
Ensemble perception of color in autistic adults.
Maule, John; Stanworth, Kirstie; Pellicano, Elizabeth; Franklin, Anna
2017-05-01
Dominant accounts of visual processing in autism posit that autistic individuals have an enhanced access to details of scenes [e.g., weak central coherence] which is reflected in a general bias toward local processing. Furthermore, the attenuated priors account of autism predicts that the updating and use of summary representations is reduced in autism. Ensemble perception describes the extraction of global summary statistics of a visual feature from a heterogeneous set (e.g., of faces, sizes, colors), often in the absence of local item representation. The present study investigated ensemble perception in autistic adults using a rapidly presented (500 msec) ensemble of four, eight, or sixteen elements representing four different colors. We predicted that autistic individuals would be less accurate when averaging the ensembles, but more accurate in recognizing individual ensemble colors. The results were consistent with the predictions. Averaging was impaired in autism, but only when ensembles contained four elements. Ensembles of eight or sixteen elements were averaged equally accurately across groups. The autistic group also showed a corresponding advantage in rejecting colors that were not originally seen in the ensemble. The results demonstrate the local processing bias in autism, but also suggest that the global perceptual averaging mechanism may be compromised under some conditions. The theoretical implications of the findings and future avenues for research on summary statistics in autism are discussed. Autism Res 2017, 10: 839-851. © 2016 International Society for Autism Research, Wiley Periodicals, Inc. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Ensemble perception of color in autistic adults
Stanworth, Kirstie; Pellicano, Elizabeth; Franklin, Anna
2016-01-01
Dominant accounts of visual processing in autism posit that autistic individuals have an enhanced access to details of scenes [e.g., weak central coherence] which is reflected in a general bias toward local processing. Furthermore, the attenuated priors account of autism predicts that the updating and use of summary representations is reduced in autism. Ensemble perception describes the extraction of global summary statistics of a visual feature from a heterogeneous set (e.g., of faces, sizes, colors), often in the absence of local item representation. The present study investigated ensemble perception in autistic adults using a rapidly presented (500 msec) ensemble of four, eight, or sixteen elements representing four different colors. We predicted that autistic individuals would be less accurate when averaging the ensembles, but more accurate in recognizing individual ensemble colors. The results were consistent with the predictions. Averaging was impaired in autism, but only when ensembles contained four elements. Ensembles of eight or sixteen elements were averaged equally accurately across groups. The autistic group also showed a corresponding advantage in rejecting colors that were not originally seen in the ensemble. The results demonstrate the local processing bias in autism, but also suggest that the global perceptual averaging mechanism may be compromised under some conditions. The theoretical implications of the findings and future avenues for research on summary statistics in autism are discussed. Autism Res 2017, 10: 839–851. © 2016 The Authors Autism Research published by Wiley Periodicals, Inc. on behalf of International Society for Autism Research PMID:27874263
Fluctuation instability of the Dirac Sea in quark models of strong interactions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zinovjev, G. M., E-mail: Gennady.Zinovjev@cern.ch; Molodtsov, S. V.
A number of exactly integrable (quark) models of quantum field theory that feature an infinite correlation length are considered. An instability of the standard vacuum quark ensemble, a Dirac sea (in spacetimes of dimension higher than three), is highlighted. It is due to a strong ground-state degeneracy, which, in turn, stems from a special character of the energy distribution. In the case where the momentumcutoff parameter tends to infinity, this distribution becomes infinitely narrow and leads to large (unlimited) fluctuations. A comparison of the results for various vacuum ensembles, including a Dirac sea, a neutral ensemble, a color superconductor, andmore » a Bardeen–Cooper–Schrieffer (BCS) state, was performed. In the presence of color quark interaction, a BCS state is unambiguously chosen as the ground state of the quark ensemble.« less
20180411 - Universal LD50 predictions using deep learning (ICCVAM)
NICEATM Predictive Models for Acute Oral Systemic Toxicity LD50 entry Risa R. Sayre (sayre.risa@epa.gov) & Christopher M. Grulke Our approach uses an ensemble of multilayer perceptron regressions to predict rat acute oral LD50 values from chemical features. Features were gene...
NASA Technical Reports Server (NTRS)
Bleck, Rainer; Bao, Jian-Wen; Benjamin, Stanley G.; Brown, John M.; Fiorino, Michael; Henderson, Thomas B.; Lee, Jin-Luen; MacDonald, Alexander E.; Madden, Paul; Middlecoff, Jacques;
2015-01-01
A hydrostatic global weather prediction model based on an icosahedral horizontal grid and a hybrid terrain following/ isentropic vertical coordinate is described. The model is an extension to three spatial dimensions of a previously developed, icosahedral, shallow-water model featuring user-selectable horizontal resolution and employing indirect addressing techniques. The vertical grid is adaptive to maximize the portion of the atmosphere mapped into the isentropic coordinate subdomain. The model, best described as a stacked shallow-water model, is being tested extensively on real-time medium-range forecasts to ready it for possible inclusion in operational multimodel ensembles for medium-range to seasonal prediction.
NASA Astrophysics Data System (ADS)
Belmonte, Juan Antonio; González-García, A. César
The Nabataeans built several monuments in Petra and elsewhere displaying decoration with a certain preference for astronomical motifs. A statistical analysis of the orientation of their sacred monuments demonstrates that astronomical orientations were often part of an elaborate plan and possibly reflect traces of the astral nature of Nabataean religion. Petra and other monuments in the ancient Nabataean kingdom demonstrate the interaction between landscape features and astronomical events. Among other things, the famous Ad Deir has revealed a fascinating ensemble of light and shadow effects, perhaps connected with the bulk of Nabataean mythology, while a series of suggestive solstitial and equinoctial alignments emanate from the impressive Urn Tomb, which might have helped bring about its selection as the cathedral of the city.
NASA Astrophysics Data System (ADS)
Li, Zhanjie; Yu, Jingshan; Xu, Xinyi; Sun, Wenchao; Pang, Bo; Yue, Jiajia
2018-06-01
Hydrological models are important and effective tools for detecting complex hydrological processes. Different models have different strengths when capturing the various aspects of hydrological processes. Relying on a single model usually leads to simulation uncertainties. Ensemble approaches, based on multi-model hydrological simulations, can improve application performance over single models. In this study, the upper Yalongjiang River Basin was selected for a case study. Three commonly used hydrological models (SWAT, VIC, and BTOPMC) were selected and used for independent simulations with the same input and initial values. Then, the BP neural network method was employed to combine the results from the three models. The results show that the accuracy of BP ensemble simulation is better than that of the single models.
Intentions and Perceptions: In Search of Alignment
ERIC Educational Resources Information Center
Sindberg, Laura K.
2009-01-01
Teachers plan for instruction in band, choir, and orchestra; this typically includes selecting repertoire and planning outcomes and strategies for achieving those goals with a vision toward excellent musical performance. Teachers in school music ensembles plan instruction that will lead to student learning. In the ensemble setting, this learning…
Evaluating lossy data compression on climate simulation data within a large ensemble
Baker, Allison H.; Hammerling, Dorit M.; Mickelson, Sheri A.; ...
2016-12-07
High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data,more » the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.« less
Evaluating lossy data compression on climate simulation data within a large ensemble
NASA Astrophysics Data System (ADS)
Baker, Allison H.; Hammerling, Dorit M.; Mickelson, Sheri A.; Xu, Haiying; Stolpe, Martin B.; Naveau, Phillipe; Sanderson, Ben; Ebert-Uphoff, Imme; Samarasinghe, Savini; De Simone, Francesco; Carbone, Francesco; Gencarelli, Christian N.; Dennis, John M.; Kay, Jennifer E.; Lindstrom, Peter
2016-12-01
High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.
Evaluating lossy data compression on climate simulation data within a large ensemble
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Allison H.; Hammerling, Dorit M.; Mickelson, Sheri A.
High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data,more » the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.« less
Computing a Comprehensible Model for Spam Filtering
NASA Astrophysics Data System (ADS)
Ruiz-Sepúlveda, Amparo; Triviño-Rodriguez, José L.; Morales-Bueno, Rafael
In this paper, we describe the application of the Desicion Tree Boosting (DTB) learning model to spam email filtering.This classification task implies the learning in a high dimensional feature space. So, it is an example of how the DTB algorithm performs in such feature space problems. In [1], it has been shown that hypotheses computed by the DTB model are more comprehensible that the ones computed by another ensemble methods. Hence, this paper tries to show that the DTB algorithm maintains the same comprehensibility of hypothesis in high dimensional feature space problems while achieving the performance of other ensemble methods. Four traditional evaluation measures (precision, recall, F1 and accuracy) have been considered for performance comparison between DTB and others models usually applied to spam email filtering. The size of the hypothesis computed by a DTB is smaller and more comprehensible than the hypothesis computed by Adaboost and Naïve Bayes.
Enzymatic Detoxication, Conformational Selection, and the Role of Molten Globule Active Sites*
Honaker, Matthew T.; Acchione, Mauro; Zhang, Wei; Mannervik, Bengt; Atkins, William M.
2013-01-01
The role of conformational ensembles in enzymatic reactions remains unclear. Discussion concerning “induced fit” versus “conformational selection” has, however, ignored detoxication enzymes, which exhibit catalytic promiscuity. These enzymes dominate drug metabolism and determine drug-drug interactions. The detoxication enzyme glutathione transferase A1–1 (GSTA1–1), exploits a molten globule-like active site to achieve remarkable catalytic promiscuity wherein the substrate-free conformational ensemble is broad with barrierless transitions between states. A quantitative index of catalytic promiscuity is used to compare engineered variants of GSTA1–1 and the catalytic promiscuity correlates strongly with characteristics of the thermodynamic partition function, for the substrate-free enzymes. Access to chemically disparate transition states is encoded by the substrate-free conformational ensemble. Pre-steady state catalytic data confirm an extension of the conformational selection model, wherein different substrates select different starting conformations. The kinetic liability of the conformational breadth is minimized by a smooth landscape. We propose that “local” molten globule behavior optimizes detoxication enzymes. PMID:23649628
Chang, Chi-Ying; Chang, Chia-Chi; Hsiao, Tzu-Chien
2013-01-01
Excitation-emission matrix (EEM) fluorescence spectroscopy is a noninvasive method for tissue diagnosis and has become important in clinical use. However, the intrinsic characterization of EEM fluorescence remains unclear. Photobleaching and the complexity of the chemical compounds make it difficult to distinguish individual compounds due to overlapping features. Conventional studies use principal component analysis (PCA) for EEM fluorescence analysis, and the relationship between the EEM features extracted by PCA and diseases has been examined. The spectral features of different tissue constituents are not fully separable or clearly defined. Recently, a non-stationary method called multi-dimensional ensemble empirical mode decomposition (MEEMD) was introduced; this method can extract the intrinsic oscillations on multiple spatial scales without loss of information. The aim of this study was to propose a fluorescence spectroscopy system for EEM measurements and to describe a method for extracting the intrinsic characteristics of EEM by MEEMD. The results indicate that, although PCA provides the principal factor for the spectral features associated with chemical compounds, MEEMD can provide additional intrinsic features with more reliable mapping of the chemical compounds. MEEMD has the potential to extract intrinsic fluorescence features and improve the detection of biochemical changes. PMID:24240806
Mass Conservation and Positivity Preservation with Ensemble-type Kalman Filter Algorithms
NASA Technical Reports Server (NTRS)
Janjic, Tijana; McLaughlin, Dennis B.; Cohn, Stephen E.; Verlaan, Martin
2013-01-01
Maintaining conservative physical laws numerically has long been recognized as being important in the development of numerical weather prediction (NWP) models. In the broader context of data assimilation, concerted efforts to maintain conservation laws numerically and to understand the significance of doing so have begun only recently. In order to enforce physically based conservation laws of total mass and positivity in the ensemble Kalman filter, we incorporate constraints to ensure that the filter ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. We show that the analysis steps of ensemble transform Kalman filter (ETKF) algorithm and ensemble Kalman filter algorithm (EnKF) can conserve the mass integral, but do not preserve positivity. Further, if localization is applied or if negative values are simply set to zero, then the total mass is not conserved either. In order to ensure mass conservation, a projection matrix that corrects for localization effects is constructed. In order to maintain both mass conservation and positivity preservation through the analysis step, we construct a data assimilation algorithms based on quadratic programming and ensemble Kalman filtering. Mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate constraints. Some simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. The results show clear improvements in both analyses and forecasts, particularly in the presence of localized features. Behavior of the algorithm is also tested in presence of model error.
New technologies for examining the role of neuronal ensembles in drug addiction and fear.
Cruz, Fabio C; Koya, Eisuke; Guez-Barber, Danielle H; Bossert, Jennifer M; Lupica, Carl R; Shaham, Yavin; Hope, Bruce T
2013-11-01
Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. In addition, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches--Daun02 inactivation, FACS sorting of activated neurons and Fos-GFP transgenic rats--that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools--Fos-tTA transgenic mice and inactivation of CREB-overexpressing neurons--that have been used to study the role of neuronal ensembles in conditioned fear.
Pairwise Classifier Ensemble with Adaptive Sub-Classifiers for fMRI Pattern Analysis.
Kim, Eunwoo; Park, HyunWook
2017-02-01
The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.
Double-well chimeras in 2D lattice of chaotic bistable elements
NASA Astrophysics Data System (ADS)
Shepelev, I. A.; Bukh, A. V.; Vadivasova, T. E.; Anishchenko, V. S.; Zakharova, A.
2018-01-01
We investigate spatio-temporal dynamics of a 2D ensemble of nonlocally coupled chaotic cubic maps in a bistability regime. In particular, we perform a detailed study on the transition ;coherence - incoherence; for varying coupling strength for a fixed interaction radius. For the 2D ensemble we show the appearance of amplitude and phase chimera states previously reported for 1D ensembles of nonlocally coupled chaotic systems. Moreover, we uncover a novel type of chimera state, double-well chimera, which occurs due to the interplay of the bistability of the local dynamics and the 2D ensemble structure. Additionally, we find double-well chimera behavior for steady states which we call double-well chimera death. A distinguishing feature of chimera patterns observed in the lattice is that they mainly combine clusters of different chimera types: phase, amplitude and double-well chimeras.
A Canonical Ensemble Correlation Prediction Model for Seasonal Precipitation Anomaly
NASA Technical Reports Server (NTRS)
Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Guilong
2001-01-01
This report describes an optimal ensemble forecasting model for seasonal precipitation and its error estimation. Each individual forecast is based on the canonical correlation analysis (CCA) in the spectral spaces whose bases are empirical orthogonal functions (EOF). The optimal weights in the ensemble forecasting crucially depend on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is made also using the spectral method. The error is decomposed onto EOFs of the predictand and decreases linearly according to the correlation between the predictor and predictand. This new CCA model includes the following features: (1) the use of area-factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to the seasonal forecasting of the United States precipitation field. The predictor is the sea surface temperature.
A Statistical Description of Neural Ensemble Dynamics
Long, John D.; Carmena, Jose M.
2011-01-01
The growing use of multi-channel neural recording techniques in behaving animals has produced rich datasets that hold immense potential for advancing our understanding of how the brain mediates behavior. One limitation of these techniques is they do not provide important information about the underlying anatomical connections among the recorded neurons within an ensemble. Inferring these connections is often intractable because the set of possible interactions grows exponentially with ensemble size. This is a fundamental challenge one confronts when interpreting these data. Unfortunately, the combination of expert knowledge and ensemble data is often insufficient for selecting a unique model of these interactions. Our approach shifts away from modeling the network diagram of the ensemble toward analyzing changes in the dynamics of the ensemble as they relate to behavior. Our contribution consists of adapting techniques from signal processing and Bayesian statistics to track the dynamics of ensemble data on time-scales comparable with behavior. We employ a Bayesian estimator to weigh prior information against the available ensemble data, and use an adaptive quantization technique to aggregate poorly estimated regions of the ensemble data space. Importantly, our method is capable of detecting changes in both the magnitude and structure of correlations among neurons missed by firing rate metrics. We show that this method is scalable across a wide range of time-scales and ensemble sizes. Lastly, the performance of this method on both simulated and real ensemble data is used to demonstrate its utility. PMID:22319486
Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection.
Luo, Ping; Lin, Liang; Liu, Xiaobai
2016-07-01
This paper presents a novel compositional contour-based shape model by incorporating multiple distance metrics to account for varying shape distortions or deformations. Our approach contains two key steps: 1) contour feature generation and 2) generative model pursuit. For each category, we first densely sample an ensemble of local prototype contour segments from a few positive shape examples and describe each segment using three different types of distance metrics. These metrics are diverse and complementary with each other to capture various shape deformations. We regard the parameterized contour segment plus an additive residual ϵ as a basic subspace, namely, ϵ -ball, in the sense that it represents local shape variance under the certain distance metric. Using these ϵ -balls as features, we then propose a generative learning algorithm to pursue the compositional shape model, which greedily selects the most representative features under the information projection principle. In experiments, we evaluate our model on several public challenging data sets, and demonstrate that the integration of multiple shape distance metrics is capable of dealing various shape deformations, articulations, and background clutter, hence boosting system performance.
NASA Astrophysics Data System (ADS)
Chen, Jie; Brissette, François P.; Lucas-Picher, Philippe
2016-11-01
Given the ever increasing number of climate change simulations being carried out, it has become impractical to use all of them to cover the uncertainty of climate change impacts. Various methods have been proposed to optimally select subsets of a large ensemble of climate simulations for impact studies. However, the behaviour of optimally-selected subsets of climate simulations for climate change impacts is unknown, since the transfer process from climate projections to the impact study world is usually highly non-linear. Consequently, this study investigates the transferability of optimally-selected subsets of climate simulations in the case of hydrological impacts. Two different methods were used for the optimal selection of subsets of climate scenarios, and both were found to be capable of adequately representing the spread of selected climate model variables contained in the original large ensemble. However, in both cases, the optimal subsets had limited transferability to hydrological impacts. To capture a similar variability in the impact model world, many more simulations have to be used than those that are needed to simply cover variability from the climate model variables' perspective. Overall, both optimal subset selection methods were better than random selection when small subsets were selected from a large ensemble for impact studies. However, as the number of selected simulations increased, random selection often performed better than the two optimal methods. To ensure adequate uncertainty coverage, the results of this study imply that selecting as many climate change simulations as possible is the best avenue. Where this was not possible, the two optimal methods were found to perform adequately.
Clustering Molecular Dynamics Trajectories for Optimizing Docking Experiments
De Paris, Renata; Quevedo, Christian V.; Ruiz, Duncan D.; Norberto de Souza, Osmar; Barros, Rodrigo C.
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threats the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. Particularly, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments on a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand. PMID:25873944
Ocean eddies and climate predictability
NASA Astrophysics Data System (ADS)
Kirtman, Ben P.; Perlin, Natalie; Siqueira, Leo
2017-12-01
A suite of coupled climate model simulations and experiments are used to examine how resolved mesoscale ocean features affect aspects of climate variability, air-sea interactions, and predictability. In combination with control simulations, experiments with the interactive ensemble coupling strategy are used to further amplify the role of the oceanic mesoscale field and the associated air-sea feedbacks and predictability. The basic intent of the interactive ensemble coupling strategy is to reduce the atmospheric noise at the air-sea interface, allowing an assessment of how noise affects the variability, and in this case, it is also used to diagnose predictability from the perspective of signal-to-noise ratios. The climate variability is assessed from the perspective of sea surface temperature (SST) variance ratios, and it is shown that, unsurprisingly, mesoscale variability significantly increases SST variance. Perhaps surprising is the fact that the presence of mesoscale ocean features even further enhances the SST variance in the interactive ensemble simulation beyond what would be expected from simple linear arguments. Changes in the air-sea coupling between simulations are assessed using pointwise convective rainfall-SST and convective rainfall-SST tendency correlations and again emphasize how the oceanic mesoscale alters the local association between convective rainfall and SST. Understanding the possible relationships between the SST-forced signal and the weather noise is critically important in climate predictability. We use the interactive ensemble simulations to diagnose this relationship, and we find that the presence of mesoscale ocean features significantly enhances this link particularly in ocean eddy rich regions. Finally, we use signal-to-noise ratios to show that the ocean mesoscale activity increases model estimated predictability in terms of convective precipitation and atmospheric upper tropospheric circulation.
Ocean eddies and climate predictability.
Kirtman, Ben P; Perlin, Natalie; Siqueira, Leo
2017-12-01
A suite of coupled climate model simulations and experiments are used to examine how resolved mesoscale ocean features affect aspects of climate variability, air-sea interactions, and predictability. In combination with control simulations, experiments with the interactive ensemble coupling strategy are used to further amplify the role of the oceanic mesoscale field and the associated air-sea feedbacks and predictability. The basic intent of the interactive ensemble coupling strategy is to reduce the atmospheric noise at the air-sea interface, allowing an assessment of how noise affects the variability, and in this case, it is also used to diagnose predictability from the perspective of signal-to-noise ratios. The climate variability is assessed from the perspective of sea surface temperature (SST) variance ratios, and it is shown that, unsurprisingly, mesoscale variability significantly increases SST variance. Perhaps surprising is the fact that the presence of mesoscale ocean features even further enhances the SST variance in the interactive ensemble simulation beyond what would be expected from simple linear arguments. Changes in the air-sea coupling between simulations are assessed using pointwise convective rainfall-SST and convective rainfall-SST tendency correlations and again emphasize how the oceanic mesoscale alters the local association between convective rainfall and SST. Understanding the possible relationships between the SST-forced signal and the weather noise is critically important in climate predictability. We use the interactive ensemble simulations to diagnose this relationship, and we find that the presence of mesoscale ocean features significantly enhances this link particularly in ocean eddy rich regions. Finally, we use signal-to-noise ratios to show that the ocean mesoscale activity increases model estimated predictability in terms of convective precipitation and atmospheric upper tropospheric circulation.
NASA Astrophysics Data System (ADS)
Vidal, Jean-Philippe; Hingray, Benoît
2014-05-01
In order to better understand the uncertainties in the climate of the next decades, an increasingly large number of increasingly diverse climate projections is being produced by the climate research community through coordinated initiatives (e.g., CMIP5, CORDEX), but also through more specific experiments at both the global scale (perturbed parameter ensembles) and the regional-to-local scale (empirical statistical downscaling ensembles). When significant efforts are put into making such projections available online, very few works focus on how to make such an enormous amount of information actually usable by the impact and adaptation community. Climate services should therefore include guidelines and recommendations for identifying subsets of climate projections that would have (1) a size manageable by downstream modelling approaches and (2) the relevant properties for informing adaptation strategies. This works proposes a generic framework for identifying tailored subsets of climate projections that would meet both the objectives and the constraints of a specific impact / adaptation study in a typical top-down approach. This decision framework builds on two main preliminary tasks that lead to critical choices in the selection strategy: (1) understanding the requirements of the specific impact / adaptation study, and (2) characterizing the (downscaled) climate projections dataset available. An impact / adaptation study has two types of requirements. First, the study may aim at various outcomes for a given climate-related feature: the best estimate of the future, the range of possible futures, a set of representative futures, or a statistically interpretable ensemble of futures. Second, impact models may come with specific constraints on climate input variables, like spatio-temporal and between-variables coherence. Additionally, when concurrent impact models are used, the most restrictive constraints have to be considered in order to be able to assess the uncertainty associated from this modelling step. Besides, the climate projection dataset available for a given study has several characteristics that will heavily condition the type of conclusions that can be reached. Indeed, the dataset at hand may or not sample different types of uncertainty (socio-economic, structural, parametric, along with internal variability). Moreover, these types are present at different steps in the well-known cascade of uncertainty, from the emission / concentration scenarios and the global climate to the regional-to-local climate. Critical choices for the selection are therefore conditioned on all features above. The type of selection (picking out, culling, or statistical sampling) is closely related to the study objectives and the uncertainty types present in the dataset. Moreover, grounds for picking out or culling projections may stem from global, regional or feature-specific present-day performance, representativeness, or covered range. An example use of this framework is a hierarchical selection for 3 classes of impact models among 3000 transient climate projections from different runs of 4 GCMs, statistically downscaled by 3 probabilistic methods, and made available for an integrated water resource adaptation study in the Durance catchment (southern French Alps). This work is part of the GICC R2D2-20501 project (Risk, water Resources and sustainable Development of the Durance catchment in 2050) and the EU FP7 COMPLEX2 project (Knowledge Based Climate Mitigation Systems for a Low Carbon Economy).
Comparing Pattern Recognition Feature Sets for Sorting Triples in the FIRST Database
NASA Astrophysics Data System (ADS)
Proctor, D. D.
2006-07-01
Pattern recognition techniques have been used with increasing success for coping with the tremendous amounts of data being generated by automated surveys. Usually this process involves construction of training sets, the typical examples of data with known classifications. Given a feature set, along with the training set, statistical methods can be employed to generate a classifier. The classifier is then applied to process the remaining data. Feature set selection, however, is still an issue. This paper presents techniques developed for accommodating data for which a substantive portion of the training set cannot be classified unambiguously, a typical case for low-resolution data. Significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees is introduced as a method of evaluating relative quality of feature sets. The technique is applied to comparing feature sets for sorting a particular radio galaxy morphology, bent-doubles, from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) database. Also examined are alternative functional forms for feature sets. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The technique also may be applied to situations for which, although accurate classifications are available, the feature set is clearly inadequate, but is desired nonetheless to make the best of available information.
Xu, Jing; Wang, Zhongbin; Tan, Chao; Si, Lei; Liu, Xinhua
2015-01-01
In order to guarantee the stable operation of shearers and promote construction of an automatic coal mining working face, an online cutting pattern recognition method with high accuracy and speed based on Improved Ensemble Empirical Mode Decomposition (IEEMD) and Probabilistic Neural Network (PNN) is proposed. An industrial microphone is installed on the shearer and the cutting sound is collected as the recognition criterion to overcome the disadvantages of giant size, contact measurement and low identification rate of traditional detectors. To avoid end-point effects and get rid of undesirable intrinsic mode function (IMF) components in the initial signal, IEEMD is conducted on the sound. The end-point continuation based on the practical storage data is performed first to overcome the end-point effect. Next the average correlation coefficient, which is calculated by the correlation of the first IMF with others, is introduced to select essential IMFs. Then the energy and standard deviation of the reminder IMFs are extracted as features and PNN is applied to classify the cutting patterns. Finally, a simulation example, with an accuracy of 92.67%, and an industrial application prove the efficiency and correctness of the proposed method. PMID:26528985
Wang, Huaqing; Li, Ruitong; Tang, Gang; Yuan, Hongfang; Zhao, Qingliang; Cao, Xi
2014-01-01
A Compound fault signal usually contains multiple characteristic signals and strong confusion noise, which makes it difficult to separate week fault signals from them through conventional ways, such as FFT-based envelope detection, wavelet transform or empirical mode decomposition individually. In order to improve the compound faults diagnose of rolling bearings via signals’ separation, the present paper proposes a new method to identify compound faults from measured mixed-signals, which is based on ensemble empirical mode decomposition (EEMD) method and independent component analysis (ICA) technique. With the approach, a vibration signal is firstly decomposed into intrinsic mode functions (IMF) by EEMD method to obtain multichannel signals. Then, according to a cross correlation criterion, the corresponding IMF is selected as the input matrix of ICA. Finally, the compound faults can be separated effectively by executing ICA method, which makes the fault features more easily extracted and more clearly identified. Experimental results validate the effectiveness of the proposed method in compound fault separating, which works not only for the outer race defect, but also for the rollers defect and the unbalance fault of the experimental system. PMID:25289644
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. ?? 2010 Society for Risk Analysis.
ERIC Educational Resources Information Center
Dunst, C. J.; And Others
1981-01-01
The structural features of sensorimotor intelligence were assessed among three groups of retarded infants and toddlers. Hierarchical cluster analysis (HCA) was performed on two measures of relationship (stage congruence and intercorrelations). The potential utility of HCA for studying Piaget's "structure d'ensemble" stage criteria is…
Numerical approach on dynamic self-assembly of colloidal particles
NASA Astrophysics Data System (ADS)
Ibrahimi, Muhamet; Ilday, Serim; Makey, Ghaith; Pavlov, Ihor; Yavuz, Özgàn; Gulseren, Oguz; Ilday, Fatih Omer
Far from equilibrium systems of artificial ensembles are crucial for understanding many intelligent features in self-organized natural systems. However, the lack of established theory underlies a need for numerical implementations. Inspired by a novel work, we simulate a solution-suspended colloidal system that dynamically self assembles due to convective forces generated in the solvent when heated by a laser. In order to incorporate with random fluctuations of particles and continuously changing flow, we exploit a random-walk based Brownian motion model and a fluid dynamics solver prepared for games, respectively. Simulation results manage to fit to experiments and show many quantitative features of a non equilibrium dynamic self assembly, including phase space compression and an ensemble-energy input feedback loop.
Ensemble of One-Class Classifiers for Personal Risk Detection Based on Wearable Sensor Data.
Rodríguez, Jorge; Barrera-Animas, Ari Y; Trejo, Luis A; Medina-Pérez, Miguel Angel; Monroy, Raúl
2016-09-29
This study introduces the One-Class K-means with Randomly-projected features Algorithm (OCKRA). OCKRA is an ensemble of one-class classifiers built over multiple projections of a dataset according to random feature subsets. Algorithms found in the literature spread over a wide range of applications where ensembles of one-class classifiers have been satisfactorily applied; however, none is oriented to the area under our study: personal risk detection. OCKRA has been designed with the aim of improving the detection performance in the problem posed by the Personal RIsk DEtection(PRIDE) dataset. PRIDE was built based on 23 test subjects, where the data for each user were captured using a set of sensors embedded in a wearable band. The performance of OCKRA was compared against support vector machine and three versions of the Parzen window classifier. On average, experimental results show that OCKRA outperformed the other classifiers for at least 0.53% of the area under the curve (AUC). In addition, OCKRA achieved an AUC above 90% for more than 57% of the users.
Ensemble of One-Class Classifiers for Personal Risk Detection Based on Wearable Sensor Data
Rodríguez, Jorge; Barrera-Animas, Ari Y.; Trejo, Luis A.; Medina-Pérez, Miguel Angel; Monroy, Raúl
2016-01-01
This study introduces the One-Class K-means with Randomly-projected features Algorithm (OCKRA). OCKRA is an ensemble of one-class classifiers built over multiple projections of a dataset according to random feature subsets. Algorithms found in the literature spread over a wide range of applications where ensembles of one-class classifiers have been satisfactorily applied; however, none is oriented to the area under our study: personal risk detection. OCKRA has been designed with the aim of improving the detection performance in the problem posed by the Personal RIsk DEtection(PRIDE) dataset. PRIDE was built based on 23 test subjects, where the data for each user were captured using a set of sensors embedded in a wearable band. The performance of OCKRA was compared against support vector machine and three versions of the Parzen window classifier. On average, experimental results show that OCKRA outperformed the other classifiers for at least 0.53% of the area under the curve (AUC). In addition, OCKRA achieved an AUC above 90% for more than 57% of the users. PMID:27690054
Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M
2017-01-01
Machine learning methods may complement traditional analytic methods for medical device surveillance. Using data from the National Cardiovascular Data Registry for implantable cardioverter-defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs that used two propensity score (PS) models: one specified by subject-matter experts (PS-SME), and the other one by machine learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%-20.9%; nonfatal ICD-related adverse events, 19.3%-26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%-37.6%. Agreement among safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, k =0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, k =-0.028), and between the DELTA and embedded feature selection approaches was 88.1% (349 of 396, k =-0.042). Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance.
ERIC Educational Resources Information Center
Randles, Clint
2013-01-01
This column offers the personal reflections of the author on being a member of the band Touch, an iPad performing ensemble composed of music education faculty members and doctoral students at the University of South Florida. The ensemble primarily performed its own arrangements of popular music selections from a number of genres including, but not…
Evaluative and Behavioral Correlates to Intrarehearsal Achievement in High School Bands
ERIC Educational Resources Information Center
Montemayor, Mark
2014-01-01
The purpose of this study was to investigate relationships of teaching effectiveness, ensemble performance quality, and selected rehearsal procedures to various measures of intrarehearsal achievement (i.e., musical improvement exhibited by an ensemble during the course of a single rehearsal). Twenty-nine high school bands were observed in two…
NASA Astrophysics Data System (ADS)
Chan, Yi-Tung; Wang, Shuenn-Jyi; Tsai, Chung-Hsien
2017-09-01
Public safety is a matter of national security and people's livelihoods. In recent years, intelligent video-surveillance systems have become important active-protection systems. A surveillance system that provides early detection and threat assessment could protect people from crowd-related disasters and ensure public safety. Image processing is commonly used to extract features, e.g., people, from a surveillance video. However, little research has been conducted on the relationship between foreground detection and feature extraction. Most current video-surveillance research has been developed for restricted environments, in which the extracted features are limited by having information from a single foreground; they do not effectively represent the diversity of crowd behavior. This paper presents a general framework based on extracting ensemble features from the foreground of a surveillance video to analyze a crowd. The proposed method can flexibly integrate different foreground-detection technologies to adapt to various monitored environments. Furthermore, the extractable representative features depend on the heterogeneous foreground data. Finally, a classification algorithm is applied to these features to automatically model crowd behavior and distinguish an abnormal event from normal patterns. The experimental results demonstrate that the proposed method's performance is both comparable to that of state-of-the-art methods and satisfies the requirements of real-time applications.
Quantifying rapid changes in cardiovascular state with a moving ensemble average.
Cieslak, Matthew; Ryan, William S; Babenko, Viktoriya; Erro, Hannah; Rathbun, Zoe M; Meiring, Wendy; Kelsey, Robert M; Blascovich, Jim; Grafton, Scott T
2018-04-01
MEAP, the moving ensemble analysis pipeline, is a new open-source tool designed to perform multisubject preprocessing and analysis of cardiovascular data, including electrocardiogram (ECG), impedance cardiogram (ICG), and continuous blood pressure (BP). In addition to traditional ensemble averaging, MEAP implements a moving ensemble averaging method that allows for the continuous estimation of indices related to cardiovascular state, including cardiac output, preejection period, heart rate variability, and total peripheral resistance, among others. Here, we define the moving ensemble technique mathematically, highlighting its differences from fixed-window ensemble averaging. We describe MEAP's interface and features for signal processing, artifact correction, and cardiovascular-based fMRI analysis. We demonstrate the accuracy of MEAP's novel B point detection algorithm on a large collection of hand-labeled ICG waveforms. As a proof of concept, two subjects completed a series of four physical and cognitive tasks (cold pressor, Valsalva maneuver, video game, random dot kinetogram) on 3 separate days while ECG, ICG, and BP were recorded. Critically, the moving ensemble method reliably captures the rapid cyclical cardiovascular changes related to the baroreflex during the Valsalva maneuver and the classic cold pressor response. Cardiovascular measures were seen to vary considerably within repetitions of the same cognitive task for each individual, suggesting that a carefully designed paradigm could be used to capture fast-acting event-related changes in cardiovascular state. © 2017 Society for Psychophysiological Research.
Impact of ensemble learning in the assessment of skeletal maturity.
Cunha, Pedro; Moura, Daniel C; Guevara López, Miguel Angel; Guerra, Conceição; Pinto, Daniela; Ramos, Isabel
2014-09-01
The assessment of the bone age, or skeletal maturity, is an important task in pediatrics that measures the degree of maturation of children's bones. Nowadays, there is no standard clinical procedure for assessing bone age and the most widely used approaches are the Greulich and Pyle and the Tanner and Whitehouse methods. Computer methods have been proposed to automatize the process; however, there is a lack of exploration about how to combine the features of the different parts of the hand, and how to take advantage of ensemble techniques for this purpose. This paper presents a study where the use of ensemble techniques for improving bone age assessment is evaluated. A new computer method was developed that extracts descriptors for each joint of each finger, which are then combined using different ensemble schemes for obtaining a final bone age value. Three popular ensemble schemes are explored in this study: bagging, stacking and voting. Best results were achieved by bagging with a rule-based regression (M5P), scoring a mean absolute error of 10.16 months. Results show that ensemble techniques improve the prediction performance of most of the evaluated regression algorithms, always achieving best or comparable to best results. Therefore, the success of the ensemble methods allow us to conclude that their use may improve computer-based bone age assessment, offering a scalable option for utilizing multiple regions of interest and combining their output.
Verifying and Postprocesing the Ensemble Spread-Error Relationship
NASA Astrophysics Data System (ADS)
Hopson, Tom; Knievel, Jason; Liu, Yubao; Roux, Gregory; Wu, Wanli
2013-04-01
With the increased utilization of ensemble forecasts in weather and hydrologic applications, there is a need to verify their benefit over less expensive deterministic forecasts. One such potential benefit of ensemble systems is their capacity to forecast their own forecast error through the ensemble spread-error relationship. The paper begins by revisiting the limitations of the Pearson correlation alone in assessing this relationship. Next, we introduce two new metrics to consider in assessing the utility an ensemble's varying dispersion. We argue there are two aspects of an ensemble's dispersion that should be assessed. First, and perhaps more fundamentally: is there enough variability in the ensembles dispersion to justify the maintenance of an expensive ensemble prediction system (EPS), irrespective of whether the EPS is well-calibrated or not? To diagnose this, the factor that controls the theoretical upper limit of the spread-error correlation can be useful. Secondly, does the variable dispersion of an ensemble relate to variable expectation of forecast error? Representing the spread-error correlation in relation to its theoretical limit can provide a simple diagnostic of this attribute. A context for these concepts is provided by assessing two operational ensembles: 30-member Western US temperature forecasts for the U.S. Army Test and Evaluation Command and 51-member Brahmaputra River flow forecasts of the Climate Forecast and Applications Project for Bangladesh. Both of these systems utilize a postprocessing technique based on quantile regression (QR) under a step-wise forward selection framework leading to ensemble forecasts with both good reliability and sharpness. In addition, the methodology utilizes the ensemble's ability to self-diagnose forecast instability to produce calibrated forecasts with informative skill-spread relationships. We will describe both ensemble systems briefly, review the steps used to calibrate the ensemble forecast, and present verification statistics using error-spread metrics, along with figures from operational ensemble forecasts before and after calibration.
Multiple-instance ensemble learning for hyperspectral images
NASA Astrophysics Data System (ADS)
Ergul, Ugur; Bilgin, Gokhan
2017-10-01
An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use in hyperspectral images (HSIs) by inspiring the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed by a small percentage of training samples, and MI bags are formed by a local windowing process with variable window sizes on selected instances. In addition to bootstrap aggregation, random subspace is another method used to diversify base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed over single-test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are demonstrated with state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equipollent non-MIL algorithms.
New technologies for examining neuronal ensembles in drug addiction and fear
Cruz, Fabio C.; Koya, Eisuke; Guez-Barber, Danielle H.; Bossert, Jennifer M.; Lupica, Carl R.; Shaham, Yavin; Hope, Bruce T.
2015-01-01
Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. Additionally, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches—Daun02 inactivation, FACS sorting of activated neurons and c-fos-GFP transgenic rats — that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools — c-fos-tTA mice and inactivation of CREB-overexpressing neurons — that have been used to study the role of neuronal ensembles in conditioned fear. PMID:24088811
NASA Astrophysics Data System (ADS)
Fernández, J.; Frías, M. D.; Cabos, W. D.; Cofiño, A. S.; Domínguez, M.; Fita, L.; Gaertner, M. A.; García-Díez, M.; Gutiérrez, J. M.; Jiménez-Guerrero, P.; Liguori, G.; Montávez, J. P.; Romera, R.; Sánchez, E.
2018-03-01
We present an unprecedented ensemble of 196 future climate projections arising from different global and regional model intercomparison projects (MIPs): CMIP3, CMIP5, ENSEMBLES, ESCENA, EURO- and Med-CORDEX. This multi-MIP ensemble includes all regional climate model (RCM) projections publicly available to date, along with their driving global climate models (GCMs). We illustrate consistent and conflicting messages using continental Spain and the Balearic Islands as target region. The study considers near future (2021-2050) changes and their dependence on several uncertainty sources sampled in the multi-MIP ensemble: GCM, future scenario, internal variability, RCM, and spatial resolution. This initial work focuses on mean seasonal precipitation and temperature changes. The results show that the potential GCM-RCM combinations have been explored very unevenly, with favoured GCMs and large ensembles of a few RCMs that do not respond to any ensemble design. Therefore, the grand-ensemble is weighted towards a few models. The selection of a balanced, credible sub-ensemble is challenged in this study by illustrating several conflicting responses between the RCM and its driving GCM and among different RCMs. Sub-ensembles from different initiatives are dominated by different uncertainty sources, being the driving GCM the main contributor to uncertainty in the grand-ensemble. For this analysis of the near future changes, the emission scenario does not lead to a strong uncertainty. Despite the extra computational effort, for mean seasonal changes, the increase in resolution does not lead to important changes.
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the Area Under the Receiver Operating Characteristic (AUROC), which showed the highest value, belonged to boosted regression trees (0.975) and the lowest value was recorded for generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, nevertheless, variability among the prediction of individual models was considerable. Therefore, to reduce uncertainty, creating more generalizable, more stable, and less sensitive models, ensemble forecasting approaches and in particular the EMmedian is recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
Mixture EMOS model for calibrating ensemble forecasts of wind speed.
Baran, S; Lerch, S
2016-03-01
Ensemble model output statistics (EMOS) is a statistical tool for post-processing forecast ensembles of weather variables obtained from multiple runs of numerical weather prediction models in order to produce calibrated predictive probability density functions. The EMOS predictive probability density function is given by a parametric distribution with parameters depending on the ensemble forecasts. We propose an EMOS model for calibrating wind speed forecasts based on weighted mixtures of truncated normal (TN) and log-normal (LN) distributions where model parameters and component weights are estimated by optimizing the values of proper scoring rules over a rolling training period. The new model is tested on wind speed forecasts of the 50 member European Centre for Medium-range Weather Forecasts ensemble, the 11 member Aire Limitée Adaptation dynamique Développement International-Hungary Ensemble Prediction System ensemble of the Hungarian Meteorological Service, and the eight-member University of Washington mesoscale ensemble, and its predictive performance is compared with that of various benchmark EMOS models based on single parametric families and combinations thereof. The results indicate improved calibration of probabilistic and accuracy of point forecasts in comparison with the raw ensemble and climatological forecasts. The mixture EMOS model significantly outperforms the TN and LN EMOS methods; moreover, it provides better calibrated forecasts than the TN-LN combination model and offers an increased flexibility while avoiding covariate selection problems. © 2016 The Authors Environmetrics Published by JohnWiley & Sons Ltd.
Tanner, Evan P; Papeş, Monica; Elmore, R Dwayne; Fuhlendorf, Samuel D; Davis, Craig A
2017-01-01
Ecological niche models (ENMs) have increasingly been used to estimate the potential effects of climate change on species' distributions worldwide. Recently, predictions of species abundance have also been obtained with such models, though knowledge about the climatic variables affecting species abundance is often lacking. To address this, we used a well-studied guild (temperate North American quail) and the Maxent modeling algorithm to compare model performance of three variable selection approaches: correlation/variable contribution (CVC), biological (i.e., variables known to affect species abundance), and random. We then applied the best approach to forecast potential distributions, under future climatic conditions, and analyze future potential distributions in light of available abundance data and presence-only occurrence data. To estimate species' distributional shifts we generated ensemble forecasts using four global circulation models, four representative concentration pathways, and two time periods (2050 and 2070). Furthermore, we present distributional shifts where 75%, 90%, and 100% of our ensemble models agreed. The CVC variable selection approach outperformed our biological approach for four of the six species. Model projections indicated species-specific effects of climate change on future distributions of temperate North American quail. The Gambel's quail (Callipepla gambelii) was the only species predicted to gain area in climatic suitability across all three scenarios of ensemble model agreement. Conversely, the scaled quail (Callipepla squamata) was the only species predicted to lose area in climatic suitability across all three scenarios of ensemble model agreement. Our models projected future loss of areas for the northern bobwhite (Colinus virginianus) and scaled quail in portions of their distributions which are currently areas of high abundance. Climatic variables that influence local abundance may not always scale up to influence species' distributions. Special attention should be given to selecting variables for ENMs, and tests of model performance should be used to validate the choice of variables.
Spatio-temporal behaviour of medium-range ensemble forecasts
NASA Astrophysics Data System (ADS)
Kipling, Zak; Primo, Cristina; Charlton-Perez, Andrew
2010-05-01
Using the recently-developed mean-variance of logarithms (MVL) diagram, together with the TIGGE archive of medium-range ensemble forecasts from nine different centres, we present an analysis of the spatio-temporal dynamics of their perturbations, and show how the differences between models and perturbation techniques can explain the shape of their characteristic MVL curves. We also consider the use of the MVL diagram to compare the growth of perturbations within the ensemble with the growth of the forecast error, showing that there is a much closer correspondence for some models than others. We conclude by looking at how the MVL technique might assist in selecting models for inclusion in a multi-model ensemble, and suggest an experiment to test its potential in this context.
Ruiz, Duncan D. A.; Norberto de Souza, Osmar
2015-01-01
Protein receptor conformations, obtained from molecular dynamics (MD) simulations, have become a promising treatment of its explicit flexibility in molecular docking experiments applied to drug discovery and development. However, incorporating the entire ensemble of MD conformations in docking experiments to screen large candidate compound libraries is currently an unfeasible task. Clustering algorithms have been widely used as a means to reduce such ensembles to a manageable size. Most studies investigate different algorithms using pairwise Root-Mean Square Deviation (RMSD) values for all, or part of the MD conformations. Nevertheless, the RMSD only may not be the most appropriate gauge to cluster conformations when the target receptor has a plastic active site, since they are influenced by changes that occur on other parts of the structure. Hence, we have applied two partitioning methods (k-means and k-medoids) and four agglomerative hierarchical methods (Complete linkage, Ward’s, Unweighted Pair Group Method and Weighted Pair Group Method) to analyze and compare the quality of partitions between a data set composed of properties from an enzyme receptor substrate-binding cavity and two data sets created using different RMSD approaches. Ensembles of representative MD conformations were generated by selecting a medoid of each group from all partitions analyzed. We investigated the performance of our new method for evaluating binding conformation of drug candidates to the InhA enzyme, which were performed by cross-docking experiments between a 20 ns MD trajectory and 20 different ligands. Statistical analyses showed that the novel ensemble, which is represented by only 0.48% of the MD conformations, was able to reproduce 75% of all dynamic behaviors within the binding cavity for the docking experiments performed. Moreover, this new approach not only outperforms the other two RMSD-clustering solutions, but it also shows to be a promising strategy to distill biologically relevant information from MD trajectories, especially for docking purposes. PMID:26218832
De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D A; Norberto de Souza, Osmar
2015-01-01
Protein receptor conformations, obtained from molecular dynamics (MD) simulations, have become a promising treatment of its explicit flexibility in molecular docking experiments applied to drug discovery and development. However, incorporating the entire ensemble of MD conformations in docking experiments to screen large candidate compound libraries is currently an unfeasible task. Clustering algorithms have been widely used as a means to reduce such ensembles to a manageable size. Most studies investigate different algorithms using pairwise Root-Mean Square Deviation (RMSD) values for all, or part of the MD conformations. Nevertheless, the RMSD only may not be the most appropriate gauge to cluster conformations when the target receptor has a plastic active site, since they are influenced by changes that occur on other parts of the structure. Hence, we have applied two partitioning methods (k-means and k-medoids) and four agglomerative hierarchical methods (Complete linkage, Ward's, Unweighted Pair Group Method and Weighted Pair Group Method) to analyze and compare the quality of partitions between a data set composed of properties from an enzyme receptor substrate-binding cavity and two data sets created using different RMSD approaches. Ensembles of representative MD conformations were generated by selecting a medoid of each group from all partitions analyzed. We investigated the performance of our new method for evaluating binding conformation of drug candidates to the InhA enzyme, which were performed by cross-docking experiments between a 20 ns MD trajectory and 20 different ligands. Statistical analyses showed that the novel ensemble, which is represented by only 0.48% of the MD conformations, was able to reproduce 75% of all dynamic behaviors within the binding cavity for the docking experiments performed. Moreover, this new approach not only outperforms the other two RMSD-clustering solutions, but it also shows to be a promising strategy to distill biologically relevant information from MD trajectories, especially for docking purposes.
Synaptic Ensemble Underlying the Selection and Consolidation of Neuronal Circuits during Learning.
Hoshiba, Yoshio; Wada, Takeyoshi; Hayashi-Takagi, Akiko
2017-01-01
Memories are crucial to the cognitive essence of who we are as human beings. Accumulating evidence has suggested that memories are stored as a subset of neurons that probably fire together in the same ensemble. Such formation of cell ensembles must meet contradictory requirements of being plastic and responsive during learning, but also stable in order to maintain the memory. Although synaptic potentiation is presumed to be the cellular substrate for this process, the link between the two remains correlational. With the application of the latest optogenetic tools, it has been possible to collect direct evidence of the contributions of synaptic potentiation in the formation and consolidation of cell ensemble in a learning task specific manner. In this review, we summarize the current view of the causative role of synaptic plasticity as the cellular mechanism underlying the encoding of memory and recalling of learned memories. In particular, we will be focusing on the latest optoprobe developed for the visualization of such "synaptic ensembles." We further discuss how a new synaptic ensemble could contribute to the formation of cell ensembles during learning and memory. With the development and application of novel research tools in the future, studies on synaptic ensembles will pioneer new discoveries, eventually leading to a comprehensive understanding of how the brain works.
The Fukushima-137Cs deposition case study: properties of the multi-model ensemble.
Solazzo, E; Galmarini, S
2015-01-01
In this paper we analyse the properties of an eighteen-member ensemble generated by the combination of five atmospheric dispersion modelling systems and six meteorological data sets. The models have been applied to the total deposition of (137)Cs, following the nuclear accident of the Fukushima power plant in March 2011. Analysis is carried out with the scope of determining whether the ensemble is reliable, sufficiently diverse and if its accuracy and precision can be improved. Although ensemble practice is becoming more and more popular in many geophysical applications, good practice guidelines are missing as to how models should be combined for the ensembles to offer an improvement over single model realisations. We show that the ensemble of models share large portions of bias and variance and make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble mean with the advantage of being poorly correlated, allowing to save computational resources and reduce noise (and thus improving accuracy). We further propose and discuss two methods for selecting subsets of skilful and diverse members, and prove that, in the contingency of the present analysis, their mean outscores the full ensemble mean in terms of both accuracy (error) and precision (variance). Copyright © 2014. Published by Elsevier Ltd.
Qian, Yun; Yan, Huiping; Hou, Zhangshuan; ...
2015-04-10
We investigate the sensitivity of precipitation characteristics (mean, extreme and diurnal cycle) to a set of uncertain parameters that influence the qualitative and quantitative behavior of the cloud and aerosol processes in the Community Atmosphere Model (CAM5). We adopt both the Latin hypercube and quasi-Monte Carlo sampling approaches to effectively explore the high-dimensional parameter space and then conduct two large sets of simulations. One set consists of 1100 simulations (cloud ensemble) perturbing 22 parameters related to cloud physics and convection, and the other set consists of 256 simulations (aerosol ensemble) focusing on 16 parameters related to aerosols and cloud microphysics.more » Results show that for the 22 parameters perturbed in the cloud ensemble, the six having the greatest influences on the global mean precipitation are identified, three of which (related to the deep convection scheme) are the primary contributors to the total variance of the phase and amplitude of the precipitation diurnal cycle over land. The extreme precipitation characteristics are sensitive to a fewer number of parameters. The precipitation does not always respond monotonically to parameter change. The influence of individual parameters does not depend on the sampling approaches or concomitant parameters selected. Generally the GLM is able to explain more of the parametric sensitivity of global precipitation than local or regional features. The total explained variance for precipitation is primarily due to contributions from the individual parameters (75-90% in total). The total variance shows a significant seasonal variability in the mid-latitude continental regions, but very small in tropical continental regions.« less
Bhargav, K K; Ram, S; Majumder, S B
2012-04-01
Nanocrystallites La0.8Pb0.2(Fe0.8Co0.2)O3 (LPFC) when bonded through a surface layer (carbon) in small ensembles display surface sensitive magnetism useful for biological probes, electrodes, and toxic gas sensors. A simple dispersion and hydrolysis of the salts in ethylene glycol (EG) in water is explored to form ensembles of the nanocrystallites (NCs) by combustion of a liquid precursor gel slowly in microwave at 70-80 dgrees C (apparent) in a closed container in air. In a dilute sample, the EG molecules mediate hydrolyzed species to configure in small groups in process to form a gel. Proposed models describe how a residual carbon bridges a stable bonded layer of a graphene-oxide-like hybrid structure on the LPFC-NCs in attenuating the magnetic structure. SEM images, measured from a pelletized sample which was used to study the gas sensing features in terms of the electrical resistance, describe plate shaped NCs, typically 30-60 nm widths, 60-180 nm lengths and -50 m2/g surface area (after heating at -750 degrees C). These NCs are arranged in ensembles (200-900 nm size). As per the X-ray diffraction, the plates (a Pnma orthorhombic structure) bear only small strain -0.0023 N/m2 and oxygen vacancies. The phonon and electronic bands from a bonded surface layer disappear when it is etched out slowly by heating above 550 degrees C in air. The surface layer actively promotes selective H2 gas sensor properties.
A Diagnostics Tool to detect ensemble forecast system anomaly and guide operational decisions
NASA Astrophysics Data System (ADS)
Park, G. H.; Srivastava, A.; Shrestha, E.; Thiemann, M.; Day, G. N.; Draijer, S.
2017-12-01
The hydrologic community is moving toward using ensemble forecasts to take uncertainty into account during the decision-making process. The New York City Department of Environmental Protection (DEP) implements several types of ensemble forecasts in their decision-making process: ensemble products for a statistical model (Hirsch and enhanced Hirsch); the National Weather Service (NWS) Advanced Hydrologic Prediction Service (AHPS) forecasts based on the classical Ensemble Streamflow Prediction (ESP) technique; and the new NWS Hydrologic Ensemble Forecasting Service (HEFS) forecasts. To remove structural error and apply the forecasts to additional forecast points, the DEP post processes both the AHPS and the HEFS forecasts. These ensemble forecasts provide mass quantities of complex data, and drawing conclusions from these forecasts is time-consuming and difficult. The complexity of these forecasts also makes it difficult to identify system failures resulting from poor data, missing forecasts, and server breakdowns. To address these issues, we developed a diagnostic tool that summarizes ensemble forecasts and provides additional information such as historical forecast statistics, forecast skill, and model forcing statistics. This additional information highlights the key information that enables operators to evaluate the forecast in real-time, dynamically interact with the data, and review additional statistics, if needed, to make better decisions. We used Bokeh, a Python interactive visualization library, and a multi-database management system to create this interactive tool. This tool compiles and stores data into HTML pages that allows operators to readily analyze the data with built-in user interaction features. This paper will present a brief description of the ensemble forecasts, forecast verification results, and the intended applications for the diagnostic tool.
Human resource recommendation algorithm based on ensemble learning and Spark
NASA Astrophysics Data System (ADS)
Cong, Zihan; Zhang, Xingming; Wang, Haoxiang; Xu, Hongjie
2017-08-01
Aiming at the problem of “information overload” in the human resources industry, this paper proposes a human resource recommendation algorithm based on Ensemble Learning. The algorithm considers the characteristics and behaviours of both job seeker and job features in the real business circumstance. Firstly, the algorithm uses two ensemble learning methods-Bagging and Boosting. The outputs from both learning methods are then merged to form user interest model. Based on user interest model, job recommendation can be extracted for users. The algorithm is implemented as a parallelized recommendation system on Spark. A set of experiments have been done and analysed. The proposed algorithm achieves significant improvement in accuracy, recall rate and coverage, compared with recommendation algorithms such as UserCF and ItemCF.
Selected Influences on Solo and Small-Ensemble Festival Ratings: Replication and Extension
ERIC Educational Resources Information Center
Bergee, Martin J.; McWhirter, Jamila L.
2005-01-01
Festival performance is no trivial endeavor. At one midwestern state festival alone, 10,938 events received a rating over a 3-year period (2001-2003). Such an extensive level of participation justifies sustained study. To learn more about variables that may underlie success at solo and small ensemble evaluative festivals, Bergee and Platt (2003)…
ERIC Educational Resources Information Center
Kinney, Daryl W.
2004-01-01
This study compared collegiate subjects who had participated in high school performing ensembles (participants) with subjects who had not (non-participants) on their ability to perform expressively and to perceive expression in music. In Phase I, subjects (N = 56) were asked to perform three song selections, expressively and unexpressively, using…
EMPIRE and pyenda: Two ensemble-based data assimilation systems written in Fortran and Python
NASA Astrophysics Data System (ADS)
Geppert, Gernot; Browne, Phil; van Leeuwen, Peter Jan; Merker, Claire
2017-04-01
We present and compare the features of two ensemble-based data assimilation frameworks, EMPIRE and pyenda. Both frameworks allow to couple models to the assimilation codes using the Message Passing Interface (MPI), leading to extremely efficient and fast coupling between models and the data-assimilation codes. The Fortran-based system EMPIRE (Employing Message Passing Interface for Researching Ensembles) is optimized for parallel, high-performance computing. It currently includes a suite of data assimilation algorithms including variants of the ensemble Kalman and several the particle filters. EMPIRE is targeted at models of all kinds of complexity and has been coupled to several geoscience models, eg. the Lorenz-63 model, a barotropic vorticity model, the general circulation model HadCM3, the ocean model NEMO, and the land-surface model JULES. The Python-based system pyenda (Python Ensemble Data Assimilation) allows Fortran- and Python-based models to be used for data assimilation. Models can be coupled either using MPI or by using a Python interface. Using Python allows quick prototyping and pyenda is aimed at small to medium scale models. pyenda currently includes variants of the ensemble Kalman filter and has been coupled to the Lorenz-63 model, an advection-based precipitation nowcasting scheme, and the dynamic global vegetation model JSBACH.
Multiscale Macromolecular Simulation: Role of Evolving Ensembles
Singharoy, A.; Joshi, H.; Ortoleva, P.J.
2013-01-01
Multiscale analysis provides an algorithm for the efficient simulation of macromolecular assemblies. This algorithm involves the coevolution of a quasiequilibrium probability density of atomic configurations and the Langevin dynamics of spatial coarse-grained variables denoted order parameters (OPs) characterizing nanoscale system features. In practice, implementation of the probability density involves the generation of constant OP ensembles of atomic configurations. Such ensembles are used to construct thermal forces and diffusion factors that mediate the stochastic OP dynamics. Generation of all-atom ensembles at every Langevin timestep is computationally expensive. Here, multiscale computation for macromolecular systems is made more efficient by a method that self-consistently folds in ensembles of all-atom configurations constructed in an earlier step, history, of the Langevin evolution. This procedure accounts for the temporal evolution of these ensembles, accurately providing thermal forces and diffusions. It is shown that efficiency and accuracy of the OP-based simulations is increased via the integration of this historical information. Accuracy improves with the square root of the number of historical timesteps included in the calculation. As a result, CPU usage can be decreased by a factor of 3-8 without loss of accuracy. The algorithm is implemented into our existing force-field based multiscale simulation platform and demonstrated via the structural dynamics of viral capsomers. PMID:22978601
A review of multimodel superensemble forecasting for weather, seasonal climate, and hurricanes
NASA Astrophysics Data System (ADS)
Krishnamurti, T. N.; Kumar, V.; Simon, A.; Bhardwaj, A.; Ghosh, T.; Ross, R.
2016-06-01
This review provides a summary of work in the area of ensemble forecasts for weather, climate, oceans, and hurricanes. This includes a combination of multiple forecast model results that does not dwell on the ensemble mean but uses a unique collective bias reduction procedure. A theoretical framework for this procedure is provided, utilizing a suite of models that is constructed from the well-known Lorenz low-order nonlinear system. A tutorial that includes a walk-through table and illustrates the inner workings of the multimodel superensemble's principle is provided. Systematic errors in a single deterministic model arise from a host of features that range from the model's initial state (data assimilation), resolution, representation of physics, dynamics, and ocean processes, local aspects of orography, water bodies, and details of the land surface. Models, in their diversity of representation of such features, end up leaving unique signatures of systematic errors. The multimodel superensemble utilizes as many as 10 million weights to take into account the bias errors arising from these diverse features of multimodels. The design of a single deterministic forecast models that utilizes multiple features from the use of the large volume of weights is provided here. This has led to a better understanding of the error growths and the collective bias reductions for several of the physical parameterizations within diverse models, such as cumulus convection, planetary boundary layer physics, and radiative transfer. A number of examples for weather, seasonal climate, hurricanes and sub surface oceanic forecast skills of member models, the ensemble mean, and the superensemble are provided.
Evaluation of a New Ensemble Learning Framework for Mass Classification in Mammograms.
Rahmani Seryasat, Omid; Haddadnia, Javad
2018-06-01
Mammography is the most common screening method for diagnosis of breast cancer. In this study, a computer-aided system for diagnosis of benignity and malignity of the masses was implemented in mammogram images. In the computer aided diagnosis system, we first reduce the noise in the mammograms using an effective noise removal technique. After the noise removal, the mass in the region of interest must be segmented and this segmentation is done using a deformable model. After the mass segmentation, a number of features are extracted from it. These features include: features of the mass shape and border, tissue properties, and the fractal dimension. After extracting a large number of features, a proper subset must be chosen from among them. In this study, we make use of a new method on the basis of a genetic algorithm for selection of a proper set of features. After determining the proper features, a classifier is trained. To classify the samples, a new architecture for combination of the classifiers is proposed. In this architecture, easy and difficult samples are identified and trained using different classifiers. Finally, the proposed mass diagnosis system was also tested on mini-Mammographic Image Analysis Society and digital database for screening mammography databases. The obtained results indicate that the proposed system can compete with the state-of-the-art methods in terms of accuracy. Copyright © 2017 Elsevier Inc. All rights reserved.
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2017-04-01
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
Deep ensemble learning of sparse regression models for brain disease diagnosis
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2018-01-01
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer’s disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call ‘ Deep Ensemble Sparse Regression Network.’ To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. PMID:28167394
Maciag, Joseph J.; Mackenzie, Sarah H.; Tucker, Matthew B.; Schipper, Joshua L.; Swartz, Paul; Clark, A. Clay
2016-01-01
The native ensemble of caspases is described globally by a complex energy landscape where the binding of substrate selects for the active conformation, whereas targeting an allosteric site in the dimer interface selects an inactive conformation that contains disordered active-site loops. Mutations and posttranslational modifications stabilize high-energy inactive conformations, with mostly formed, but distorted, active sites. To examine the interconversion of active and inactive states in the ensemble, we used detection of related solvent positions to analyze 4,995 waters in 15 high-resolution (<2.0 Å) structures of wild-type caspase-3, resulting in 450 clusters with the most highly conserved set containing 145 water molecules. The data show that regions of the protein that contact the conserved waters also correspond to sites of posttranslational modifications, suggesting that the conserved waters are an integral part of allosteric mechanisms. To test this hypothesis, we created a library of 19 caspase-3 variants through saturation mutagenesis in a single position of the allosteric site of the dimer interface, and we show that the enzyme activity varies by more than four orders of magnitude. Altogether, our database consists of 37 high-resolution structures of caspase-3 variants, and we demonstrate that the decrease in activity correlates with a loss of conserved water molecules. The data show that the activity of caspase-3 can be fine-tuned through globally desolvating the active conformation within the native ensemble, providing a mechanism for cells to repartition the ensemble and thus fine-tune activity through conformational selection. PMID:27681633
Maciag, Joseph J; Mackenzie, Sarah H; Tucker, Matthew B; Schipper, Joshua L; Swartz, Paul; Clark, A Clay
2016-10-11
The native ensemble of caspases is described globally by a complex energy landscape where the binding of substrate selects for the active conformation, whereas targeting an allosteric site in the dimer interface selects an inactive conformation that contains disordered active-site loops. Mutations and posttranslational modifications stabilize high-energy inactive conformations, with mostly formed, but distorted, active sites. To examine the interconversion of active and inactive states in the ensemble, we used detection of related solvent positions to analyze 4,995 waters in 15 high-resolution (<2.0 Å) structures of wild-type caspase-3, resulting in 450 clusters with the most highly conserved set containing 145 water molecules. The data show that regions of the protein that contact the conserved waters also correspond to sites of posttranslational modifications, suggesting that the conserved waters are an integral part of allosteric mechanisms. To test this hypothesis, we created a library of 19 caspase-3 variants through saturation mutagenesis in a single position of the allosteric site of the dimer interface, and we show that the enzyme activity varies by more than four orders of magnitude. Altogether, our database consists of 37 high-resolution structures of caspase-3 variants, and we demonstrate that the decrease in activity correlates with a loss of conserved water molecules. The data show that the activity of caspase-3 can be fine-tuned through globally desolvating the active conformation within the native ensemble, providing a mechanism for cells to repartition the ensemble and thus fine-tune activity through conformational selection.
Kunz, Ralf; Timpmann, Kõu; Southall, June; Cogdell, Richard J.; Freiberg, Arvi; Köhler, Jürgen
2014-01-01
We have recorded fluorescence-excitation and emission spectra from single LH2 complexes from Rhodopseudomonas (Rps.) acidophila. Both types of spectra show strong temporal spectral fluctuations that can be visualized as spectral diffusion plots. Comparison of the excitation and emission spectra reveals that for most of the complexes the lowest exciton transition is not observable in the excitation spectra due to the cutoff of the detection filter characteristics. However, from the spectral diffusion plots we have the full spectral and temporal information at hand and can select those complexes for which the excitation spectra are complete. Correlating the red most spectral feature of the excitation spectrum with the blue most spectral feature of the emission spectrum allows an unambiguous assignment of the lowest exciton state. Hence, application of fluorescence-excitation and emission spectroscopy on the same individual LH2 complex allows us to decipher spectral subtleties that are usually hidden in traditional ensemble spectroscopy. PMID:24806933
Si, Lei; Wang, Zhongbin; Liu, Xinhua; Tan, Chao; Xu, Jing; Zheng, Kehong
2015-11-13
In order to efficiently and accurately identify the cutting condition of a shearer, this paper proposed an intelligent multi-sensor data fusion identification method using the parallel quasi-Newton neural network (PQN-NN) and the Dempster-Shafer (DS) theory. The vibration acceleration signals and current signal of six cutting conditions were collected from a self-designed experimental system and some special state features were extracted from the intrinsic mode functions (IMFs) based on the ensemble empirical mode decomposition (EEMD). In the experiment, three classifiers were trained and tested by the selected features of the measured data, and the DS theory was used to combine the identification results of three single classifiers. Furthermore, some comparisons with other methods were carried out. The experimental results indicate that the proposed method performs with higher detection accuracy and credibility than the competing algorithms. Finally, an industrial application example in the fully mechanized coal mining face was demonstrated to specify the effect of the proposed system.
Approach to the problem of the parameters optimization of the shooting system
NASA Astrophysics Data System (ADS)
Demidova, L. A.; Sablina, V. A.; Sokolova, Y. S.
2018-02-01
The problem of the objects identification on the base of their hyperspectral features has been considered. It is offered to use the SVM classifiers’ ensembles, adapted to specifics of the problem of the objects identification on the base of their hyperspectral features. The results of the objects identification on the base of their hyperspectral features with using of the SVM classifiers have been presented.
Metabolomic biosignature differentiates melancholic depressive patients from healthy controls.
Liu, Yashu; Yieh, Lynn; Yang, Tao; Drinkenburg, Wilhelmus; Peeters, Pieter; Steckler, Thomas; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping
2016-08-23
Major depressive disorder (MDD) is a heterogeneous disease at the level of clinical symptoms, and this heterogeneity is likely reflected at the level of biology. Two clinical subtypes within MDD that have garnered interest are "melancholic depression" and "anxious depression". Metabolomics enables us to characterize hundreds of small molecules that comprise the metabolome, and recent work suggests the blood metabolome may be able to inform treatment decisions for MDD, however work is at an early stage. Here we examine a metabolomics data set to (1) test whether clinically homogenous MDD subtypes are also more biologically homogeneous, and hence more predictiable, (2) devise a robust machine learning framework that preserves biological meaning, and (3) describe the metabolomic biosignature for melancholic depression. With the proposed computational system we achieves around 80 % classification accuracy, sensitivity and specificity for melancholic depression, but only ~72 % for anxious depression or MDD, suggesting the blood metabolome contains more information about melancholic depression.. We develop an ensemble feature selection framework (EFSF) in which features are first clustered, and learning then takes place on the cluster centroids, retaining information about correlated features during the feature selection process rather than discarding them as most machine learning methods will do. Analysis of the most discriminative feature clusters revealed differences in metabolic classes such as amino acids and lipids as well as pathways studied extensively in MDD such as the activation of cortisol in chronic stress. We find the greater clinical homogeneity does indeed lead to better prediction based on biological measurements in the case of melancholic depression. Melancholic depression is shown to be associated with changes in amino acids, catecholamines, lipids, stress hormones, and immune-related metabolites. The proposed computational framework can be adapted to analyze data from many other biomedical applications where the data has similar characteristics.
Tatinati, Sivanagaraja; Nazarpour, Kianoush; Tech Ang, Wei; Veluvolu, Kalyana C
2016-08-01
Successful treatment of tumors with motion-adaptive radiotherapy requires accurate prediction of respiratory motion, ideally with a prediction horizon larger than the latency in radiotherapy system. Accurate prediction of respiratory motion is however a non-trivial task due to the presence of irregularities and intra-trace variabilities, such as baseline drift and temporal changes in fundamental frequency pattern. In this paper, to enhance the accuracy of the respiratory motion prediction, we propose a stacked regression ensemble framework that integrates heterogeneous respiratory motion prediction algorithms. We further address two crucial issues for developing a successful ensemble framework: (1) selection of appropriate prediction methods to ensemble (level-0 methods) among the best existing prediction methods; and (2) finding a suitable generalization approach that can successfully exploit the relative advantages of the chosen level-0 methods. The efficacy of the developed ensemble framework is assessed with real respiratory motion traces acquired from 31 patients undergoing treatment. Results show that the developed ensemble framework improves the prediction performance significantly compared to the best existing methods. Copyright © 2016 IPEM. Published by Elsevier Ltd. All rights reserved.
Synchrony suppression in ensembles of coupled oscillators via adaptive vanishing feedback.
Montaseri, Ghazal; Yazdanpanah, Mohammad Javad; Pikovsky, Arkady; Rosenblum, Michael
2013-09-01
Synchronization and emergence of a collective mode is a general phenomenon, frequently observed in ensembles of coupled self-sustained oscillators of various natures. In several circumstances, in particular in cases of neurological pathologies, this state of the active medium is undesirable. Destruction of this state by a specially designed stimulation is a challenge of high clinical relevance. Typically, the precise effect of an external action on the ensemble is unknown, since the microscopic description of the oscillators and their interactions are not available. We show that, desynchronization in case of a large degree of uncertainty about important features of the system is nevertheless possible; it can be achieved by virtue of a feedback loop with an additional adaptation of parameters. The adaptation also ensures desynchronization of ensembles with non-stationary, time-varying parameters. We perform the stability analysis of the feedback-controlled system and demonstrate efficient destruction of synchrony for several models, including those of spiking and bursting neurons.
Synchrony suppression in ensembles of coupled oscillators via adaptive vanishing feedback
NASA Astrophysics Data System (ADS)
Montaseri, Ghazal; Javad Yazdanpanah, Mohammad; Pikovsky, Arkady; Rosenblum, Michael
2013-09-01
Synchronization and emergence of a collective mode is a general phenomenon, frequently observed in ensembles of coupled self-sustained oscillators of various natures. In several circumstances, in particular in cases of neurological pathologies, this state of the active medium is undesirable. Destruction of this state by a specially designed stimulation is a challenge of high clinical relevance. Typically, the precise effect of an external action on the ensemble is unknown, since the microscopic description of the oscillators and their interactions are not available. We show that, desynchronization in case of a large degree of uncertainty about important features of the system is nevertheless possible; it can be achieved by virtue of a feedback loop with an additional adaptation of parameters. The adaptation also ensures desynchronization of ensembles with non-stationary, time-varying parameters. We perform the stability analysis of the feedback-controlled system and demonstrate efficient destruction of synchrony for several models, including those of spiking and bursting neurons.
Elsawy, Amr S; Eldawlatly, Seif; Taher, Mohamed; Aly, Gamal M
2014-01-01
The current trend to use Brain-Computer Interfaces (BCIs) with mobile devices mandates the development of efficient EEG data processing methods. In this paper, we demonstrate the performance of a Principal Component Analysis (PCA) ensemble classifier for P300-based spellers. We recorded EEG data from multiple subjects using the Emotiv neuroheadset in the context of a classical oddball P300 speller paradigm. We compare the performance of the proposed ensemble classifier to the performance of traditional feature extraction and classifier methods. Our results demonstrate the capability of the PCA ensemble classifier to classify P300 data recorded using the Emotiv neuroheadset with an average accuracy of 86.29% on cross-validation data. In addition, offline testing of the recorded data reveals an average classification accuracy of 73.3% that is significantly higher than that achieved using traditional methods. Finally, we demonstrate the effect of the parameters of the P300 speller paradigm on the performance of the method.
Validation of a Model of Extramusical Influences on Solo and Small-Ensemble Festival Ratings
ERIC Educational Resources Information Center
Bergee, Martin J.
2006-01-01
This is the fourth in a series of studies whose purpose has been to develop a theoretical model of selected extramusical variables' ability to explain solo and small-ensemble festival ratings. Authors of the second and third of these (Bergee & McWhirter, 2005; Bergee & Westfall, 2005) used logistic regression as the basis for their…
Selecting a climate model subset to optimise key ensemble properties
NASA Astrophysics Data System (ADS)
Herger, Nadja; Abramowitz, Gab; Knutti, Reto; Angélil, Oliver; Lehmann, Karsten; Sanderson, Benjamin M.
2018-02-01
End users studying impacts and risks caused by human-induced climate change are often presented with large multi-model ensembles of climate projections whose composition and size are arbitrarily determined. An efficient and versatile method that finds a subset which maintains certain key properties from the full ensemble is needed, but very little work has been done in this area. Therefore, users typically make their own somewhat subjective subset choices and commonly use the equally weighted model mean as a best estimate. However, different climate model simulations cannot necessarily be regarded as independent estimates due to the presence of duplicated code and shared development history. Here, we present an efficient and flexible tool that makes better use of the ensemble as a whole by finding a subset with improved mean performance compared to the multi-model mean while at the same time maintaining the spread and addressing the problem of model interdependence. Out-of-sample skill and reliability are demonstrated using model-as-truth experiments. This approach is illustrated with one set of optimisation criteria but we also highlight the flexibility of cost functions, depending on the focus of different users. The technique is useful for a range of applications that, for example, minimise present-day bias to obtain an accurate ensemble mean, reduce dependence in ensemble spread, maximise future spread, ensure good performance of individual models in an ensemble, reduce the ensemble size while maintaining important ensemble characteristics, or optimise several of these at the same time. As in any calibration exercise, the final ensemble is sensitive to the metric, observational product, and pre-processing steps used.
A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.
Carlin, Michael A; Elhilali, Mounya
2015-12-01
One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
NASA Astrophysics Data System (ADS)
Chen, BinQiang; Zhang, ZhouSuo; Zi, YanYang; He, ZhengJia; Sun, Chuang
2013-10-01
Detecting transient vibration signatures is of vital importance for vibration-based condition monitoring and fault detection of the rotating machinery. However, raw mechanical signals collected by vibration sensors are generally mixtures of physical vibrations of the multiple mechanical components installed in the examined machinery. Fault-generated incipient vibration signatures masked by interfering contents are difficult to be identified. The fast kurtogram (FK) is a concise and smart gadget for characterizing these vibration features. The multi-rate filter-bank (MRFB) and the spectral kurtosis (SK) indicator of the FK are less powerful when strong interfering vibration contents exist, especially when the FK are applied to vibration signals of short duration. It is encountered that the impulsive interfering contents not authentically induced by mechanical faults complicate the optimal analyzing process and lead to incorrect choosing of the optimal analysis subband, therefore the original FK may leave out the essential fault signatures. To enhance the analyzing performance of FK for industrial applications, an improved version of fast kurtogram, named as "fast spatial-spectral ensemble kurtosis kurtogram", is presented. In the proposed technique, discrete quasi-analytic wavelet tight frame (QAWTF) expansion methods are incorporated as the detection filters. The QAWTF, constructed based on dual tree complex wavelet transform, possesses better vibration transient signature extracting ability and enhanced time-frequency localizability compared with conventional wavelet packet transforms (WPTs). Moreover, in the constructed QAWTF, a non-dyadic ensemble wavelet subband generating strategy is put forward to produce extra wavelet subbands that are capable of identifying fault features located in transition-band of WPT. On the other hand, an enhanced signal impulsiveness evaluating indicator, named "spatial-spectral ensemble kurtosis" (SSEK), is put forward and utilized as the quantitative measure to select optimal analyzing parameters. The SSEK indicator is robuster in evaluating the impulsiveness intensity of vibration signals due to its better suppressing ability of Gaussian noise, harmonics and sporadic impulsive shocks. Numerical validations, an experimental test and two engineering applications were used to verify the effectiveness of the proposed technique. The analyzing results of the numerical validations, experimental tests and engineering applications demonstrate that the proposed technique possesses robuster transient vibration content detecting performance in comparison with the original FK and the WPT-based FK method, especially when they are applied to the processing of vibration signals of relative limited duration.
Khan, Adil G; Poort, Jasper; Chadwick, Angus; Blot, Antonin; Sahani, Maneesh; Mrsic-Flogel, Thomas D; Hofer, Sonja B
2018-06-01
How learning enhances neural representations for behaviorally relevant stimuli via activity changes of cortical cell types remains unclear. We simultaneously imaged responses of pyramidal cells (PYR) along with parvalbumin (PV), somatostatin (SOM), and vasoactive intestinal peptide (VIP) inhibitory interneurons in primary visual cortex while mice learned to discriminate visual patterns. Learning increased selectivity for task-relevant stimuli of PYR, PV and SOM subsets but not VIP cells. Strikingly, PV neurons became as selective as PYR cells, and their functional interactions reorganized, leading to the emergence of stimulus-selective PYR-PV ensembles. Conversely, SOM activity became strongly decorrelated from the network, and PYR-SOM coupling before learning predicted selectivity increases in individual PYR cells. Thus, learning differentially shapes the activity and interactions of multiple cell classes: while SOM inhibition may gate selectivity changes, PV interneurons become recruited into stimulus-specific ensembles and provide more selective inhibition as the network becomes better at discriminating behaviorally relevant stimuli.
Lahiri, A; Roy, Abhijit Guha; Sheet, Debdoot; Biswas, Prabir Kumar
2016-08-01
Automated segmentation of retinal blood vessels in label-free fundus images entails a pivotal role in computed aided diagnosis of ophthalmic pathologies, viz., diabetic retinopathy, hypertensive disorders and cardiovascular diseases. The challenge remains active in medical image analysis research due to varied distribution of blood vessels, which manifest variations in their dimensions of physical appearance against a noisy background. In this paper we formulate the segmentation challenge as a classification task. Specifically, we employ unsupervised hierarchical feature learning using ensemble of two level of sparsely trained denoised stacked autoencoder. First level training with bootstrap samples ensures decoupling and second level ensemble formed by different network architectures ensures architectural revision. We show that ensemble training of auto-encoders fosters diversity in learning dictionary of visual kernels for vessel segmentation. SoftMax classifier is used for fine tuning each member autoencoder and multiple strategies are explored for 2-level fusion of ensemble members. On DRIVE dataset, we achieve maximum average accuracy of 95.33% with an impressively low standard deviation of 0.003 and Kappa agreement coefficient of 0.708. Comparison with other major algorithms substantiates the high efficacy of our model.
Generalized multiple kernel learning with data-dependent priors.
Mao, Qi; Tsang, Ivor W; Gao, Shenghua; Wang, Li
2015-06-01
Multiple kernel learning (MKL) and classifier ensemble are two mainstream methods for solving learning problems in which some sets of features/views are more informative than others, or the features/views within a given set are inconsistent. In this paper, we first present a novel probabilistic interpretation of MKL such that maximum entropy discrimination with a noninformative prior over multiple views is equivalent to the formulation of MKL. Instead of using the noninformative prior, we introduce a novel data-dependent prior based on an ensemble of kernel predictors, which enhances the prediction performance of MKL by leveraging the merits of the classifier ensemble. With the proposed probabilistic framework of MKL, we propose a hierarchical Bayesian model to learn the proposed data-dependent prior and classification model simultaneously. The resultant problem is convex and other information (e.g., instances with either missing views or missing labels) can be seamlessly incorporated into the data-dependent priors. Furthermore, a variety of existing MKL models can be recovered under the proposed MKL framework and can be readily extended to incorporate these priors. Extensive experiments demonstrate the benefits of our proposed framework in supervised and semisupervised settings, as well as in tasks with partial correspondence among multiple views.
Constructing optimal ensemble projections for predictive environmental modelling in Northern Eurasia
NASA Astrophysics Data System (ADS)
Anisimov, Oleg; Kokorev, Vasily
2013-04-01
Large uncertainties in climate impact modelling are associated with the forcing climate data. This study is targeted at the evaluation of the quality of GCM-based climatic projections in the specific context of predictive environmental modelling in Northern Eurasia. To accomplish this task, we used the output from 36 CMIP5 GCMs from the IPCC AR-5 data base for the control period 1975-2005 and calculated several climatic characteristics and indexes that are most often used in the impact models, i.e. the summer warmth index, duration of the vegetation growth period, precipitation sums, dryness index, thawing degree-day sums, and the annual temperature amplitude. We used data from 744 weather stations in Russia and neighbouring countries to analyze the spatial patterns of modern climatic change and to delineate 17 large regions with coherent temperature changes in the past few decades. GSM results and observational data were averaged over the coherent regions and compared with each other. Ultimately, we evaluated the skills of individual models, ranked them in the context of regional impact modelling and identified top-end GCMs that "better than average" reproduce modern regional changes of the selected meteorological parameters and climatic indexes. Selected top-end GCMs were used to compose several ensembles, each combining results from the different number of models. Ensembles were ranked using the same algorithm and outliers eliminated. We then used data from top-end ensembles for the 2000-2100 period to construct the climatic projections that are likely to be "better than average" in predicting climatic parameters that govern the state of environment in Northern Eurasia. The ultimate conclusions of our study are the following. • High-end GCMs that demonstrate excellent skills in conventional atmospheric model intercomparison experiments are not necessarily the best in replicating climatic characteristics that govern the state of environment in Northern Eurasia, and independent model evaluation on regional level is necessary to identify "better than average" GCMs. • Each of the ensembles combining results from several "better than average" models replicate selected meteorological parameters and climatic indexes better than any single GCM. The ensemble skills are parameter-specific and depend on models it consists of. The best results are not necessarily those based on the ensemble comprised by all "better than average" models. • Comprehensive evaluation of climatic scenarios using specific criteria narrows the range of uncertainties in environmental projections.
The interplay of biomolecules and water at the origin of the active behavior of living organisms
NASA Astrophysics Data System (ADS)
Del Giudice, E.; Stefanini, P.; Tedeschi, A.; Vitiello, G.
2011-12-01
It is shown that the main component of living matter, namely liquid water, is not an ensemble of independent molecules but an ensemble of phase correlated molecules kept in tune by an electromagnetic (e.m) field trapped in the ensemble. This field and the correlated potential govern the interaction among biomolecules suspended in water and are in turn affected by the chemical interactions of molecules. In particular, the phase of the coherent fields appears to play an important role in this dynamics. Recent experiments reported by the Montagnier group seem to corroborate this theory. Some features of the dynamics of human organisms, as reported by psychotherapy, holistic medicine and Eastern traditions, are analyzed in this frame and could find a rationale in this context.
Zhao, Huawei
2009-01-01
A ZEMAX model was constructed to simulate a clinical trial of intraocular lenses (IOLs) based on a clinically oriented Monte Carlo ensemble analysis using postoperative ocular parameters. The purpose of this model is to test the feasibility of streamlining and optimizing both the design process and the clinical testing of IOLs. This optical ensemble analysis (OEA) is also validated. Simulated pseudophakic eyes were generated by using the tolerancing and programming features of ZEMAX optical design software. OEA methodology was verified by demonstrating that the results of clinical performance simulations were consistent with previously published clinical performance data using the same types of IOLs. From these results we conclude that the OEA method can objectively simulate the potential clinical trial performance of IOLs.
Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.
Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing
2015-01-01
Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of a SVM classifier depends strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset- the Wisconsin Diagnostic Breast Cancer (WDBC), revealed the different accuracy of a SVM classification in terms of the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called as SVM-RFE(C) with correlation filter was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first demonstrated outperforming SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experiment results show that the classifier constructed with SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
NASA Astrophysics Data System (ADS)
Hawkins, L. R.; Rupp, D. E.; Li, S.; Sarah, S.; McNeall, D. J.; Mote, P.; Betts, R. A.; Wallom, D.
2017-12-01
Changing regional patterns of surface temperature, precipitation, and humidity may cause ecosystem-scale changes in vegetation, altering the distribution of trees, shrubs, and grasses. A changing vegetation distribution, in turn, alters the albedo, latent heat flux, and carbon exchanged with the atmosphere with resulting feedbacks onto the regional climate. However, a wide range of earth-system processes that affect the carbon, energy, and hydrologic cycles occur at sub grid scales in climate models and must be parameterized. The appropriate parameter values in such parameterizations are often poorly constrained, leading to uncertainty in predictions of how the ecosystem will respond to changes in forcing. To better understand the sensitivity of regional climate to parameter selection and to improve regional climate and vegetation simulations, we used a large perturbed physics ensemble and a suite of statistical emulators. We dynamically downscaled a super-ensemble (multiple parameter sets and multiple initial conditions) of global climate simulations using a 25-km resolution regional climate model HadRM3p with the land-surface scheme MOSES2 and dynamic vegetation module TRIFFID. We simultaneously perturbed land surface parameters relating to the exchange of carbon, water, and energy between the land surface and atmosphere in a large super-ensemble of regional climate simulations over the western US. Statistical emulation was used as a computationally cost-effective tool to explore uncertainties in interactions. Regions of parameter space that did not satisfy observational constraints were eliminated and an ensemble of parameter sets that reduce regional biases and span a range of plausible interactions among earth system processes were selected. This study demonstrated that by combining super-ensemble simulations with statistical emulation, simulations of regional climate could be improved while simultaneously accounting for a range of plausible land-atmosphere feedback strengths.
NASA Astrophysics Data System (ADS)
Peishu, Zong; Jianping, Tang; Shuyu, Wang; Lingyun, Xie; Jianwei, Yu; Yunqian, Zhu; Xiaorui, Niu; Chao, Li
2017-08-01
The parameterization of physical processes is one of the critical elements to properly simulate the regional climate over eastern China. It is essential to conduct detailed analyses on the effect of physical parameterization schemes on regional climate simulation, to provide more reliable regional climate change information. In this paper, we evaluate the 25-year (1983-2007) summer monsoon climate characteristics of precipitation and surface air temperature by using the regional spectral model (RSM) with different physical schemes. The ensemble results using the reliability ensemble averaging (REA) method are also assessed. The result shows that the RSM model has the capacity to reproduce the spatial patterns, the variations, and the temporal tendency of surface air temperature and precipitation over eastern China. And it tends to predict better climatology characteristics over the Yangtze River basin and the South China. The impact of different physical schemes on RSM simulations is also investigated. Generally, the CLD3 cloud water prediction scheme tends to produce larger precipitation because of its overestimation of the low-level moisture. The systematic biases derived from the KF2 cumulus scheme are larger than those from the RAS scheme. The scale-selective bias correction (SSBC) method improves the simulation of the temporal and spatial characteristics of surface air temperature and precipitation and advances the circulation simulation capacity. The REA ensemble results show significant improvement in simulating temperature and precipitation distribution, which have much higher correlation coefficient and lower root mean square error. The REA result of selected experiments is better than that of nonselected experiments, indicating the necessity of choosing better ensemble samples for ensemble.
Park, Jae Hong; Kim, Chang-Eop; Shin, Jaewoo; Im, Changkyun; Koh, Chin Su; Seo, In Seok; Kim, Sang Jeong; Shin, Hyung-Cheul
2013-10-01
Chronic monitoring of the state of the bladder can be used to notify patients with urinary dysfunction when the bladder should be voided. Given that many spinal neurons respond both to somatic and visceral inputs, it is necessary to extract bladder information selectively from the spinal cord. Here, we hypothesize that sensory information with distinct modalities should be represented by the distinct ensemble activity patterns within the neuronal population and, therefore, analyzing the activity patterns of the neuronal population could distinguish bladder fullness from somatic stimuli. We simultaneously recorded 26-27 single unit activities in response to bladder distension or tactile stimuli in the dorsal spinal cord of each Sprague-Dawley rat. In order to discriminate between bladder fullness and tactile stimulus inputs, we analyzed the ensemble activity patterns of the entire neuronal population. A support vector machine (SVM) was employed as a classifier, and discrimination performance was measured by k-fold cross-validation tests. Most of the units responding to bladder fullness also responded to the tactile stimuli (88.9-100%). The SVM classifier precisely distinguished the bladder fullness from the somatic input (100%), indicating that the ensemble activity patterns of the unit population in the spinal cord are distinct enough to identify the current input modality. Moreover, our ensemble activity pattern-based classifier showed high robustness against random losses of signals. This study is the first to demonstrate that the two main issues of electroneurographic monitoring of bladder fullness, low signals and selectiveness, can be solved by an ensemble activity pattern-based approach, improving the feasibility of chronic monitoring of bladder fullness by neural recording.
Clark, M.R.; Gangopadhyay, S.; Hay, L.; Rajagopalan, B.; Wilby, R.
2004-01-01
A number of statistical methods that are used to provide local-scale ensemble forecasts of precipitation and temperature do not contain realistic spatial covariability between neighboring stations or realistic temporal persistence for subsequent forecast lead times. To demonstrate this point, output from a global-scale numerical weather prediction model is used in a stepwise multiple linear regression approach to downscale precipitation and temperature to individual stations located in and around four study basins in the United States. Output from the forecast model is downscaled for lead times up to 14 days. Residuals in the regression equation are modeled stochastically to provide 100 ensemble forecasts. The precipitation and temperature ensembles from this approach have a poor representation of the spatial variability and temporal persistence. The spatial correlations for downscaled output are considerably lower than observed spatial correlations at short forecast lead times (e.g., less than 5 days) when there is high accuracy in the forecasts. At longer forecast lead times, the downscaled spatial correlations are close to zero. Similarly, the observed temporal persistence is only partly present at short forecast lead times. A method is presented for reordering the ensemble output in order to recover the space-time variability in precipitation and temperature fields. In this approach, the ensemble members for a given forecast day are ranked and matched with the rank of precipitation and temperature data from days randomly selected from similar dates in the historical record. The ensembles are then reordered to correspond to the original order of the selection of historical data. Using this approach, the observed intersite correlations, intervariable correlations, and the observed temporal persistence are almost entirely recovered. This reordering methodology also has applications for recovering the space-time variability in modeled streamflow. ?? 2004 American Meteorological Society.
Model Selection for Monitoring CO2 Plume during Sequestration
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-12-31
The model selection method developed as part of this project mainly includes four steps: (1) assessing the connectivity/dynamic characteristics of a large prior ensemble of models, (2) model clustering using multidimensional scaling coupled with k-mean clustering, (3) model selection using the Bayes' rule in the reduced model space, (4) model expansion using iterative resampling of the posterior models. The fourth step expresses one of the advantages of the method: it provides a built-in means of quantifying the uncertainty in predictions made with the selected models. In our application to plume monitoring, by expanding the posterior space of models, the finalmore » ensemble of representations of geological model can be used to assess the uncertainty in predicting the future displacement of the CO2 plume. The software implementation of this approach is attached here.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timofeev, Andrey V.; Egorov, Dmitry V.
This paper presents new results concerning selection of an optimal information fusion formula for an ensemble of Lipschitz classifiers. The goal of information fusion is to create an integral classificatory which could provide better generalization ability of the ensemble while achieving a practically acceptable level of effectiveness. The problem of information fusion is very relevant for data processing in multi-channel C-OTDR-monitoring systems. In this case we have to effectively classify targeted events which appear in the vicinity of the monitored object. Solution of this problem is based on usage of an ensemble of Lipschitz classifiers each of which corresponds tomore » a respective channel. We suggest a brand new method for information fusion in case of ensemble of Lipschitz classifiers. This method is called “The Weighing of Inversely as Lipschitz Constants” (WILC). Results of WILC-method practical usage in multichannel C-OTDR monitoring systems are presented.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crossno, Patricia J.; Gittinger, Jaxon; Hunt, Warren L.
Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features acrossmore » the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign-in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.« less
Jun, Sanghoon; Kim, Namkug; Seo, Joon Beom; Lee, Young Kyung; Lynch, David A
2017-12-01
We propose the use of ensemble classifiers to overcome inter-scanner variations in the differentiation of regional disease patterns in high-resolution computed tomography (HRCT) images of diffuse interstitial lung disease patients obtained from different scanners. A total of 600 rectangular 20 × 20-pixel regions of interest (ROIs) on HRCT images obtained from two different scanners (GE and Siemens) and the whole lung area of 92 HRCT images were classified as one of six regional pulmonary disease patterns by two expert radiologists. Textual and shape features were extracted from each ROI and the whole lung parenchyma. For automatic classification, individual and ensemble classifiers were trained and tested with the ROI dataset. We designed the following three experimental sets: an intra-scanner study in which the training and test sets were from the same scanner, an integrated scanner study in which the data from the two scanners were merged, and an inter-scanner study in which the training and test sets were acquired from different scanners. In the ROI-based classification, the ensemble classifiers showed better (p < 0.001) accuracy (89.73%, SD = 0.43) than the individual classifiers (88.38%, SD = 0.31) in the integrated scanner test. The ensemble classifiers also showed partial improvements in the intra- and inter-scanner tests. In the whole lung classification experiment, the quantification accuracies of the ensemble classifiers with integrated training (49.57%) were higher (p < 0.001) than the individual classifiers (48.19%). Furthermore, the ensemble classifiers also showed better performance in both the intra- and inter-scanner experiments. We concluded that the ensemble classifiers provide better performance when using integrated scanner images.
The role of model dynamics in ensemble Kalman filter performance for chaotic systems
Ng, G.-H.C.; McLaughlin, D.; Entekhabi, D.; Ahanin, A.
2011-01-01
The ensemble Kalman filter (EnKF) is susceptible to losing track of observations, or 'diverging', when applied to large chaotic systems such as atmospheric and ocean models. Past studies have demonstrated the adverse impact of sampling error during the filter's update step. We examine how system dynamics affect EnKF performance, and whether the absence of certain dynamic features in the ensemble may lead to divergence. The EnKF is applied to a simple chaotic model, and ensembles are checked against singular vectors of the tangent linear model, corresponding to short-term growth and Lyapunov vectors, corresponding to long-term growth. Results show that the ensemble strongly aligns itself with the subspace spanned by unstable Lyapunov vectors. Furthermore, the filter avoids divergence only if the full linearized long-term unstable subspace is spanned. However, short-term dynamics also become important as non-linearity in the system increases. Non-linear movement prevents errors in the long-term stable subspace from decaying indefinitely. If these errors then undergo linear intermittent growth, a small ensemble may fail to properly represent all important modes, causing filter divergence. A combination of long and short-term growth dynamics are thus critical to EnKF performance. These findings can help in developing practical robust filters based on model dynamics. ?? 2011 The Authors Tellus A ?? 2011 John Wiley & Sons A/S.
Predicting protein function and other biomedical characteristics with heterogeneous ensembles
Whalen, Sean; Pandey, Om Prakash
2015-01-01
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink. PMID:26342255
Automated ensemble assembly and validation of microbial genomes.
Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M
2014-05-03
The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
NASA Astrophysics Data System (ADS)
Marhoubi, Asmaa H.; Saravi, Sara; Edirisinghe, Eran A.
2015-05-01
The present generation of mobile handheld devices comes equipped with a large number of sensors. The key sensors include the Ambient Light Sensor, Proximity Sensor, Gyroscope, Compass and the Accelerometer. Many mobile applications are driven based on the readings obtained from either one or two of these sensors. However the presence of multiple-sensors will enable the determination of more detailed activities that are carried out by the user of a mobile device, thus enabling smarter mobile applications to be developed that responds more appropriately to user behavior and device usage. In the proposed research we use recent advances in machine learning to fuse together the data obtained from all key sensors of a mobile device. We investigate the possible use of single and ensemble classifier based approaches to identify a mobile device's behavior in the space it is present. Feature selection algorithms are used to remove non-discriminant features that often lead to poor classifier performance. As the sensor readings are noisy and include a significant proportion of missing values and outliers, we use machine learning based approaches to clean the raw data obtained from the sensors, before use. Based on selected practical case studies, we demonstrate the ability to accurately recognize device behavior based on multi-sensor data fusion.
Młynarski, Wiktor
2014-01-01
To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial selectivity of higher auditory neurons emerges as a direct consequence of learning efficient codes for natural binaural sounds. Firstly, it is demonstrated that a linear efficient coding transform—Independent Component Analysis (ICA) trained on spectrograms of naturalistic simulated binaural sounds extracts spatial information present in the signal. A simple hierarchical ICA extension allowing for decoding of sound position is proposed. Furthermore, it is shown that units revealing spatial selectivity can be learned from a binaural recording of a natural auditory scene. In both cases a relatively small subpopulation of learned spectrogram features suffices to perform accurate sound localization. Representation of the auditory space is therefore learned in a purely unsupervised way by maximizing the coding efficiency and without any task-specific constraints. This results imply that efficient coding is a useful strategy for learning structures which allow for making behaviorally vital inferences about the environment. PMID:24639644
NASA Astrophysics Data System (ADS)
Cook, L. M.; Samaras, C.; Anderson, C.
2016-12-01
Engineers generally use historical precipitation trends to inform assumptions and parameters for long-lived infrastructure designs. However, resilient design calls for the adjustment of current engineering practice to incorporate a range of future climate conditions that are likely to be different than the past. Despite the availability of future projections from downscaled climate models, there remains a considerable mismatch between climate model outputs and the inputs needed in the engineering community to incorporate climate resiliency. These factors include differences in temporal and spatial scales, model uncertainties, and a lack of criteria for selection of an ensemble of models. This research addresses the limitations to working with climate data by providing a framework for the use of publicly available downscaled climate projections to inform engineering resiliency. The framework consists of five steps: 1) selecting the data source based on the engineering application, 2) extracting the data at a specific location, 3) validating for performance against observed data, 4) post-processing for bias or scale, and 5) selecting the ensemble and calculating statistics. The framework is illustrated with an example application to extreme precipitation-frequency statistics, the 25-year daily precipitation depth, using four publically available climate data sources: NARCCAP, USGS, Reclamation, and MACA. The attached figure presents the results for step 5 from the framework, analyzing how the 24H25Y depth changes when the model ensemble is culled based on model performance against observed data, for both post-processing techniques: bias-correction and change factor. Culling the model ensemble increases both the mean and median values for all data sources, and reduces range for NARCCAP and MACA ensembles due to elimination of poorer performing models, and in some cases, those that predict a decrease in future 24H25Y precipitation volumes. This result is especially relevant to engineers who wish to reduce the range of the ensemble and remove contradicting models; however, this result is not generalizable for all cases. Finally, this research highlights the need for the formation of an intermediate entity that is able to translate climate projections into relevant engineering information.
NASA Astrophysics Data System (ADS)
Wengel, C.; Latif, M.; Park, W.; Harlaß, J.; Bayr, T.
2018-02-01
The El Niño/Southern Oscillation (ENSO) is characterized by a seasonal phase locking, with strongest eastern and central equatorial Pacific sea surface temperature (SST) anomalies during boreal winter and weakest SST anomalies during boreal spring. In this study, key feedbacks controlling seasonal ENSO phase locking in the Kiel Climate Model (KCM) are identified by employing Bjerknes index stability analysis. A large ensemble of simulations with the KCM is analyzed, where the individual runs differ in either the number of vertical atmospheric levels or coefficients used in selected atmospheric parameterizations. All integrations use the identical ocean model. The ensemble-mean features realistic seasonal ENSO phase locking. ENSO phase locking is very sensitive to changes in the mean-state realized by the modifications described above. An excessive equatorial cold tongue leads to weak phase locking by reducing the Ekman feedback and thermocline feedback in late boreal fall and early boreal winter. Seasonal ENSO phase locking also is sensitive to the shortwave feedback as part of the thermal damping in early boreal spring, which strongly depends on eastern and central equatorial Pacific SST. The results obtained from the KCM are consistent with those from models participating in the Coupled Model Intercomparison Project phase 5 (CMIP5).
Johnson, David K; Karanicolas, John
2013-01-01
Despite intense interest and considerable effort via high-throughput screening, there are few examples of small molecules that directly inhibit protein-protein interactions. This suggests that many protein interaction surfaces may not be intrinsically "druggable" by small molecules, and elevates in importance the few successful examples as model systems for improving our fundamental understanding of druggability. Here we describe an approach for exploring protein fluctuations enriched in conformations containing surface pockets suitable for small molecule binding. Starting from a set of seven unbound protein structures, we find that the presence of low-energy pocket-containing conformations is indeed a signature of druggable protein interaction sites and that analogous surface pockets are not formed elsewhere on the protein. We further find that ensembles of conformations generated with this biased approach structurally resemble known inhibitor-bound structures more closely than equivalent ensembles of unbiased conformations. Collectively these results suggest that "druggability" is a property encoded on a protein surface through its propensity to form pockets, and inspire a model in which the crude features of the predisposed pocket(s) restrict the range of complementary ligands; additional smaller conformational changes then respond to details of a particular ligand. We anticipate that the insights described here will prove useful in selecting protein targets for therapeutic intervention.
2012-01-01
Background Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied with feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based methods. Results This study proposes a novel scoring card method (SCM) by using dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of 400 individual dipeptides to be soluble using statistic discrimination between soluble and insoluble proteins of a training data set. Consequently, the propensity scores of all dipeptides are further optimized using an intelligent genetic algorithm. The solubility score of a sequence is determined by the weighted sum of all propensity scores and dipeptide composition. To evaluate SCM by performance comparisons, four data sets with different sizes and variation degrees of experimental conditions were used. The results show that the simple method SCM with interpretable propensities of dipeptides has promising performance, compared with existing SVM-based ensemble methods with a number of feature types. Furthermore, the propensities of dipeptides and solubility scores of sequences can provide insights to protein solubility. For example, the analysis of dipeptide scores shows high propensity of α-helix structure and thermophilic proteins to be soluble. Conclusions The propensities of individual dipeptides to be soluble are varied for proteins under altered experimental conditions. For accurately predicting protein solubility using SCM, it is better to customize the score card of dipeptide propensities by using a training data set under the same specified experimental conditions. The proposed method SCM with solubility scores and dipeptide propensities can be easily applied to the protein function prediction problems that dipeptide composition features play an important role. Availability The used datasets, source codes of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/. PMID:23282103
NASA Astrophysics Data System (ADS)
Kwon, Young-Oh; Camacho, Alicia; Martinez, Carlos; Seo, Hyodae
2018-01-01
The atmospheric jet and blocking distributions, especially in the North Atlantic sector, have been challenging features for a climate model to realistically reproduce. This study examines climatological distributions of winter (December-February) daily jet latitude and blocking in the North Atlantic from the 40-member Community Earth System Model version 1 Large Ensemble (CESM1LE) simulations. This analysis aims at examining whether a broad range of internal climate variability encompassed by a large ensemble of simulations results in an improved representation of the jet latitude distributions and blocking days in CESM1LE. In the historical runs (1951-2005), the daily zonal wind at 850 hPa exhibits three distinct preferred latitudes for the eddy-driven jet position as seen in the reanalysis datasets, which represents a significant improvement from the previous version of the same model. However, the meridional separations between the three jet latitudes are much smaller than those in the reanalyses. In particular, the jet rarely migrates to the observed southernmost position around 37°N. This leads to the bias in blocking frequency that is too low over Greenland and too high over the Azores. These features are shown to be remarkably stable across the 40 ensemble members with negligible member-to-member spread. This result implies the range of internal variability of winter jet latitude and blocking frequency within the 55-year segment from each ensemble member is comparable to that represented by the full large ensemble. Comparison with 2046-2100 from the RCP8.5 future projection runs suggests that the daily jet position is projected to maintain the same three preferred latitudes, with a slightly higher frequency of occurrence over the central latitude around 50°N, instead of shifting poleward in the future as documented in some previous studies. In addition, the daily jet speed is projected not to change significantly between 1951-2005 and 2046-2100. On the other hand, the climatological mean jet is projected to become slightly more elongated and stronger on its southern flank, and the blocking frequency over the Azores is projected to decrease.
Causal network in a deafferented non-human primate brain.
Balasubramanian, Karthikeyan; Takahashi, Kazutaka; Hatsopoulos, Nicholas G
2015-01-01
De-afferented/efferented neural ensembles can undergo causal changes when interfaced to neuroprosthetic devices. These changes occur via recruitment or isolation of neurons, alterations in functional connectivity within the ensemble and/or changes in the role of neurons, i.e., excitatory/inhibitory. In this work, emergence of a causal network and changes in the dynamics are demonstrated for a deafferented brain region exposed to BMI (brain-machine interface) learning. The BMI was controlling a robot for reach-and-grasp behavior. And, the motor cortical regions used for the BMI were deafferented due to chronic amputation, and ensembles of neurons were decoded for velocity control of the multi-DOF robot. A generalized linear model-framework based Granger causality (GLM-GC) technique was used in estimating the ensemble connectivity. Model selection was based on the AIC (Akaike Information Criterion).
New Developments in the Data Assimilation Research Testbed
NASA Astrophysics Data System (ADS)
Hoar, T. J.; Anderson, J. L.; Raeder, K.; Karspeck, A. R.; Romine, G.; Liu, H.; Collins, N.
2011-12-01
NCAR's Data Assimilation Research Testbed (DART) is a community facility that provides ensemble data assimilation tools for geophysical applications. DART works with an expanding set of models and a wide range of conventional and novel observations, and provides a variety of assimilation algorithms and diagnostic tools. The Kodiak release of DART became available in July 2011 and includes more than 20 major feature enhancements, support for 24 models, support for (at least) 14 observation formats, expanded documentation and diagnostic tools, and 12 new utilities. A few examples of research projects that demonstrate the effectiveness and flexibility of the DART are described. The Community Atmosphere Model (CAM) and DART assimilated all the observations that were used in the NCEP/NCAR Reanalysis to produce a global, 6-hourly, 80-member ensemble reanalysis for 1998 through the present. The dataset is ideal for research applications that would benefit from an ensemble of equally-likely atmospheric states that are consistent with observations. Individual ensemble members may be used as a "data atmosphere" in any Community Earth System Model (CESM) experiment. The CESM interfaces for the Parallel Ocean Program (POP) and the Community Land Model (CLM) also support multiple instances, allowing data assimilation experiments exploiting unique atmospheric forcing for each POP or CLM model instance. A multi-year DART ocean assimilation has been completed and provides valuable insight into the successes and challenges of oceanic data assimilation. The DART/CLM research focuses on snow cover fraction and snow depth. The Weather Research and Forecasting (WRF) model was used with DART to perform a real-time CONUS domain mesoscale ensemble analysis with continuous cycling for 47 days. A member was selected once daily for high-resolution convective forecasts supporting a test phase of the Deep Convective Clouds and Chemistry experiment and the Storm Prediction Center spring experiment. The impacts of Moderate Resolution Imaging Spectroradiometer (MODIS) infrared and Advanced Microwave Scanning Radiometer (AMSR) microwave total precipitable water (TPW) observations on analyses and forecasts of tropical cyclone Sinlaku (2008) are investigated by performing assimilations with a 45km resolution WRF model over the Western Pacific domain for 8-14 Septmber, 2008. Particular emphasis is on the performance of the assimilation algorithms in the hurricane core and the impact of novel observations in the hurricane core.
exprso: an R-package for the rapid implementation of machine learning algorithms.
Quinn, Thomas; Tylee, Daniel; Glatt, Stephen
2016-01-01
Machine learning plays a major role in many scientific investigations. However, non-expert programmers may struggle to implement the elaborate pipelines necessary to build highly accurate and generalizable models. We introduce exprso , a new R package that is an intuitive machine learning suite designed specifically for non-expert programmers. Built initially for the classification of high-dimensional data, exprso uses an object-oriented framework to encapsulate a number of common analytical methods into a series of interchangeable modules. This includes modules for feature selection, classification, high-throughput parameter grid-searching, elaborate cross-validation schemes (e.g., Monte Carlo and nested cross-validation), ensemble classification, and prediction. In addition, exprso also supports multi-class classification (through the 1-vs-all generalization of binary classifiers) and the prediction of continuous outcomes.
Predicting students' happiness from physiology, phone, mobility, and behavioral data.
Jaques, Natasha; Taylor, Sara; Azaria, Asaph; Ghandeharioun, Asma; Sano, Akane; Picard, Rosalind
2015-09-01
In order to model students' happiness, we apply machine learning methods to data collected from undergrad students monitored over the course of one month each. The data collected include physiological signals, location, smartphone logs, and survey responses to behavioral questions. Each day, participants reported their wellbeing on measures including stress, health, and happiness. Because of the relationship between happiness and depression, modeling happiness may help us to detect individuals who are at risk of depression and guide interventions to help them. We are also interested in how behavioral factors (such as sleep and social activity) affect happiness positively and negatively. A variety of machine learning and feature selection techniques are compared, including Gaussian Mixture Models and ensemble classification. We achieve 70% classification accuracy of self-reported happiness on held-out test data.
Ali, Safdar; Majid, Abdul
2015-04-01
The diagnostic of human breast cancer is an intricate process and specific indicators may produce negative results. In order to avoid misleading results, accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches are proposed for prediction of breast cancer. To this end, we developed a novel classifier stacking based evolutionary ensemble system "Can-Evo-Ens" for predicting amino acid sequences associated with breast cancer. In this paper, first, we selected four diverse-type of ML algorithms of Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. In order to exploit the decision spaces, the preliminary predictions of base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimal combine the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using Particle Swarm Optimization technique. Our experiments have demonstrated the robustness of Can-Evo-Ens system for independent validation dataset. The proposed system has achieved the highest value of Area Under Curve (AUC) of ROC Curve of 99.95% for cancer prediction. The comparative results revealed that proposed approach is better than individual ML approaches and conventional ensemble approaches of AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed novel system would have a major impact on the fields of Biomedical, Genomics, Proteomics, Bioinformatics, and Drug Development. Copyright © 2015 Elsevier Inc. All rights reserved.
A statistical state dynamics approach to wall turbulence.
Farrell, B F; Gayme, D F; Ioannou, P J
2017-03-13
This paper reviews results obtained using statistical state dynamics (SSD) that demonstrate the benefits of adopting this perspective for understanding turbulence in wall-bounded shear flows. The SSD approach used in this work employs a second-order closure that retains only the interaction between the streamwise mean flow and the streamwise mean perturbation covariance. This closure restricts nonlinearity in the SSD to that explicitly retained in the streamwise constant mean flow together with nonlinear interactions between the mean flow and the perturbation covariance. This dynamical restriction, in which explicit perturbation-perturbation nonlinearity is removed from the perturbation equation, results in a simplified dynamics referred to as the restricted nonlinear (RNL) dynamics. RNL systems, in which a finite ensemble of realizations of the perturbation equation share the same mean flow, provide tractable approximations to the SSD, which is equivalent to an infinite ensemble RNL system. This infinite ensemble system, referred to as the stochastic structural stability theory system, introduces new analysis tools for studying turbulence. RNL systems provide computationally efficient means to approximate the SSD and produce self-sustaining turbulence exhibiting qualitative features similar to those observed in direct numerical simulations despite greatly simplified dynamics. The results presented show that RNL turbulence can be supported by as few as a single streamwise varying component interacting with the streamwise constant mean flow and that judicious selection of this truncated support or 'band-limiting' can be used to improve quantitative accuracy of RNL turbulence. These results suggest that the SSD approach provides new analytical and computational tools that allow new insights into wall turbulence.This article is part of the themed issue 'Toward the development of high-fidelity models of wall turbulence at large Reynolds number'. © 2017 The Author(s).
A statistical state dynamics approach to wall turbulence
Gayme, D. F.; Ioannou, P. J.
2017-01-01
This paper reviews results obtained using statistical state dynamics (SSD) that demonstrate the benefits of adopting this perspective for understanding turbulence in wall-bounded shear flows. The SSD approach used in this work employs a second-order closure that retains only the interaction between the streamwise mean flow and the streamwise mean perturbation covariance. This closure restricts nonlinearity in the SSD to that explicitly retained in the streamwise constant mean flow together with nonlinear interactions between the mean flow and the perturbation covariance. This dynamical restriction, in which explicit perturbation–perturbation nonlinearity is removed from the perturbation equation, results in a simplified dynamics referred to as the restricted nonlinear (RNL) dynamics. RNL systems, in which a finite ensemble of realizations of the perturbation equation share the same mean flow, provide tractable approximations to the SSD, which is equivalent to an infinite ensemble RNL system. This infinite ensemble system, referred to as the stochastic structural stability theory system, introduces new analysis tools for studying turbulence. RNL systems provide computationally efficient means to approximate the SSD and produce self-sustaining turbulence exhibiting qualitative features similar to those observed in direct numerical simulations despite greatly simplified dynamics. The results presented show that RNL turbulence can be supported by as few as a single streamwise varying component interacting with the streamwise constant mean flow and that judicious selection of this truncated support or ‘band-limiting’ can be used to improve quantitative accuracy of RNL turbulence. These results suggest that the SSD approach provides new analytical and computational tools that allow new insights into wall turbulence. This article is part of the themed issue ‘Toward the development of high-fidelity models of wall turbulence at large Reynolds number’. PMID:28167577
Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin
2018-09-01
The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.
Stereo-selective partitioning of translation-to-internal energy conversion in gas ensembles
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCaffery, Anthony J., E-mail: A.J.McCaffery@sussex.ac.uk
2014-11-07
A recent computational study of translation-to-internal energy transfer to H{sub 2} (v = 0,j = 0), hereinafter denoted H{sub 2} (0;0), in a bath of H atoms [A. J. McCaffery and R. J. Marsh, J. Chem. Phys. 139, 234310 (2013)] revealed an unexpected energy partitioning in which the H{sub 2} vibrational temperature greatly exceeds that of rotation. This occurs despite rotation and vibration distributions being close to Boltzmann from early in ensemble evolution. In this work, the study is extended to include H{sub 2} (0;0), O{sub 2} (0;0), and HF (0;0) in a wide range of atomic bath gases comprisingmore » some 22 ensembles in all. Translation-to-internal energy conversion in the systems studied was found to be relatively inefficient, falling approximately with (√μ′){sup −1} as bath gas mass increases, where μ′ is the reduced mass of the diatomic–bath gas pair. In all 22 systems studied, T{sub v} exceeds T{sub r} – by a factor > 4 for some pairs. Analysis of the constraints that influence (0;0) → (1;j) excitation for each diatomic–atom pair in momentum–angular momentum space demonstrates that a vibrational preference results from energy constraints that limit permitted collision trajectories to those of low effective impact parameter, i.e., to those that are axial or near axial on impact with the Newton surface. This implies that a steric constraint is an inherent feature of vibration-rotation excitation and arises because momentum and energy barriers must be overcome before rotational states may be populated in the higher vibrational level.« less
Random-Forest Classification of High-Resolution Remote Sensing Images and Ndsm Over Urban Areas
NASA Astrophysics Data System (ADS)
Sun, X. F.; Lin, X. G.
2017-09-01
As an intermediate step between raw remote sensing data and digital urban maps, remote sensing data classification has been a challenging and long-standing research problem in the community of remote sensing. In this work, an effective classification method is proposed for classifying high-resolution remote sensing data over urban areas. Starting from high resolution multi-spectral images and 3D geometry data, our method proceeds in three main stages: feature extraction, classification, and classified result refinement. First, we extract color, vegetation index and texture features from the multi-spectral image and compute the height, elevation texture and differential morphological profile (DMP) features from the 3D geometry data. Then in the classification stage, multiple random forest (RF) classifiers are trained separately, then combined to form a RF ensemble to estimate each sample's category probabilities. Finally the probabilities along with the feature importance indicator outputted by RF ensemble are used to construct a fully connected conditional random field (FCCRF) graph model, by which the classification results are refined through mean-field based statistical inference. Experiments on the ISPRS Semantic Labeling Contest dataset show that our proposed 3-stage method achieves 86.9% overall accuracy on the test data.
Short text sentiment classification based on feature extension and ensemble classifier
NASA Astrophysics Data System (ADS)
Liu, Yang; Zhu, Xie
2018-05-01
With the rapid development of Internet social media, excavating the emotional tendencies of the short text information from the Internet, the acquisition of useful information has attracted the attention of researchers. At present, the commonly used can be attributed to the rule-based classification and statistical machine learning classification methods. Although micro-blog sentiment analysis has made good progress, there still exist some shortcomings such as not highly accurate enough and strong dependence from sentiment classification effect. Aiming at the characteristics of Chinese short texts, such as less information, sparse features, and diverse expressions, this paper considers expanding the original text by mining related semantic information from the reviews, forwarding and other related information. First, this paper uses Word2vec to compute word similarity to extend the feature words. And then uses an ensemble classifier composed of SVM, KNN and HMM to analyze the emotion of the short text of micro-blog. The experimental results show that the proposed method can make good use of the comment forwarding information to extend the original features. Compared with the traditional method, the accuracy, recall and F1 value obtained by this method have been improved.
Numerical weather prediction model tuning via ensemble prediction system
NASA Astrophysics Data System (ADS)
Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.
2011-12-01
This paper discusses a novel approach to tune predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in parameterizations schemes of sub-grid scale physical processes. Currently, numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of the NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infra-structure and it seems very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding-back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameters values, and leads to an improved forecast skill. Second, results with an atmospheric general circulation model based ensemble prediction system show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, a global top-end NWP model tuning exercise with preliminary results is published.
Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.
Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold
2016-12-07
Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
NASA Astrophysics Data System (ADS)
Roberge, S.; Chokmani, K.; De Sève, D.
2012-04-01
The snow cover plays an important role in the hydrological cycle of Quebec (Eastern Canada). Consequently, evaluating its spatial extent interests the authorities responsible for the management of water resources, especially hydropower companies. The main objective of this study is the development of a snow-cover mapping strategy using remote sensing data and ensemble based systems techniques. Planned to be tested in a near real-time operational mode, this snow-cover mapping strategy has the advantage to provide the probability of a pixel to be snow covered and its uncertainty. Ensemble systems are made of two key components. First, a method is needed to build an ensemble of classifiers that is diverse as much as possible. Second, an approach is required to combine the outputs of individual classifiers that make up the ensemble in such a way that correct decisions are amplified, and incorrect ones are cancelled out. In this study, we demonstrate the potential of ensemble systems to snow-cover mapping using remote sensing data. The chosen classifier is a sequential thresholds algorithm using NOAA-AVHRR data adapted to conditions over Eastern Canada. Its special feature is the use of a combination of six sequential thresholds varying according to the day in the winter season. Two versions of the snow-cover mapping algorithm have been developed: one is specific for autumn (from October 1st to December 31st) and the other for spring (from March 16th to May 31st). In order to build the ensemble based system, different versions of the algorithm are created by varying randomly its parameters. One hundred of the versions are included in the ensemble. The probability of a pixel to be snow, no-snow or cloud covered corresponds to the amount of votes the pixel has been classified as such by all classifiers. The overall performance of ensemble based mapping is compared to the overall performance of the chosen classifier, and also with ground observations at meteorological stations.
Design of an Evolutionary Approach for Intrusion Detection
2013-01-01
A novel evolutionary approach is proposed for effective intrusion detection based on benchmark datasets. The proposed approach can generate a pool of noninferior individual solutions and ensemble solutions thereof. The generated ensembles can be used to detect the intrusions accurately. For intrusion detection problem, the proposed approach could consider conflicting objectives simultaneously like detection rate of each attack class, error rate, accuracy, diversity, and so forth. The proposed approach can generate a pool of noninferior solutions and ensembles thereof having optimized trade-offs values of multiple conflicting objectives. In this paper, a three-phase, approach is proposed to generate solutions to a simple chromosome design in the first phase. In the first phase, a Pareto front of noninferior individual solutions is approximated. In the second phase of the proposed approach, the entire solution set is further refined to determine effective ensemble solutions considering solution interaction. In this phase, another improved Pareto front of ensemble solutions over that of individual solutions is approximated. The ensemble solutions in improved Pareto front reported improved detection results based on benchmark datasets for intrusion detection. In the third phase, a combination method like majority voting method is used to fuse the predictions of individual solutions for determining prediction of ensemble solution. Benchmark datasets, namely, KDD cup 1999 and ISCX 2012 dataset, are used to demonstrate and validate the performance of the proposed approach for intrusion detection. The proposed approach can discover individual solutions and ensemble solutions thereof with a good support and a detection rate from benchmark datasets (in comparison with well-known ensemble methods like bagging and boosting). In addition, the proposed approach is a generalized classification approach that is applicable to the problem of any field having multiple conflicting objectives, and a dataset can be represented in the form of labelled instances in terms of its features. PMID:24376390
Climatic Models Ensemble-based Mid-21st Century Runoff Projections: A Bayesian Framework
NASA Astrophysics Data System (ADS)
Achieng, K. O.; Zhu, J.
2017-12-01
There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climatic models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits data. However, model selection techniques often lead to different conclusions. In this study, ten models are averaged in Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project and identify effect of model uncertainty on future runoff projections. Baseflow separation - a two-digital filter which is also called Eckhardt filter - is used to separate USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff, in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well do ten RCM models ensemble jointly simulate surface runoff by averaging over all the models using BMA, given a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?
NASA Astrophysics Data System (ADS)
Karmalkar, A.; Sexton, D.; Murphy, J.
2017-12-01
We present exploratory work towards developing an efficient strategy to select variants of a state-of-the-art but expensive climate model suitable for climate projection studies. The strategy combines information from a set of idealized perturbed parameter ensemble (PPE) and CMIP5 multi-model ensemble (MME) experiments, and uses two criteria as basis to select model variants for a PPE suitable for future projections: a) acceptable model performance at two different timescales, and b) maintaining diversity in model response to climate change. We demonstrate that there is a strong relationship between model errors at weather and climate timescales for a variety of key variables. This relationship is used to filter out parts of parameter space that do not give credible simulations of historical climate, while minimizing the impact on ranges in forcings and feedbacks that drive model responses to climate change. We use statistical emulation to explore the parameter space thoroughly, and demonstrate that about 90% can be filtered out without affecting diversity in global-scale climate change responses. This leads to identification of plausible parts of parameter space from which model variants can be selected for projection studies.
NASA Astrophysics Data System (ADS)
Iachimciuc, Igor
The dissertation is in two parts, a theoretical study and a musical composition. In Part I the music of Gyorgy Kurtag is analyzed from the point of view of sound color. A brief description of what is understood by the term sound color, and various ways of achieving specific coloristic effects, are presented in the Introduction. An examination of Kurtag's approaches to the domain of sound color occupies the chapters that follow. The musical examples that are analyzed are selected from Kurtag's different compositional periods, showing a certain consistency in sound color techniques, the most important of which are already present in the String Quartet, Op. 1. The compositions selected for analysis are written for different ensembles, but regardless of the instrumentation, certain principles of the formation and organization of sound color remain the same. Rather than relying on extended instrumental techniques, Kurtag creates a large variety of sound colors using traditional means such as pitch material, register, density, rhythm, timbral combinations, dynamics, texture, spatial displacement of the instruments, and the overall musical context. Each sound color unit in Kurtag's music is a separate entity, conceived as a complete microcosm. Sound color units can either be juxtaposed as contrasting elements, forming sound color variations, or superimposed, often resulting in a Klangfarbenmelodie effect. Some of the same gestural figures (objets trouves) appear in different compositions, but with significant coloristic modifications. Thus, the principle of sound color variations is not only a strong organizational tool, but also a characteristic stylistic feature of the music of Gyorgy Kurtag. Part II, Leopard's Path (2010), for flute, clarinet, violin, cello, cimbalom, and piano, is an original composition inspired by the painting of Jesse Allen, a San Francisco based artist. The composition is conceived as a cycle of thirteen short movements. Ten of these movements are the musical interpretation of the objects presented in the painting, and are stylistically similar. These movements are scored for the entire ensemble. The other three movements, entitled Interludes, provide a stylistic contrast, and are not directly connected with the painting.
Molecular dynamics simulations using temperature-enhanced essential dynamics replica exchange.
Kubitzki, Marcus B; de Groot, Bert L
2007-06-15
Today's standard molecular dynamics simulations of moderately sized biomolecular systems at full atomic resolution are typically limited to the nanosecond timescale and therefore suffer from limited conformational sampling. Efficient ensemble-preserving algorithms like replica exchange (REX) may alleviate this problem somewhat but are still computationally prohibitive due to the large number of degrees of freedom involved. Aiming at increased sampling efficiency, we present a novel simulation method combining the ideas of essential dynamics and REX. Unlike standard REX, in each replica only a selection of essential collective modes of a subsystem of interest (essential subspace) is coupled to a higher temperature, with the remainder of the system staying at a reference temperature, T(0). This selective excitation along with the replica framework permits efficient approximate ensemble-preserving conformational sampling and allows much larger temperature differences between replicas, thereby considerably enhancing sampling efficiency. Ensemble properties and sampling performance of the method are discussed using dialanine and guanylin test systems, with multi-microsecond molecular dynamics simulations of these test systems serving as references.
Krishnan, Ranjani; Walton, Emily B; Van Vliet, Krystyn J
2009-11-01
As computational resources increase, molecular dynamics simulations of biomolecules are becoming an increasingly informative complement to experimental studies. In particular, it has now become feasible to use multiple initial molecular configurations to generate an ensemble of replicate production-run simulations that allows for more complete characterization of rare events such as ligand-receptor unbinding. However, there are currently no explicit guidelines for selecting an ensemble of initial configurations for replicate simulations. Here, we use clustering analysis and steered molecular dynamics simulations to demonstrate that the configurational changes accessible in molecular dynamics simulations of biomolecules do not necessarily correlate with observed rare-event properties. This informs selection of a representative set of initial configurations. We also employ statistical analysis to identify the minimum number of replicate simulations required to sufficiently sample a given biomolecular property distribution. Together, these results suggest a general procedure for generating an ensemble of replicate simulations that will maximize accurate characterization of rare-event property distributions in biomolecules.
Chetverikov, Andrey; Campana, Gianluca; Kristjánsson, Árni
2017-10-01
Colors are rarely uniform, yet little is known about how people represent color distributions. We introduce a new method for studying color ensembles based on intertrial learning in visual search. Participants looked for an oddly colored diamond among diamonds with colors taken from either uniform or Gaussian color distributions. On test trials, the targets had various distances in feature space from the mean of the preceding distractor color distribution. Targets on test trials therefore served as probes into probabilistic representations of distractor colors. Test-trial response times revealed a striking similarity between the physical distribution of colors and their internal representations. The results demonstrate that the visual system represents color ensembles in a more detailed way than previously thought, coding not only mean and variance but, most surprisingly, the actual shape (uniform or Gaussian) of the distribution of colors in the environment.
Liu, Jing; Zhao, Songzheng; Wang, Gang
2018-01-01
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which is a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction - distinguishing ADE relationship from other relation types - necessary. However, conducting ADE relation extraction in social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high-dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most of existing ADE relation extraction methods The SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping
2017-11-01
Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprised of an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel programmed ε-NSGA II multi-objective algorithm. According to the solutions by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. Then a simple yet effective modular approach is proposed to combine these daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination on parameter estimation, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single model ensembles and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at three stations can be satisfactory up to ten days, however the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in forecasting of flows. For peak values taking flood forecasts from each individual member into account is more appropriate.
NASA Astrophysics Data System (ADS)
Clark, E.; Wood, A.; Nijssen, B.; Newman, A. J.; Mendoza, P. A.
2016-12-01
The System for Hydrometeorological Applications, Research and Prediction (SHARP), developed at the National Center for Atmospheric Research (NCAR), University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation, is a fully automated ensemble prediction system for short-term to seasonal applications. It incorporates uncertainty in initial hydrologic conditions (IHCs) and in hydrometeorological predictions. In this implementation, IHC uncertainty is estimated by propagating an ensemble of 100 plausible temperature and precipitation time series through the Sacramento/Snow-17 model. The forcing ensemble explicitly accounts for measurement and interpolation uncertainties in the development of gridded meteorological forcing time series. The resulting ensemble of derived IHCs exhibits a broad range of possible soil moisture and snow water equivalent (SWE) states. To select the IHCs that are most consistent with the observations, we employ a particle filter (PF) that weights IHC ensemble members based on observations of streamflow and SWE. These particles are then used to initialize ensemble precipitation and temperature forecasts downscaled from the Global Ensemble Forecast System (GEFS), generating a streamflow forecast ensemble. We test this method in two basins in the Pacific Northwest that are important for water resources management: 1) the Green River upstream of Howard Hanson Dam, and 2) the South Fork Flathead River upstream of Hungry Horse Dam. The first of these is characterized by mixed snow and rain, while the second is snow-dominated. The PF-based forecasts are compared to forecasts based on a single IHC (corresponding to median streamflow) paired with the full GEFS ensemble, and 2) the full IHC ensemble, without filtering, paired with the full GEFS ensemble. In addition to assessing improvements in the spread of IHCs, we perform a hindcast experiment to evaluate the utility of PF-based data assimilation on streamflow forecasts at 1- to 7-day lead times.
Uncertainties in climate assessment for the case of aviation NO
Holmes, Christopher D.; Tang, Qi; Prather, Michael J.
2011-01-01
Nitrogen oxides emitted from aircraft engines alter the chemistry of the atmosphere, perturbing the greenhouse gases methane (CH4) and ozone (O3). We quantify uncertainties in radiative forcing (RF) due to short-lived increases in O3, long-lived decreases in CH4 and O3, and their net effect, using the ensemble of published models and a factor decomposition of each forcing. The decomposition captures major features of the ensemble, and also shows which processes drive the total uncertainty in several climate metrics. Aviation-specific factors drive most of the uncertainty for the short-lived O3 and long-lived CH4 RFs, but a nonaviation factor dominates for long-lived O3. The model ensemble shows strong anticorrelation between the short-lived and long-lived RF perturbations (R2 = 0.87). Uncertainty in the net RF is highly sensitive to this correlation. We reproduce the correlation and ensemble spread in one model, showing that processes controlling the background tropospheric abundance of nitrogen oxides are likely responsible for the modeling uncertainty in climate impacts from aviation. PMID:21690364
Four types of ensemble coding in data visualizations.
Szafir, Danielle Albers; Haroz, Steve; Gleicher, Michael; Franconeri, Steven
2016-01-01
Ensemble coding supports rapid extraction of visual statistics about distributed visual information. Researchers typically study this ability with the goal of drawing conclusions about how such coding extracts information from natural scenes. Here we argue that a second domain can serve as another strong inspiration for understanding ensemble coding: graphs, maps, and other visual presentations of data. Data visualizations allow observers to leverage their ability to perform visual ensemble statistics on distributions of spatial or featural visual information to estimate actual statistics on data. We survey the types of visual statistical tasks that occur within data visualizations across everyday examples, such as scatterplots, and more specialized images, such as weather maps or depictions of patterns in text. We divide these tasks into four categories: identification of sets of values, summarization across those values, segmentation of collections, and estimation of structure. We point to unanswered questions for each category and give examples of such cross-pollination in the current literature. Increased collaboration between the data visualization and perceptual psychology research communities can inspire new solutions to challenges in visualization while simultaneously exposing unsolved problems in perception research.
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
Clustering-based ensemble learning for activity recognition in smart homes.
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-07-10
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B
2018-04-01
Objective Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
Li, Shaobo; Liu, Guokai; Tang, Xianghong; Lu, Jianguang; Hu, Jianjun
2017-07-28
Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster-Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions.
A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders
NASA Astrophysics Data System (ADS)
Shao, Haidong; Jiang, Hongkai; Lin, Ying; Li, Xingqiu
2018-03-01
Automatic and accurate identification of rolling bearings fault categories, especially for the fault severities and fault orientations, is still a major challenge in rotating machinery fault diagnosis. In this paper, a novel method called ensemble deep auto-encoders (EDAEs) is proposed for intelligent fault diagnosis of rolling bearings. Firstly, different activation functions are employed as the hidden functions to design a series of auto-encoders (AEs) with different characteristics. Secondly, EDAEs are constructed with various auto-encoders for unsupervised feature learning from the measured vibration signals. Finally, a combination strategy is designed to ensure accurate and stable diagnosis results. The proposed method is applied to analyze the experimental bearing vibration signals. The results confirm that the proposed method can get rid of the dependence on manual feature extraction and overcome the limitations of individual deep learning models, which is more effective than the existing intelligent diagnosis methods.
Li, Shaobo; Liu, Guokai; Tang, Xianghong; Lu, Jianguang
2017-01-01
Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster–Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions. PMID:28788099
A Fractional Cartesian Composition Model for Semi-Spatial Comparative Visualization Design.
Kolesar, Ivan; Bruckner, Stefan; Viola, Ivan; Hauser, Helwig
2017-01-01
The study of spatial data ensembles leads to substantial visualization challenges in a variety of applications. In this paper, we present a model for comparative visualization that supports the design of according ensemble visualization solutions by partial automation. We focus on applications, where the user is interested in preserving selected spatial data characteristics of the data as much as possible-even when many ensemble members should be jointly studied using comparative visualization. In our model, we separate the design challenge into a minimal set of user-specified parameters and an optimization component for the automatic configuration of the remaining design variables. We provide an illustrated formal description of our model and exemplify our approach in the context of several application examples from different domains in order to demonstrate its generality within the class of comparative visualization problems for spatial data ensembles.
NASA Astrophysics Data System (ADS)
O'Connor, Alison; Kirtman, Benjamin; Harrison, Scott; Gorman, Joe
2016-05-01
The US Navy faces several limitations when planning operations in regard to forecasting environmental conditions. Currently, mission analysis and planning tools rely heavily on short-term (less than a week) forecasts or long-term statistical climate products. However, newly available data in the form of weather forecast ensembles provides dynamical and statistical extended-range predictions that can produce more accurate predictions if ensemble members can be combined correctly. Charles River Analytics is designing the Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS), which performs data fusion over extended-range multi-model ensembles, such as the North American Multi-Model Ensemble (NMME), to produce a unified forecast for several weeks to several seasons in the future. We evaluated thirty years of forecasts using machine learning to select predictions for an all-encompassing and superior forecast that can be used to inform the Navy's decision planning process.
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M
2017-01-01
Background Machine learning methods may complement traditional analytic methods for medical device surveillance. Methods and results Using data from the National Cardiovascular Data Registry for implantable cardioverter–defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs that used two propensity score (PS) models: one specified by subject-matter experts (PS-SME), and the other one by machine learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%–20.9%; nonfatal ICD-related adverse events, 19.3%–26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%–37.6%. Agreement among safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, k=0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, k=−0.028), and between the DELTA and embedded feature selection approaches was 88.1% (349 of 396, k=−0.042). Conclusion Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance. PMID:28860874
Ensembles of novelty detection classifiers for structural health monitoring using guided waves
NASA Astrophysics Data System (ADS)
Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita
2018-01-01
Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.
NASA Astrophysics Data System (ADS)
Karmalkar, A.
2017-12-01
Ensembles of dynamically downscaled climate change simulations are routinely used to capture uncertainty in projections at regional scales. I assess the reliability of two such ensembles for North America - NARCCAP and NA-CORDEX - by investigating the impact of model selection on representing uncertainty in regional projections, and the ability of the regional climate models (RCMs) to provide reliable information. These aspects - discussed for the six regions used in the US National Climate Assessment - provide an important perspective on the interpretation of downscaled results. I show that selecting general circulation models for downscaling based on their equilibrium climate sensitivities is a reasonable choice, but the six models chosen for NA-CORDEX do a poor job at representing uncertainty in winter temperature and precipitation projections in many parts of the eastern US, which lead to overconfident projections. The RCM performance is highly variable across models, regions, and seasons and the ability of the RCMs to provide improved seasonal mean performance relative to their parent GCMs seems limited in both RCM ensembles. Additionally, the ability of the RCMs to simulate historical climates is not strongly related to their ability to simulate climate change across the ensemble. This finding suggests limited use of models' historical performance to constrain their projections. Given these challenges in dynamical downscaling, the RCM results should not be used in isolation. Information on how well the RCM ensembles represent known uncertainties in regional climate change projections discussed here needs to be communicated clearly to inform maagement decisions.
NASA Astrophysics Data System (ADS)
Warner, Thomas T.; Sheu, Rong-Shyang; Bowers, James F.; Sykes, R. Ian; Dodd, Gregory C.; Henn, Douglas S.
2002-05-01
Ensemble simulations made using a coupled atmospheric dynamic model and a probabilistic Lagrangian puff dispersion model were employed in a forensic analysis of the transport and dispersion of a toxic gas that may have been released near Al Muthanna, Iraq, during the Gulf War. The ensemble study had two objectives, the first of which was to determine the sensitivity of the calculated dosage fields to the choices that must be made about the configuration of the atmospheric dynamic model. In this test, various choices were used for model physics representations and for the large-scale analyses that were used to construct the model initial and boundary conditions. The second study objective was to examine the dispersion model's ability to use ensemble inputs to predict dosage probability distributions. Here, the dispersion model was used with the ensemble mean fields from the individual atmospheric dynamic model runs, including the variability in the individual wind fields, to generate dosage probabilities. These are compared with the explicit dosage probabilities derived from the individual runs of the coupled modeling system. The results demonstrate that the specific choices made about the dynamic-model configuration and the large-scale analyses can have a large impact on the simulated dosages. For example, the area near the source that is exposed to a selected dosage threshold varies by up to a factor of 4 among members of the ensemble. The agreement between the explicit and ensemble dosage probabilities is relatively good for both low and high dosage levels. Although only one ensemble was considered in this study, the encouraging results suggest that a probabilistic dispersion model may be of value in quantifying the effects of uncertainties in a dynamic-model ensemble on dispersion model predictions of atmospheric transport and dispersion.
Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Rodriguez, Luke R.; Lin, Andy
2015-09-30
We characterize the commercial behavior of a group of companies in a common line of business using a small ensemble of classifiers on a stream of records containing commercial activity information. This approach is able to effectively find a subset of classifiers that can be used to predict company labels with reasonable accuracy. Performance of the ensemble, its error rate under stable conditions, can be characterized using an exponentially weighted moving average (EWMA) statistic. The behavior of the EWMA statistic can be used to monitor a record stream from the commercial network and determine when significant changes have occurred. Resultsmore » indicate that larger classification ensembles may not necessarily be optimal, pointing to the need to search the combinatorial classifier space in a systematic way. Results also show that current and past performance of an ensemble can be used to detect when statistically significant changes in the activity of the network have occurred. The dataset used in this work contains tens of thousands of high level commercial activity records with continuous and categorical variables and hundreds of labels, making classification challenging.« less
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.
Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.
NASA Astrophysics Data System (ADS)
Cao, Kunlin; Bhagalia, Roshni; Sood, Anup; Brogi, Edi; Mellinghoff, Ingo K.; Larson, Steven M.
2015-03-01
Positron emission tomography (PET) using uorodeoxyglucose (18F-FDG) is commonly used in the assessment of breast lesions by computing voxel-wise standardized uptake value (SUV) maps. Simple metrics derived from ensemble properties of SUVs within each identified breast lesion are routinely used for disease diagnosis. The maximum SUV within the lesion (SUVmax) is the most popular of these metrics. However these simple metrics are known to be error-prone and are susceptible to image noise. Finding reliable SUV map-based features that correlate to established molecular phenotypes of breast cancer (viz. estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) expression) will enable non-invasive disease management. This study investigated 36 SUV features based on first and second order statistics, local histograms and texture of segmented lesions to predict ER and PR expression in 51 breast cancer patients. True ER and PR expression was obtained via immunohistochemistry (IHC) of tissue samples from each lesion. A supervised learning, adaptive boosting-support vector machine (AdaBoost-SVM), framework was used to select a subset of features to classify breast lesions into distinct phenotypes. Performance of the trained multi-feature classifier was compared against the baseline single-feature SUVmax classifier using receiver operating characteristic (ROC) curves. Results show that texture features encoding local lesion homogeneity extracted from gray-level co-occurrence matrices are the strongest discriminator of lesion ER expression. In particular, classifiers including these features increased prediction accuracy from 0.75 (baseline) to 0.82 and the area under the ROC curve from 0.64 (baseline) to 0.75.
Update on SU(2) gauge theory with NF = 2 fundamental flavours.
NASA Astrophysics Data System (ADS)
Drach, Vincent; Janowski, Tadeusz; Pica, Claudio
2018-03-01
We present a non perturbative study of SU(2) gauge theory with two fundamental Dirac flavours. This theory provides a minimal template which is ideal for a wide class of Standard Model extensions featuring novel strong dynamics, such as a minimal realization of composite Higgs models. We present an update on the status of the meson spectrum and decay constants based on increased statistics on our existing ensembles and the inclusion of new ensembles with lighter pion masses, resulting in a more reliable chiral extrapolation. Preprint: CP3-Origins-2017-048 DNRF90
Dynamic Metabolic Model Building Based on the Ensemble Modeling Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, James C.
2016-10-01
Ensemble modeling of kinetic systems addresses the challenges of kinetic model construction, with respect to parameter value selection, and still allows for the rich insights possible from kinetic models. This project aimed to show that constructing, implementing, and analyzing such models is a useful tool for the metabolic engineering toolkit, and that they can result in actionable insights from models. Key concepts are developed and deliverable publications and results are presented.
Identifying Optimal Measurement Subspace for the Ensemble Kalman Filter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ning; Huang, Zhenyu; Welch, Greg
2012-05-24
To reduce the computational load of the ensemble Kalman filter while maintaining its efficacy, an optimization algorithm based on the generalized eigenvalue decomposition method is proposed for identifying the most informative measurement subspace. When the number of measurements is large, the proposed algorithm can be used to make an effective tradeoff between computational complexity and estimation accuracy. This algorithm also can be extended to other Kalman filters for measurement subspace selection.
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies
Antoneli, Fernando; Passos, Fernando M.; Lopes, Luciano R.
2018-01-01
Divergence date estimates are central to understand evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and well known non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS) goodness-of-fit test. In the strict clock case, the method consists in using the one-sample Kolmogorov-Smirnov (KS) test to directly test if the phylogeny is clock-like, in other words, if it follows a Poisson law. The ECD is computed from the discretized branch lengths and the parameter λ of the expected Poisson distribution is calculated as the average branch length over the ensemble of trees. To compensate for the auto-correlation in the ensemble of trees and pseudo-replication we take advantage of thinning and effective sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is observed that tree topologies with very long or very short branches lead to Poisson mixtures and in this case we propose the use of the two-sample KS test with samples from two continuous branch length distributions, one obtained from an ensemble of clock-constrained trees and the other from an ensemble of unconstrained trees. Moreover, in this second form the test can also be applied to test for relaxed clock models. The use of a statistically equivalent ensemble of phylogenies to obtain the branch lengths ECD, instead of one consensus tree, yields considerable reduction of the effects of small sample size and provides a gain of power. PMID:29300759
NASA Astrophysics Data System (ADS)
Yuan, J.; Kopp, R. E.
2017-12-01
Quantitative risk analysis of regional climate change is crucial for risk management and impact assessment of climate change. Two major challenges to assessing the risks of climate change are: CMIP5 model runs, which drive EURO-CODEX downscaling runs, do not cover the full range of uncertainty of future projections; Climate models may underestimate the probability of tail risks (i.e. extreme events). To overcome the difficulties, this study offers a viable avenue, where a set of probabilistic climate ensemble is generated using the Surrogate/Model Mixed Ensemble (SMME) method. The probabilistic ensembles for temperature and precipitation are used to assess the range of uncertainty covered by five bias-corrected simulations from the high-resolution (0.11º) EURO-CODEX database, which are selected by the PESETA (The Projection of Economic impacts of climate change in Sectors of the European Union based on bottom-up Analysis) III project. Results show that the distribution of SMME ensemble is notably wider than both distribution of raw ensemble of GCMs and the spread of the five EURO-CORDEX in RCP8.5. Tail risks are well presented by the SMME ensemble. Both SMME ensemble and EURO-CORDEX projections are aggregated to administrative level, and are integrated into impact functions of PESETA III to assess climate risks in Europe. To further evaluate the uncertainties introduced by the downscaling process, we compare the 5 runs from EURO-CORDEX with runs from the corresponding GCMs. Time series of regional mean, spatial patterns, and climate indices are examined for the future climate (2080-2099) deviating from the present climate (1981-2010). The downscaling processes do not appear to be trend-preserving, e.g. the increase in regional mean temperature from EURO-CORDEX is slower than that from the corresponding GCM. The spatial pattern comparison reveals that the differences between each pair of GCM and EURO-CORDEX are small in winter. In summer, the temperatures of EURO-CORDEX are generally lower than those of GCMs, while the drying trends in precipitation of EURO-CORDEX are smaller than those of GCMs. Climate indices are significantly affected by bias-correction and downscaling process. Our study provides valuable information for selecting climate indices in different regions over Europe.
Si, Lei; Wang, Zhongbin; Liu, Xinhua; Tan, Chao; Xu, Jing; Zheng, Kehong
2015-01-01
In order to efficiently and accurately identify the cutting condition of a shearer, this paper proposed an intelligent multi-sensor data fusion identification method using the parallel quasi-Newton neural network (PQN-NN) and the Dempster-Shafer (DS) theory. The vibration acceleration signals and current signal of six cutting conditions were collected from a self-designed experimental system and some special state features were extracted from the intrinsic mode functions (IMFs) based on the ensemble empirical mode decomposition (EEMD). In the experiment, three classifiers were trained and tested by the selected features of the measured data, and the DS theory was used to combine the identification results of three single classifiers. Furthermore, some comparisons with other methods were carried out. The experimental results indicate that the proposed method performs with higher detection accuracy and credibility than the competing algorithms. Finally, an industrial application example in the fully mechanized coal mining face was demonstrated to specify the effect of the proposed system. PMID:26580620
Kunz, Ralf; Timpmann, Kõu; Southall, June; Cogdell, Richard J; Freiberg, Arvi; Köhler, Jürgen
2014-05-06
We have recorded fluorescence-excitation and emission spectra from single LH2 complexes from Rhodopseudomonas (Rps.) acidophila. Both types of spectra show strong temporal spectral fluctuations that can be visualized as spectral diffusion plots. Comparison of the excitation and emission spectra reveals that for most of the complexes the lowest exciton transition is not observable in the excitation spectra due to the cutoff of the detection filter characteristics. However, from the spectral diffusion plots we have the full spectral and temporal information at hand and can select those complexes for which the excitation spectra are complete. Correlating the red most spectral feature of the excitation spectrum with the blue most spectral feature of the emission spectrum allows an unambiguous assignment of the lowest exciton state. Hence, application of fluorescence-excitation and emission spectroscopy on the same individual LH2 complex allows us to decipher spectral subtleties that are usually hidden in traditional ensemble spectroscopy. Copyright © 2014 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Warren, Brandon L.; Mendoza, Michael P.; Cruz, Fabio C.; Leao, Rodrigo M.; Caprioli, Daniele; Rubio, F. Javier; Whitaker, Leslie R.; McPherson, Kylie B.; Bossert, Jennifer M.; Shaham, Yavin
2016-01-01
In operant learning, initial reward-associated memories are thought to be distinct from subsequent extinction-associated memories. Memories formed during operant learning are thought to be stored in “neuronal ensembles.” Thus, we hypothesize that different neuronal ensembles encode reward- and extinction-associated memories. Here, we examined prefrontal cortex neuronal ensembles involved in the recall of reward and extinction memories of food self-administration. We first trained rats to lever press for palatable food pellets for 7 d (1 h/d) and then exposed them to 0, 2, or 7 daily extinction sessions in which lever presses were not reinforced. Twenty-four hours after the last training or extinction session, we exposed the rats to either a short 15 min extinction test session or left them in their homecage (a control condition). We found maximal Fos (a neuronal activity marker) immunoreactivity in the ventral medial prefrontal cortex of rats that previously received 2 extinction sessions, suggesting that neuronal ensembles in this area encode extinction memories. We then used the Daun02 inactivation procedure to selectively disrupt ventral medial prefrontal cortex neuronal ensembles that were activated during the 15 min extinction session following 0 (no extinction) or 2 prior extinction sessions to determine the effects of inactivating the putative food reward and extinction ensembles, respectively, on subsequent nonreinforced food seeking 2 d later. Inactivation of the food reward ensembles decreased food seeking, whereas inactivation of the extinction ensembles increased food seeking. Our results indicate that distinct neuronal ensembles encoding operant reward and extinction memories intermingle within the same cortical area. SIGNIFICANCE STATEMENT A current popular hypothesis is that neuronal ensembles in different prefrontal cortex areas control reward-associated versus extinction-associated memories: the dorsal medial prefrontal cortex (mPFC) promotes reward seeking, whereas the ventral mPFC inhibits reward seeking. In this paper, we use the Daun02 chemogenetic inactivation procedure to demonstrate that Fos-expressing neuronal ensembles mediating both food reward and extinction memories intermingle within the same ventral mPFC area. PMID:27335401
Warren, Brandon L; Mendoza, Michael P; Cruz, Fabio C; Leao, Rodrigo M; Caprioli, Daniele; Rubio, F Javier; Whitaker, Leslie R; McPherson, Kylie B; Bossert, Jennifer M; Shaham, Yavin; Hope, Bruce T
2016-06-22
In operant learning, initial reward-associated memories are thought to be distinct from subsequent extinction-associated memories. Memories formed during operant learning are thought to be stored in "neuronal ensembles." Thus, we hypothesize that different neuronal ensembles encode reward- and extinction-associated memories. Here, we examined prefrontal cortex neuronal ensembles involved in the recall of reward and extinction memories of food self-administration. We first trained rats to lever press for palatable food pellets for 7 d (1 h/d) and then exposed them to 0, 2, or 7 daily extinction sessions in which lever presses were not reinforced. Twenty-four hours after the last training or extinction session, we exposed the rats to either a short 15 min extinction test session or left them in their homecage (a control condition). We found maximal Fos (a neuronal activity marker) immunoreactivity in the ventral medial prefrontal cortex of rats that previously received 2 extinction sessions, suggesting that neuronal ensembles in this area encode extinction memories. We then used the Daun02 inactivation procedure to selectively disrupt ventral medial prefrontal cortex neuronal ensembles that were activated during the 15 min extinction session following 0 (no extinction) or 2 prior extinction sessions to determine the effects of inactivating the putative food reward and extinction ensembles, respectively, on subsequent nonreinforced food seeking 2 d later. Inactivation of the food reward ensembles decreased food seeking, whereas inactivation of the extinction ensembles increased food seeking. Our results indicate that distinct neuronal ensembles encoding operant reward and extinction memories intermingle within the same cortical area. A current popular hypothesis is that neuronal ensembles in different prefrontal cortex areas control reward-associated versus extinction-associated memories: the dorsal medial prefrontal cortex (mPFC) promotes reward seeking, whereas the ventral mPFC inhibits reward seeking. In this paper, we use the Daun02 chemogenetic inactivation procedure to demonstrate that Fos-expressing neuronal ensembles mediating both food reward and extinction memories intermingle within the same ventral mPFC area. Copyright © 2016 the authors 0270-6474/16/366691-13$15.00/0.
A novel method for predicting kidney stone type using ensemble learning.
Kazemi, Yassaman; Mirroshandel, Seyed Abolghasem
2018-01-01
The high morbidity rate associated with kidney stone disease, which is a silent killer, is one of the main concerns in healthcare systems all over the world. Advanced data mining techniques such as classification can help in the early prediction of this disease and reduce its incidence and associated costs. The objective of the present study is to derive a model for the early detection of the type of kidney stone and the most influential parameters with the aim of providing a decision-support system. Information was collected from 936 patients with nephrolithiasis at the kidney center of the Razi Hospital in Rasht from 2012 through 2016. The prepared dataset included 42 features. Data pre-processing was the first step toward extracting the relevant features. The collected data was analyzed with Weka software, and various data mining models were used to prepare a predictive model. Various data mining algorithms such as the Bayesian model, different types of Decision Trees, Artificial Neural Networks, and Rule-based classifiers were used in these models. We also proposed four models based on ensemble learning to improve the accuracy of each learning algorithm. In addition, a novel technique for combining individual classifiers in ensemble learning was proposed. In this technique, for each individual classifier, a weight is assigned based on our proposed genetic algorithm based method. The generated knowledge was evaluated using a 10-fold cross-validation technique based on standard measures. However, the assessment of each feature for building a predictive model was another significant challenge. The predictive strength of each feature for creating a reproducible outcome was also investigated. Regarding the applied models, parameters such as sex, acid uric condition, calcium level, hypertension, diabetes, nausea and vomiting, flank pain, and urinary tract infection (UTI) were the most vital parameters for predicting the chance of nephrolithiasis. The final ensemble-based model (with an accuracy of 97.1%) was a robust one and could be safely applied to future studies to predict the chances of developing nephrolithiasis. This model provides a novel way to study stone disease by deciphering the complex interaction among different biological variables, thus helping in an early identification and reduction in diagnosis time. Copyright © 2017 Elsevier B.V. All rights reserved.
Mastwal, Surjeet; Cao, Vania; Wang, Kuan Hong
2016-01-01
Mental functions involve coordinated activities of specific neuronal ensembles that are embedded in complex brain circuits. Aberrant neuronal ensemble dynamics is thought to form the neurobiological basis of mental disorders. A major challenge in mental health research is to identify these cellular ensembles and determine what molecular mechanisms constrain their emergence and consolidation during development and learning. Here, we provide a perspective based on recent studies that use activity-dependent gene Arc/Arg3.1 as a cellular marker to identify neuronal ensembles and a molecular probe to modulate circuit functions. These studies have demonstrated that the transcription of Arc is activated in selective groups of frontal cortical neurons in response to specific behavioral tasks. Arc expression regulates the persistent firing of individual neurons and predicts the consolidation of neuronal ensembles during repeated learning. Therefore, the Arc pathway represents a prototypical example of activity-dependent genetic feedback regulation of neuronal ensembles. The activation of this pathway in the frontal cortex starts during early postnatal development and requires dopaminergic (DA) input. Conversely, genetic disruption of Arc leads to a hypoactive mesofrontal dopamine circuit and its related cognitive deficit. This mutual interaction suggests an auto-regulatory mechanism to amplify the impact of neuromodulators and activity-regulated genes during postnatal development. Such a mechanism may contribute to the association of mutations in dopamine and Arc pathways with neurodevelopmental psychiatric disorders. As the mesofrontal dopamine circuit shows extensive activity-dependent developmental plasticity, activity-guided modulation of DA projections or Arc ensembles during development may help to repair circuit deficits related to neuropsychiatric disorders.
Knowledge-Based Methods To Train and Optimize Virtual Screening Ensembles
2016-01-01
Ensemble docking can be a successful virtual screening technique that addresses the innate conformational heterogeneity of macromolecular drug targets. Yet, lacking a method to identify a subset of conformational states that effectively segregates active and inactive small molecules, ensemble docking may result in the recommendation of a large number of false positives. Here, three knowledge-based methods that construct structural ensembles for virtual screening are presented. Each method selects ensembles by optimizing an objective function calculated using the receiver operating characteristic (ROC) curve: either the area under the ROC curve (AUC) or a ROC enrichment factor (EF). As the number of receptor conformations, N, becomes large, the methods differ in their asymptotic scaling. Given a set of small molecules with known activities and a collection of target conformations, the most resource intense method is guaranteed to find the optimal ensemble but scales as O(2N). A recursive approximation to the optimal solution scales as O(N2), and a more severe approximation leads to a faster method that scales linearly, O(N). The techniques are generally applicable to any system, and we demonstrate their effectiveness on the androgen nuclear hormone receptor (AR), cyclin-dependent kinase 2 (CDK2), and the peroxisome proliferator-activated receptor δ (PPAR-δ) drug targets. Conformations that consisted of a crystal structure and molecular dynamics simulation cluster centroids were used to form AR and CDK2 ensembles. Multiple available crystal structures were used to form PPAR-δ ensembles. For each target, we show that the three methods perform similarly to one another on both the training and test sets. PMID:27097522
NASA Astrophysics Data System (ADS)
Szunyogh, Istvan; Kostelich, Eric J.; Gyarmati, G.; Patil, D. J.; Hunt, Brian R.; Kalnay, Eugenia; Ott, Edward; Yorke, James A.
2005-08-01
The accuracy and computational efficiency of the recently proposed local ensemble Kalman filter (LEKF) data assimilation scheme is investigated on a state-of-the-art operational numerical weather prediction model using simulated observations. The model selected for this purpose is the T62 horizontal- and 28-level vertical-resolution version of the Global Forecast System (GFS) of the National Center for Environmental Prediction. The performance of the data assimilation system is assessed for different configurations of the LEKF scheme. It is shown that a modest size (40-member) ensemble is sufficient to track the evolution of the atmospheric state with high accuracy. For this ensemble size, the computational time per analysis is less than 9 min on a cluster of PCs. The analyses are extremely accurate in the mid-latitude storm track regions. The largest analysis errors, which are typically much smaller than the observational errors, occur where parametrized physical processes play important roles. Because these are also the regions where model errors are expected to be the largest, limitations of a real-data implementation of the ensemble-based Kalman filter may be easily mistaken for model errors. In light of these results, the importance of testing the ensemble-based Kalman filter data assimilation systems on simulated observations is stressed.
Uncertainty Quantification in Alchemical Free Energy Methods.
Bhati, Agastya P; Wan, Shunzhou; Hu, Yuan; Sherborne, Brad; Coveney, Peter V
2018-06-12
Alchemical free energy methods have gained much importance recently from several reports of improved ligand-protein binding affinity predictions based on their implementation using molecular dynamics simulations. A large number of variants of such methods implementing different accelerated sampling techniques and free energy estimators are available, each claimed to be better than the others in its own way. However, the key features of reproducibility and quantification of associated uncertainties in such methods have barely been discussed. Here, we apply a systematic protocol for uncertainty quantification to a number of popular alchemical free energy methods, covering both absolute and relative free energy predictions. We show that a reliable measure of error estimation is provided by ensemble simulation-an ensemble of independent MD simulations-which applies irrespective of the free energy method. The need to use ensemble methods is fundamental and holds regardless of the duration of time of the molecular dynamics simulations performed.
Hsieh, Nan-Chen; Hung, Lun-Ping; Shih, Chun-Che; Keh, Huan-Chao; Chan, Chien-Hui
2012-06-01
Endovascular aneurysm repair (EVAR) is an advanced minimally invasive surgical technology that is helpful for reducing patients' recovery time, postoperative morbidity and mortality. This study proposes an ensemble model to predict postoperative morbidity after EVAR. The ensemble model was developed using a training set of consecutive patients who underwent EVAR between 2000 and 2009. All data required for prediction modeling, including patient demographics, preoperative, co-morbidities, and complication as outcome variables, was collected prospectively and entered into a clinical database. A discretization approach was used to categorize numerical values into informative feature space. Then, the Bayesian network (BN), artificial neural network (ANN), and support vector machine (SVM) were adopted as base models, and stacking combined multiple models. The research outcomes consisted of an ensemble model to predict postoperative morbidity after EVAR, the occurrence of postoperative complications prospectively recorded, and the causal effect knowledge by BNs with Markov blanket concept.
CLS 2+1 flavor simulations at physical light-and strange-quark masses
NASA Astrophysics Data System (ADS)
Mohler, Daniel; Schaefer, Stefan; Simeth, Jakob
2018-03-01
We report recent efforts by CLS to generate an ensemble with physical lightand strange-quark masses in a lattice volume of 192 × 963 at β = 3:55 corresponding to a lattice spacing of 0:064 fm. This ensemble is being generated as part of the CLS 2+1 flavor effort with improved Wilson fermions. Our simulations currently cover 5 lattice spacings ranging from 0:039 fm to 0:086 fm at various pion masses along chiral trajectories with either the sum of the quark masses kept fixed, or with the strange-quark mass at the physical value. The current status of simulations is briefly reviewed, including a short discussion of measured autocorrelation times and of the main features of the simulations. We then proceed to discuss the thermalization strategy employed for the generation of the physical quark-mass ensemble and present first results for some simple observables. Challenges encountered in the simulation are highlighted.
Optical and structural properties of ensembles of colloidal Ag{sub 2}S quantum dots in gelatin
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ovchinnikov, O. V., E-mail: Ovchinnikov-O-V@rambler.ru; Smirnov, M. S.; Shapiro, B. I.
2015-03-15
The size dependences of the absorption and luminescence spectra of ensembles of hydrophilic colloidal Ag{sub 2}S quantum dots produced by the sol-gel method and dispersed in gelatin are analyzed. By X-ray diffraction analysis and transmission electron microscopy, the formation of core/shell nanoparticles is detected. The characteristic feature of the nanoparticles is the formation of crystalline cores, 1.5–2.0 nm in dimensions, and shells of gelatin and its complexes with the components of synthesis. The observed slight size dependence of the position of infrared photoluminescence bands (in the range 1000–1400 nm) in the ensembles of hydrophilic colloidal Ag{sub 2}S quantum dots ismore » explained within the context of the model of the radiative recombination of electrons localized at structural and impurity defects with free holes.« less
Gestalt Effects in Visual Working Memory.
Kałamała, Patrycja; Sadowska, Aleksandra; Ordziniak, Wawrzyniec; Chuderski, Adam
2017-01-01
Four experiments investigated whether conforming to Gestalt principles, well known to drive visual perception, also facilitates the active maintenance of information in visual working memory (VWM). We used the change detection task, which required the memorization of visual patterns composed of several shapes. We observed no effects of symmetry of visual patterns on VWM performance. However, there was a moderate positive effect when a particular shape that was probed matched the shape of the whole pattern (the whole-part similarity effect). Data support the models assuming that VWM encodes not only particular objects of the perceptual scene but also the spatial relations between them (the ensemble representation). The ensemble representation may prime objects similar to its shape and thereby boost access to them. In contrast, the null effect of symmetry relates the fact that this very feature of an ensemble does not yield any useful additional information for VWM.
Caetano dos Santos, Florentino Luciano; Skottman, Heli; Juuti-Uusitalo, Kati; Hyttinen, Jari
2016-01-01
Aims A fast, non-invasive and observer-independent method to analyze the homogeneity and maturity of human pluripotent stem cell (hPSC) derived retinal pigment epithelial (RPE) cells is warranted to assess the suitability of hPSC-RPE cells for implantation or in vitro use. The aim of this work was to develop and validate methods to create ensembles of state-of-the-art texture descriptors and to provide a robust classification tool to separate three different maturation stages of RPE cells by using phase contrast microscopy images. The same methods were also validated on a wide variety of biological image classification problems, such as histological or virus image classification. Methods For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. Also, three new methods were tested. The first approach was an ensemble of preprocessing methods, to create an additional set of images. The second was the region-based approach, where saliency detection and wavelet decomposition divide each image in two different regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram and the set of SVMs combined by sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level. Dataset and Results The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors, obtaining, for the RPE dataset, an area under ROC curve (AUC) of 86.49% with the 10-fold cross validation and 91.98% with the leave-one-image-out protocol. The generality of the three proposed approaches was ascertained with 10 more biological image datasets, obtaining an average AUC greater than 97%. Conclusions Here we showed that the developed ensembles of texture descriptors are able to classify the RPE cell maturation stage. Moreover, we proved that preprocessing and region-based decomposition improves many descriptors’ accuracy in biological dataset classification. Finally, we built the first public dataset of stem cell-derived RPE cells, which is publicly available to the scientific community for classification studies. The proposed tool is available at https://www.dei.unipd.it/node/2357 and the RPE dataset at http://www.biomeditech.fi/data/RPE_dataset/. Both are available at https://figshare.com/s/d6fb591f1beb4f8efa6f. PMID:26895509
Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin
2013-08-01
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.
Stochastic Approaches Within a High Resolution Rapid Refresh Ensemble
NASA Astrophysics Data System (ADS)
Jankov, I.
2017-12-01
It is well known that global and regional numerical weather prediction (NWP) ensemble systems are under-dispersive, producing unreliable and overconfident ensemble forecasts. Typical approaches to alleviate this problem include the use of multiple dynamic cores, multiple physics suite configurations, or a combination of the two. While these approaches may produce desirable results, they have practical and theoretical deficiencies and are more difficult and costly to maintain. An active area of research that promotes a more unified and sustainable system is the use of stochastic physics. Stochastic approaches include Stochastic Parameter Perturbations (SPP), Stochastic Kinetic Energy Backscatter (SKEB), and Stochastic Perturbation of Physics Tendencies (SPPT). The focus of this study is to assess model performance within a convection-permitting ensemble at 3-km grid spacing across the Contiguous United States (CONUS) using a variety of stochastic approaches. A single physics suite configuration based on the operational High-Resolution Rapid Refresh (HRRR) model was utilized and ensemble members produced by employing stochastic methods. Parameter perturbations (using SPP) for select fields were employed in the Rapid Update Cycle (RUC) land surface model (LSM) and Mellor-Yamada-Nakanishi-Niino (MYNN) Planetary Boundary Layer (PBL) schemes. Within MYNN, SPP was applied to sub-grid cloud fraction, mixing length, roughness length, mass fluxes and Prandtl number. In the RUC LSM, SPP was applied to hydraulic conductivity and tested perturbing soil moisture at initial time. First iterative testing was conducted to assess the initial performance of several configuration settings (e.g. variety of spatial and temporal de-correlation lengths). Upon selection of the most promising candidate configurations using SPP, a 10-day time period was run and more robust statistics were gathered. SKEB and SPPT were included in additional retrospective tests to assess the impact of using all three stochastic approaches to address model uncertainty. Results from the stochastic perturbation testing were compared to a baseline multi-physics control ensemble. For probabilistic forecast performance the Model Evaluation Tools (MET) verification package was used.
Conformational and functional analysis of molecular dynamics trajectories by Self-Organising Maps
2011-01-01
Background Molecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins that is often a critical element of their function. Identification of functionally relevant conformations is generally done clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported performing more accurately and providing more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering. Results The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cα's Cartesian coordinates of conformations sampled in the essential space were used as input data vectors for SOM training, then complete linkage clustering was performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions. Conclusions The use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources. PMID:21569575
A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques
NASA Astrophysics Data System (ADS)
Techo, Jakkrit; Nattee, Cholwich; Theeramunkong, Thanaruk
While classification techniques can be applied for automatic unknown word recognition in a language without word boundary, it faces with the problem of unbalanced datasets where the number of positive unknown word candidates is dominantly smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighing each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naïve Bayes classifier and our vanilla version without ensemble learning. As the result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected while it gains 97.26±0.26% when the top-ten candidates are considered, that is 8.45% and 6.79% improvement over the conventional record-based naïve Bayes classifier and the vanilla version. Another result on applying only best features show 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively. They are 3.97% and 9.78% improvement over naive Bayes and the vanilla version. Finally, an error analysis is given.
NASA Astrophysics Data System (ADS)
Lazar, Dora; Ihasz, Istvan
2013-04-01
The short and medium range operational forecasts, warning and alarm of the severe weather are one of the most important activities of the Hungarian Meteorological Service. Our study provides comprehensive summary of newly developed methods based on ECMWF ensemble forecasts to assist successful prediction of the convective weather situations. . In the first part of the study a brief overview is given about the components of atmospheric convection, which are the atmospheric lifting force, convergence and vertical wind shear. The atmospheric instability is often used to characterize the so-called instability index; one of the most popular and often used indexes is the convective available potential energy. Heavy convective events, like intensive storms, supercells and tornadoes are needed the vertical instability, adequate moisture and vertical wind shear. As a first step statistical studies of these three parameters are based on nine years time series of 51-member ensemble forecasting model based on convective summer time period, various statistical analyses were performed. Relationship of the rate of the convective and total precipitation and above three parameters was studied by different statistical methods. Four new visualization methods were applied for supporting successful forecasts of severe weathers. Two of the four visualization methods the ensemble meteogram and the ensemble vertical profiles had been available at the beginning of our work. Both methods show probability of the meteorological parameters for the selected location. Additionally two new methods have been developed. First method provides probability map of the event exceeding predefined values, so the incident of the spatial uncertainty is well-defined. The convective weather events are characterized by the incident of space often rhapsodic occurs rather have expected the event area can be selected so that the ensemble forecasts give very good support. Another new visualization tool shows time evolution of predefined multiple thresholds in graphical form for any selected location. With applying this tool degree of the dangerous weather conditions can be well estimated. Besides intensive convective periods are clearly marked during the forecasting period. Developments were done by MAGICS++ software under UNIX operating system. The third part of the study usefulness of these tools is demonstrated in three interesting cases studies of last summer.
Bassen, David M; Vilkhovoy, Michael; Minot, Mason; Butcher, Jonathan T; Varner, Jeffrey D
2017-01-25
Ensemble modeling is a promising approach for obtaining robust predictions and coarse grained population behavior in deterministic mathematical models. Ensemble approaches address model uncertainty by using parameter or model families instead of single best-fit parameters or fixed model structures. Parameter ensembles can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables, and robustly constrain model predictions, despite having many poorly constrained parameters. In this software note, we present a multiobjective based technique to estimate parameter or models ensembles, the Pareto Optimal Ensemble Technique in the Julia programming language (JuPOETs). JuPOETs integrates simulated annealing with Pareto optimality to estimate ensembles on or near the optimal tradeoff surface between competing training objectives. We demonstrate JuPOETs on a suite of multiobjective problems, including test functions with parameter bounds and system constraints as well as for the identification of a proof-of-concept biochemical model with four conflicting training objectives. JuPOETs identified optimal or near optimal solutions approximately six-fold faster than a corresponding implementation in Octave for the suite of test functions. For the proof-of-concept biochemical model, JuPOETs produced an ensemble of parameters that gave both the mean of the training data for conflicting data sets, while simultaneously estimating parameter sets that performed well on each of the individual objective functions. JuPOETs is a promising approach for the estimation of parameter and model ensembles using multiobjective optimization. JuPOETs can be adapted to solve many problem types, including mixed binary and continuous variable types, bilevel optimization problems and constrained problems without altering the base algorithm. JuPOETs is open source, available under an MIT license, and can be installed using the Julia package manager from the JuPOETs GitHub repository.
Predicting students’ happiness from physiology, phone, mobility, and behavioral data
Jaques, Natasha; Taylor, Sara; Azaria, Asaph; Ghandeharioun, Asma; Sano, Akane; Picard, Rosalind
2017-01-01
In order to model students’ happiness, we apply machine learning methods to data collected from undergrad students monitored over the course of one month each. The data collected include physiological signals, location, smartphone logs, and survey responses to behavioral questions. Each day, participants reported their wellbeing on measures including stress, health, and happiness. Because of the relationship between happiness and depression, modeling happiness may help us to detect individuals who are at risk of depression and guide interventions to help them. We are also interested in how behavioral factors (such as sleep and social activity) affect happiness positively and negatively. A variety of machine learning and feature selection techniques are compared, including Gaussian Mixture Models and ensemble classification. We achieve 70% classification accuracy of self-reported happiness on held-out test data. PMID:28515966
Georgiadis, Pantelis; Cavouras, Dionisis; Kalatzis, Ioannis; Glotsos, Dimitris; Athanasiadis, Emmanouil; Kostopoulos, Spiros; Sifaki, Koralia; Malamas, Menelaos; Nikiforidis, George; Solomou, Ekaterini
2009-01-01
Three-dimensional (3D) texture analysis of volumetric brain magnetic resonance (MR) images has been identified as an important indicator for discriminating among different brain pathologies. The purpose of this study was to evaluate the efficiency of 3D textural features using a pattern recognition system in the task of discriminating benign, malignant and metastatic brain tissues on T1 postcontrast MR imaging (MRI) series. The dataset consisted of 67 brain MRI series obtained from patients with verified and untreated intracranial tumors. The pattern recognition system was designed as an ensemble classification scheme employing a support vector machine classifier, specially modified in order to integrate the least squares features transformation logic in its kernel function. The latter, in conjunction with using 3D textural features, enabled boosting up the performance of the system in discriminating metastatic, malignant and benign brain tumors with 77.14%, 89.19% and 93.33% accuracy, respectively. The method was evaluated using an external cross-validation process; thus, results might be considered indicative of the generalization performance of the system to "unseen" cases. The proposed system might be used as an assisting tool for brain tumor characterization on volumetric MRI series.
Ensemble theory for slightly deformable granular matter.
Tejada, Ignacio G
2014-09-01
Given a granular system of slightly deformable particles, it is possible to obtain different static and jammed packings subjected to the same macroscopic constraints. These microstates can be compared in a mathematical space defined by the components of the force-moment tensor (i.e. the product of the equivalent stress by the volume of the Voronoi cell). In order to explain the statistical distributions observed there, an athermal ensemble theory can be used. This work proposes a formalism (based on developments of the original theory of Edwards and collaborators) that considers both the internal and the external constraints of the problem. The former give the density of states of the points of this space, and the latter give their statistical weight. The internal constraints are those caused by the intrinsic features of the system (e.g. size distribution, friction, cohesion). They, together with the force-balance condition, determine which the possible local states of equilibrium of a particle are. Under the principle of equal a priori probabilities, and when no other constraints are imposed, it can be assumed that particles are equally likely to be found in any one of these local states of equilibrium. Then a flat sampling over all these local states turns into a non-uniform distribution in the force-moment space that can be represented with density of states functions. Although these functions can be measured, some of their features are explored in this paper. The external constraints are those macroscopic quantities that define the ensemble and are fixed by the protocol. The force-moment, the volume, the elastic potential energy and the stress are some examples of quantities that can be expressed as functions of the force-moment. The associated ensembles are included in the formalism presented here.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics
NASA Astrophysics Data System (ADS)
Iyer, V.; Shetty, S.; Iyengar, S. S.
2015-07-01
Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
NASA Astrophysics Data System (ADS)
Tito Arandia Martinez, Fabian
2014-05-01
Adequate uncertainty assessment is an important issue in hydrological modelling. An important issue for hydropower producers is to obtain ensemble forecasts which truly grasp the uncertainty linked to upcoming streamflows. If properly assessed, this uncertainty can lead to optimal reservoir management and energy production (ex. [1]). The meteorological inputs to the hydrological model accounts for an important part of the total uncertainty in streamflow forecasting. Since the creation of the THORPEX initiative and the TIGGE database, access to meteorological ensemble forecasts from nine agencies throughout the world have been made available. This allows for hydrological ensemble forecasts based on multiple meteorological ensemble forecasts. Consequently, both the uncertainty linked to the architecture of the meteorological model and the uncertainty linked to the initial condition of the atmosphere can be accounted for. The main objective of this work is to show that a weighted combination of meteorological ensemble forecasts based on different atmospheric models can lead to improved hydrological ensemble forecasts, for horizons from one to ten days. This experiment is performed for the Baskatong watershed, a head subcatchment of the Gatineau watershed in the province of Quebec, in Canada. Baskatong watershed is of great importance for hydro-power production, as it comprises the main reservoir for the Gatineau watershed, on which there are six hydropower plants managed by Hydro-Québec. Since the 70's, they have been using pseudo ensemble forecast based on deterministic meteorological forecasts to which variability derived from past forecasting errors is added. We use a combination of meteorological ensemble forecasts from different models (precipitation and temperature) as the main inputs for hydrological model HSAMI ([2]). The meteorological ensembles from eight of the nine agencies available through TIGGE are weighted according to their individual performance and combined to form a grand ensemble. Results show that the hydrological forecasts derived from the grand ensemble perform better than the pseudo ensemble forecasts actually used operationally at Hydro-Québec. References: [1] M. Verbunt, A. Walser, J. Gurtz et al., "Probabilistic flood forecasting with a limited-area ensemble prediction system: Selected case studies," Journal of Hydrometeorology, vol. 8, no. 4, pp. 897-909, Aug, 2007. [2] N. Evora, Valorisation des prévisions météorologiques d'ensemble, Institu de recherceh d'Hydro-Québec 2005. [3] V. Fortin, Le modèle météo-apport HSAMI: historique, théorie et application, Institut de recherche d'Hydro-Québec, 2000.
Correlated variability modifies working memory fidelity in primate prefrontal neuronal ensembles
Leavitt, Matthew L.; Pieper, Florian; Sachs, Adam J.; Martinez-Trujillo, Julio C.
2017-01-01
Neurons in the primate lateral prefrontal cortex (LPFC) encode working memory (WM) representations via sustained firing, a phenomenon hypothesized to arise from recurrent dynamics within ensembles of interconnected neurons. Here, we tested this hypothesis by using microelectrode arrays to examine spike count correlations (rsc) in LPFC neuronal ensembles during a spatial WM task. We found a pattern of pairwise rsc during WM maintenance indicative of stronger coupling between similarly tuned neurons and increased inhibition between dissimilarly tuned neurons. We then used a linear decoder to quantify the effects of the high-dimensional rsc structure on information coding in the neuronal ensembles. We found that the rsc structure could facilitate or impair coding, depending on the size of the ensemble and tuning properties of its constituent neurons. A simple optimization procedure demonstrated that near-maximum decoding performance could be achieved using a relatively small number of neurons. These WM-optimized subensembles were more signal correlation (rsignal)-diverse and anatomically dispersed than predicted by the statistics of the full recorded population of neurons, and they often contained neurons that were poorly WM-selective, yet enhanced coding fidelity by shaping the ensemble’s rsc structure. We observed a pattern of rsc between LPFC neurons indicative of recurrent dynamics as a mechanism for WM-related activity and that the rsc structure can increase the fidelity of WM representations. Thus, WM coding in LPFC neuronal ensembles arises from a complex synergy between single neuron coding properties and multidimensional, ensemble-level phenomena. PMID:28275096
Global ensemble texture representations are critical to rapid scene perception.
Brady, Timothy F; Shafer-Skelton, Anna; Alvarez, George A
2017-06-01
Traditionally, recognizing the objects within a scene has been treated as a prerequisite to recognizing the scene itself. However, research now suggests that the ability to rapidly recognize visual scenes could be supported by global properties of the scene itself rather than the objects within the scene. Here, we argue for a particular instantiation of this view: That scenes are recognized by treating them as a global texture and processing the pattern of orientations and spatial frequencies across different areas of the scene without recognizing any objects. To test this model, we asked whether there is a link between how proficient individuals are at rapid scene perception and how proficiently they represent simple spatial patterns of orientation information (global ensemble texture). We find a significant and selective correlation between these tasks, suggesting a link between scene perception and spatial ensemble tasks but not nonspatial summary statistics In a second and third experiment, we additionally show that global ensemble texture information is not only associated with scene recognition, but that preserving only global ensemble texture information from scenes is sufficient to support rapid scene perception; however, preserving the same information is not sufficient for object recognition. Thus, global ensemble texture alone is sufficient to allow activation of scene representations but not object representations. Together, these results provide evidence for a view of scene recognition based on global ensemble texture rather than a view based purely on objects or on nonspatially localized global properties. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods needs to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832
Pan, Yuliang; Wang, Zixiang; Zhan, Weihua; Deng, Lei
2018-05-01
Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet, limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures leads to the difficulties in developing empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infant stage. Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined together and selected for classification with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performances with an AUC score of 0.86 and a sensitivity of 0.78, which are significantly better than that of the pioneer RNA-binding hot spot prediction method HotSPRing. We also demonstrate the capability of our proposed method on the independent test dataset and gain a competitive advantage as a result. The PrabHot webserver is freely available at http://denglab.org/PrabHot/. leideng@csu.edu.cn. Supplementary data are available at Bioinformatics online.
Gupta, S; Basant, N; Mohan, D; Singh, K P
2016-07-01
Experimental determinations of the rate constants of the reaction of NO3 with a large number of organic chemicals are tedious, and time and resource intensive; and the development of computational methods has widely been advocated. In this study, we have developed room-temperature (298 K) and temperature-dependent quantitative structure-reactivity relationship (QSRR) models based on the ensemble learning approaches (decision tree forest (DTF) and decision treeboost (DTB)) for predicting the rate constant of the reaction of NO3 radicals with diverse organic chemicals, under OECD guidelines. Predictive powers of the developed models were established in terms of statistical coefficients. In the test phase, the QSRR models yielded a correlation (r(2)) of >0.94 between experimental and predicted rate constants. The applicability domains of the constructed models were determined. An attempt has been made to provide the mechanistic interpretation of the selected features for QSRR development. The proposed QSRR models outperformed the previous reports, and the temperature-dependent models offered a much wider applicability domain. This is the first report presenting a temperature-dependent QSRR model for predicting the nitrate radical reaction rate constant at different temperatures. The proposed models can be useful tools in predicting the reactivities of chemicals towards NO3 radicals in the atmosphere, hence, their persistence and exposure risk assessment.
Global Maps of Temporal Streamflow Characteristics Based on Observations from Many Small Catchments
NASA Astrophysics Data System (ADS)
Beck, H.; van Dijk, A.; de Roo, A.
2014-12-01
Streamflow (Q) estimation in ungauged catchments is one of the greatest challenges facing hydrologists. We used observed Q from approximately 7500 small catchments (<10,000 km2) around the globe to train neural network ensembles to estimate temporal Q distribution characteristics from climate and physiographic characteristics of the catchments. In total 17 Q characteristics were selected, including mean annual Q, baseflow index, and a number of flow percentiles. Training coefficients of determination for the estimation of the Q characteristics ranged from 0.56 for the baseflow recession constant to 0.93 for the Q timing. Overall, climate indices dominated among the predictors. Predictors related to soils and geology were the least important, perhaps due to data quality. The trained neural network ensembles were subsequently applied spatially over the ice-free land surface including ungauged regions, resulting in global maps of the Q characteristics (0.125° spatial resolution). These maps possess several unique features: 1) they represent purely observation-driven estimates; 2) are based on an unprecedentedly large set of catchments; and 3) have associated uncertainty estimates. The maps can be used for various hydrological applications, including the diagnosis of macro-scale hydrological models. To demonstrate this, the produced maps were compared to equivalent maps derived from the simulated daily Q of five macro-scale hydrological models, highlighting various opportunities for improvement in model Q behavior. The produced dataset is available for download.
Global maps of streamflow characteristics based on observations from several thousand catchments
NASA Astrophysics Data System (ADS)
Beck, Hylke; van Dijk, Albert; de Roo, Ad
2015-04-01
Streamflow (Q) estimation in ungauged catchments is one of the greatest challenges facing hydrologists. Observed Q from three to four thousand small-to-medium sized catchments (10-10000 km2) around the globe were used to train neural network ensembles to estimate Q characteristics based on climate and physiographic characteristics of the catchments. In total 17 Q characteristics were selected, including mean annual Q, baseflow index, and a number of flow percentiles. Testing coefficients of determination for the estimation of the Q characteristics ranged from 0.55 for the baseflow recession constant to 0.93 for the Q timing. Overall, climate indices dominated among the predictors. Predictors related to soils and geology were relatively unimportant, perhaps due to their data quality. The trained neural network ensembles were subsequently applied spatially over the entire ice-free land surface, resulting in global maps of the Q characteristics (0.125° resolution). These maps possess several unique features: they represent observation-driven estimates; are based on an unprecedentedly large set of catchments; and have associated uncertainty estimates. The maps can be used for various hydrological applications, including the diagnosis of macro-scale hydrological models. To demonstrate this, the produced maps were compared to equivalent maps derived from the simulated daily Q of four macro-scale hydrological models, highlighting various opportunities for improvement in model Q behavior. The produced dataset is available via http://water.jrc.ec.europa.eu.
Johnson, David K.; Karanicolas, John
2013-01-01
Despite intense interest and considerable effort via high-throughput screening, there are few examples of small molecules that directly inhibit protein-protein interactions. This suggests that many protein interaction surfaces may not be intrinsically “druggable” by small molecules, and elevates in importance the few successful examples as model systems for improving our fundamental understanding of druggability. Here we describe an approach for exploring protein fluctuations enriched in conformations containing surface pockets suitable for small molecule binding. Starting from a set of seven unbound protein structures, we find that the presence of low-energy pocket-containing conformations is indeed a signature of druggable protein interaction sites and that analogous surface pockets are not formed elsewhere on the protein. We further find that ensembles of conformations generated with this biased approach structurally resemble known inhibitor-bound structures more closely than equivalent ensembles of unbiased conformations. Collectively these results suggest that “druggability” is a property encoded on a protein surface through its propensity to form pockets, and inspire a model in which the crude features of the predisposed pocket(s) restrict the range of complementary ligands; additional smaller conformational changes then respond to details of a particular ligand. We anticipate that the insights described here will prove useful in selecting protein targets for therapeutic intervention. PMID:23505360
Role of orbitofrontal cortex neuronal ensembles in the expression of incubation of heroin craving
Fanous, Sanya; Goldart, Evan M.; Theberge, Florence R.M.; Bossert, Jennifer M.; Shaham, Yavin; Hope, Bruce T.
2012-01-01
In humans, exposure to cues previously associated with heroin use often provokes relapse after prolonged withdrawal periods. In rats, cue-induced heroin-seeking progressively increases after withdrawal (incubation of heroin craving). Here, we examined the role of orbitofrontal cortex (OFC) neuronal ensembles in the enhanced response to heroin cues after prolonged withdrawal or the expression of incubation of heroin craving. We trained rats to self-administer heroin (6-h/d for 10 d) and assessed cue-induced heroin-seeking in extinction tests after 1 or 14 withdrawal days. Cue-induced heroin-seeking increased from 1 day to 14 days and was accompanied by increased Fos expression in ~12% of OFC neurons. Non-selective inactivation of OFC neurons with the GABA agonists baclofen+muscimol decreased cue-induced heroin-seeking on withdrawal day 14 but not day 1. We then used the Daun02 inactivation procedure to assess a causal role of the minority of selectively activated Fos-expressing OFC neurons (that presumably form cue-encoding neuronal ensembles) in cue-induced heroin-seeking after 14 withdrawal days. We trained cfos-lacZ transgenic rats to self-administer heroin and 11 days later re-exposed them to heroin-associated cues or novel cues for 15 min (induction day) followed by OFC Daun02 or vehicle injections 90 min later; we then tested the rats in extinction tests 3 days later. Daun02 selectively decreased cue-induced heroin-seeking in rats previously re-exposed to the heroin-associated cues on induction day, but not in rats previously exposed to novel cues. Results suggest that heroin-cue-activated OFC neuronal ensembles contribute to the expression of incubation of heroin craving. PMID:22915104
Morales, M M; Giannini, N P
2013-05-01
Morphology of extant felids is regarded as highly conservative. Most previous studies have focussed on skull morphology, so a vacuum exists about morphofunctional variation in postcranium and its role in structuring ensembles of felids in different continents. The African felid ensemble is particularly rich in ecologically specialized felids. We studied the ecomorphology of this ensemble using 31 cranial and 93 postcranial morphometric variables measured in 49 specimens of all 10 African species. We took a multivariate approach controlling for phylogeny, with and without body size correction. Postcranial and skull + postcranial analyses (but not skull-only analyses) allowed for a complete segregation of species in morphospace. Morphofunctional factors segregating species included body size, bite force, zeugopodial lengths and osteological features related to parasagittal leg movement. A general gradient of bodily proportions was recovered: lightly built, long-legged felids with small heads and weak bite forces vs. the opposite. Three loose groups were recognized: small terrestrial felids, mid-to-large sized scansorial felids and specialized Acinonyx jubatus and Leptailurus serval. As predicted from a previous study, the assembling of the African felid ensemble during the Plio-Pleistocene occurred by the arrival of distinct felid lineages that occupied then vacant areas of morphospace, later diversifying in the continent. © 2013 The Authors. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.
NASA Astrophysics Data System (ADS)
Hristova-Veleva, S. M.; Chen, H.; Gopalakrishnan, S.; Haddad, Z. S.
2017-12-01
Tropical cyclones (TCs) are the product of complex multi-scale processes and interactions. The role of the environment has long been recognized. However, recent research has shown that convective-scale processes in the hurricane core might also play a crucial role in determining TCs intensity and size. Several studies have linked Rapid Intensification to the characteristics of the convective clouds (shallow versus deep), their organization (isolated versus wide-spread) and their location with respect to dynamical controls (the vertical shear, the radius of maximum wind). Yet a third set of controls signifies the interaction between the storm-scale and large-scale processes. Our goal is to use observations and models to advance the still-lacking understanding of these processes. Recently, hurricane models have improved significantly. However, deterministic forecasts have limitations due to the uncertainty in the representation of the physical processes and initial conditions. A crucial step forward is the use of high-resolution ensembles. We adopt the following approach: i) generate a high resolution ensemble forecast using HWRF; ii) produce synthetic data (e.g. brightness temperature) from the model fields for direct comparison to satellite observations; iii) develop metrics to allow us to sub-select the realistic members of the ensemble, based on objective measures of the similarity between observed and forecasted structures; iv) for these most-realistic members, determine the skill in forecasting TCs to provide"guidance on guidance"; v) use the members with the best predictive skill to untangle the complex multi-scale interactions. We will report on the first three goals of our research, using forecasts and observations of hurricane Edouard (2014), focusing on RI. We will focus on describing the metrics for the selection of the most appropriate ensemble members, based on applying low-wave number analysis (WNA - Hristova-Veleva et al., 2016) to the observed and forecasted 2D fields to develop objective criteria for consistency. We investigate the WNA cartoons of environmental moisture, precipitation structure and surface convergence. We will present the preliminary selection of most skillful members and will outline our future goals - analyzing the multi-scale interactions using these members
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis
Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288
Smith, Morgan E; Singh, Brajendra K; Irvine, Michael A; Stolk, Wilma A; Subramanian, Swaminathan; Hollingsworth, T Déirdre; Michael, Edwin
2017-03-01
Mathematical models of parasite transmission provide powerful tools for assessing the impacts of interventions. Owing to complexity and uncertainty, no single model may capture all features of transmission and elimination dynamics. Multi-model ensemble modelling offers a framework to help overcome biases of single models. We report on the development of a first multi-model ensemble of three lymphatic filariasis (LF) models (EPIFIL, LYMFASIM, and TRANSFIL), and evaluate its predictive performance in comparison with that of the constituents using calibration and validation data from three case study sites, one each from the three major LF endemic regions: Africa, Southeast Asia and Papua New Guinea (PNG). We assessed the performance of the respective models for predicting the outcomes of annual MDA strategies for various baseline scenarios thought to exemplify the current endemic conditions in the three regions. The results show that the constructed multi-model ensemble outperformed the single models when evaluated across all sites. Single models that best fitted calibration data tended to do less well in simulating the out-of-sample, or validation, intervention data. Scenario modelling results demonstrate that the multi-model ensemble is able to compensate for variance between single models in order to produce more plausible predictions of intervention impacts. Our results highlight the value of an ensemble approach to modelling parasite control dynamics. However, its optimal use will require further methodological improvements as well as consideration of the organizational mechanisms required to ensure that modelling results and data are shared effectively between all stakeholders. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Abawajy, Jemal; Kelarev, Andrei; Chowdhury, Morshed U; Jelinek, Herbert F
2016-01-01
Blood biochemistry attributes form an important class of tests, routinely collected several times per year for many patients with diabetes. The objective of this study is to investigate the role of blood biochemistry for improving the predictive accuracy of the diagnosis of cardiac autonomic neuropathy (CAN) progression. Blood biochemistry contributes to CAN, and so it is a causative factor that can provide additional power for the diagnosis of CAN especially in the absence of a complete set of Ewing tests. We introduce automated iterative multitier ensembles (AIME) and investigate their performance in comparison to base classifiers and standard ensemble classifiers for blood biochemistry attributes. AIME incorporate diverse ensembles into several tiers simultaneously and combine them into one automatically generated integrated system so that one ensemble acts as an integral part of another ensemble. We carried out extensive experimental analysis using large datasets from the diabetes screening research initiative (DiScRi) project. The results of our experiments show that several blood biochemistry attributes can be used to supplement the Ewing battery for the detection of CAN in situations where one or more of the Ewing tests cannot be completed because of the individual difficulties faced by each patient in performing the tests. The results show that AIME provide higher accuracy as a multitier CAN classification paradigm. The best predictive accuracy of 99.57% has been obtained by the AIME combining decorate on top tier with bagging on middle tier based on random forest. Practitioners can use these findings to increase the accuracy of CAN diagnosis.
NASA Astrophysics Data System (ADS)
Courdent, Vianney; Grum, Morten; Mikkelsen, Peter Steen
2018-01-01
Precipitation constitutes a major contribution to the flow in urban storm- and wastewater systems. Forecasts of the anticipated runoff flows, created from radar extrapolation and/or numerical weather predictions, can potentially be used to optimize operation in both wet and dry weather periods. However, flow forecasts are inevitably uncertain and their use will ultimately require a trade-off between the value of knowing what will happen in the future and the probability and consequence of being wrong. In this study we examine how ensemble forecasts from the HIRLAM-DMI-S05 numerical weather prediction (NWP) model subject to three different ensemble post-processing approaches can be used to forecast flow exceedance in a combined sewer for a wide range of ratios between the probability of detection (POD) and the probability of false detection (POFD). We use a hydrological rainfall-runoff model to transform the forecasted rainfall into forecasted flow series and evaluate three different approaches to establishing the relative operating characteristics (ROC) diagram of the forecast, which is a plot of POD against POFD for each fraction of concordant ensemble members and can be used to select the weight of evidence that matches the desired trade-off between POD and POFD. In the first approach, the rainfall input to the model is calculated for each of 25 ensemble members as a weighted average of rainfall from the NWP cells over the catchment where the weights are proportional to the areal intersection between the catchment and the NWP cells. In the second approach, a total of 2825 flow ensembles are generated using rainfall input from the neighbouring NWP cells up to approximately 6 cells in all directions from the catchment. In the third approach, the first approach is extended spatially by successively increasing the area covered and for each spatial increase and each time step selecting only the cell with the highest intensity resulting in a total of 175 ensemble members. While the first and second approaches have the disadvantage of not covering the full range of the ROC diagram and being computationally heavy, respectively, the third approach leads to both a broad coverage of the ROC diagram range at a relatively low computational cost. A broad coverage of the ROC diagram offers a larger selection of prediction skill to choose from to best match to the prediction purpose. The study distinguishes itself from earlier research in being the first application to urban hydrology, with fast runoff and small catchments that are highly sensitive to local extremes. Furthermore, no earlier reference has been found on the highly efficient third approach using only neighbouring cells with the highest threat to expand the range of the ROC diagram. This study provides an efficient and robust approach to using ensemble rainfall forecasts affected by bias and misplacement errors for predicting flow threshold exceedance in urban drainage systems.
Molecular dynamic simulations of selective self-diffusion of CH4/CO2/H2O/N2 in coal
NASA Astrophysics Data System (ADS)
Song, Y.; Jiang, B.; Li, F. L.
2017-06-01
The self-diffusion coefficients (D) of CH4/CO2/H2O/N2 at a relatively broad range of temperatures(298.15∼ 458.15K)and pressures (1∼6MPa) under the NPT, NPH, NVE, and NVT ensembles were obtained after the calculations of molecular mechanics(MM), annealing kinetics(AK), giant canonical Monte Carlo(GCMC), and molecular dynamics (MD) based on Wiser bituminous coal model (WM). The Ds of the adsorbates at the saturated adsorption configurations are D CH4
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
Bayesian quantitative precipitation forecasts in terms of quantiles
NASA Astrophysics Data System (ADS)
Bentzien, Sabrina; Friederichs, Petra
2014-05-01
Ensemble prediction systems (EPS) for numerical weather predictions on the mesoscale are particularly developed to obtain probabilistic guidance for high impact weather. An EPS not only issues a deterministic future state of the atmosphere but a sample of possible future states. Ensemble postprocessing then translates such a sample of forecasts into probabilistic measures. This study focus on probabilistic quantitative precipitation forecasts in terms of quantiles. Quantiles are particular suitable to describe precipitation at various locations, since no assumption is required on the distribution of precipitation. The focus is on the prediction during high-impact events and related to the Volkswagen Stiftung funded project WEX-MOP (Mesoscale Weather Extremes - Theory, Spatial Modeling and Prediction). Quantile forecasts are derived from the raw ensemble and via quantile regression. Neighborhood method and time-lagging are effective tools to inexpensively increase the ensemble spread, which results in more reliable forecasts especially for extreme precipitation events. Since an EPS provides a large amount of potentially informative predictors, a variable selection is required in order to obtain a stable statistical model. A Bayesian formulation of quantile regression allows for inference about the selection of predictive covariates by the use of appropriate prior distributions. Moreover, the implementation of an additional process layer for the regression parameters accounts for spatial variations of the parameters. Bayesian quantile regression and its spatially adaptive extension is illustrated for the German-focused mesoscale weather prediction ensemble COSMO-DE-EPS, which runs (pre)operationally since December 2010 at the German Meteorological Service (DWD). Objective out-of-sample verification uses the quantile score (QS), a weighted absolute error between quantile forecasts and observations. The QS is a proper scoring function and can be decomposed into reliability, resolutions and uncertainty parts. A quantile reliability plot gives detailed insights in the predictive performance of the quantile forecasts.
Accelerated iterative beam angle selection in IMRT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bangert, Mark, E-mail: m.bangert@dkfz.de; Unkelbach, Jan
2016-03-15
Purpose: Iterative methods for beam angle selection (BAS) for intensity-modulated radiation therapy (IMRT) planning sequentially construct a beneficial ensemble of beam directions. In a naïve implementation, the nth beam is selected by adding beam orientations one-by-one from a discrete set of candidates to an existing ensemble of (n − 1) beams. The best beam orientation is identified in a time consuming process by solving the fluence map optimization (FMO) problem for every candidate beam and selecting the beam that yields the largest improvement to the objective function value. This paper evaluates two alternative methods to accelerate iterative BAS based onmore » surrogates for the FMO objective function value. Methods: We suggest to select candidate beams not based on the FMO objective function value after convergence but (1) based on the objective function value after five FMO iterations of a gradient based algorithm and (2) based on a projected gradient of the FMO problem in the first iteration. The performance of the objective function surrogates is evaluated based on the resulting objective function values and dose statistics in a treatment planning study comprising three intracranial, three pancreas, and three prostate cases. Furthermore, iterative BAS is evaluated for an application in which a small number of noncoplanar beams complement a set of coplanar beam orientations. This scenario is of practical interest as noncoplanar setups may require additional attention of the treatment personnel for every couch rotation. Results: Iterative BAS relying on objective function surrogates yields similar results compared to naïve BAS with regard to the objective function values and dose statistics. At the same time, early stopping of the FMO and using the projected gradient during the first iteration enable reductions in computation time by approximately one to two orders of magnitude. With regard to the clinical delivery of noncoplanar IMRT treatments, we could show that optimized beam ensembles using only a few noncoplanar beam orientations often approach the plan quality of fully noncoplanar ensembles. Conclusions: We conclude that iterative BAS in combination with objective function surrogates can be a viable option to implement automated BAS at clinically acceptable computation times.« less
Accelerated iterative beam angle selection in IMRT.
Bangert, Mark; Unkelbach, Jan
2016-03-01
Iterative methods for beam angle selection (BAS) for intensity-modulated radiation therapy (IMRT) planning sequentially construct a beneficial ensemble of beam directions. In a naïve implementation, the nth beam is selected by adding beam orientations one-by-one from a discrete set of candidates to an existing ensemble of (n - 1) beams. The best beam orientation is identified in a time consuming process by solving the fluence map optimization (FMO) problem for every candidate beam and selecting the beam that yields the largest improvement to the objective function value. This paper evaluates two alternative methods to accelerate iterative BAS based on surrogates for the FMO objective function value. We suggest to select candidate beams not based on the FMO objective function value after convergence but (1) based on the objective function value after five FMO iterations of a gradient based algorithm and (2) based on a projected gradient of the FMO problem in the first iteration. The performance of the objective function surrogates is evaluated based on the resulting objective function values and dose statistics in a treatment planning study comprising three intracranial, three pancreas, and three prostate cases. Furthermore, iterative BAS is evaluated for an application in which a small number of noncoplanar beams complement a set of coplanar beam orientations. This scenario is of practical interest as noncoplanar setups may require additional attention of the treatment personnel for every couch rotation. Iterative BAS relying on objective function surrogates yields similar results compared to naïve BAS with regard to the objective function values and dose statistics. At the same time, early stopping of the FMO and using the projected gradient during the first iteration enable reductions in computation time by approximately one to two orders of magnitude. With regard to the clinical delivery of noncoplanar IMRT treatments, we could show that optimized beam ensembles using only a few noncoplanar beam orientations often approach the plan quality of fully noncoplanar ensembles. We conclude that iterative BAS in combination with objective function surrogates can be a viable option to implement automated BAS at clinically acceptable computation times.
Coherent Rabi Dynamics of a Superradiant Spin Ensemble in a Microwave Cavity
NASA Astrophysics Data System (ADS)
Rose, B. C.; Tyryshkin, A. M.; Riemann, H.; Abrosimov, N. V.; Becker, P.; Pohl, H.-J.; Thewalt, M. L. W.; Itoh, K. M.; Lyon, S. A.
2017-07-01
We achieve the strong-coupling regime between an ensemble of phosphorus donor spins in a highly enriched 28Si crystal and a 3D dielectric resonator. Spins are polarized beyond Boltzmann equilibrium using spin-selective optical excitation of the no-phonon bound exciton transition resulting in N =3.6 ×1 013 unpaired spins in the ensemble. We observe a normal mode splitting of the spin-ensemble-cavity polariton resonances of 2 g √{N }=580 kHz (where each spin is coupled with strength g ) in a cavity with a quality factor of 75 000 (γ ≪κ ≈60 kHz , where γ and κ are the spin dephasing and cavity loss rates, respectively). The spin ensemble has a long dephasing time (T2*=9 μ s ) providing a wide window for viewing the dynamics of the coupled spin-ensemble-cavity system. The free-induction decay shows up to a dozen collapses and revivals revealing a coherent exchange of excitations between the superradiant state of the spin ensemble and the cavity at the rate g √{N }. The ensemble is found to evolve as a single large pseudospin according to the Tavis-Cummings model due to minimal inhomogeneous broadening and uniform spin-cavity coupling. We demonstrate independent control of the total spin and the initial Z projection of the psuedospin using optical excitation and microwave manipulation, respectively. We vary the microwave excitation power to rotate the pseudospin on the Bloch sphere and observe a long delay in the onset of the superradiant emission as the pseudospin approaches full inversion. This delay is accompanied by an abrupt π -phase shift in the peusdospin microwave emission. The scaling of this delay with the initial angle and the sudden phase shift are explained by the Tavis-Cummings model.
Extreme Value Analysis of hydro meteorological extremes in the ClimEx Large-Ensemble
NASA Astrophysics Data System (ADS)
Wood, R. R.; Martel, J. L.; Willkofer, F.; von Trentini, F.; Schmid, F. J.; Leduc, M.; Frigon, A.; Ludwig, R.
2017-12-01
Many studies show an increase in the magnitude and frequency of hydrological extreme events in the course of climate change. However the contribution of natural variability to the magnitude and frequency of hydrological extreme events is not yet settled. A reliable estimate of extreme events is from great interest for water management and public safety. In the course of the ClimEx Project (www.climex-project.org) a new single-model large-ensemble was created by dynamically downscaling the CanESM2 large-ensemble with the Canadian Regional Climate Model version 5 (CRCM5) for an European Domain and a Northeastern North-American domain. By utilizing the ClimEx 50-Member Large-Ensemble (CRCM5 driven by CanESM2 Large-Ensemble) a thorough analysis of natural variability in extreme events is possible. Are the current extreme value statistical methods able to account for natural variability? How large is the natural variability for e.g. a 1/100 year return period derived from a 50-Member Large-Ensemble for Europe and Northeastern North-America? These questions should be answered by applying various generalized extreme value distributions (GEV) to the ClimEx Large-Ensemble. Hereby various return levels (5-, 10-, 20-, 30-, 60- and 100-years) based on various lengths of time series (20-, 30-, 50-, 100- and 1500-years) should be analyzed for the maximum one day precipitation (RX1d), the maximum three hourly precipitation (RX3h) and the streamflow for selected catchments in Europe. The long time series of the ClimEx Ensemble (7500 years) allows us to give a first reliable estimate of the magnitude and frequency of certain extreme events.
Learning about and through Picturebook Artwork
ERIC Educational Resources Information Center
Pantaleo, Sylvia
2018-01-01
Picturebooks are highly sophisticated multimodal ensembles. Understanding the semiotic resources and affordances of the visual mode to represent and communicate meaning is fundamental to appreciating the artistry and complexity of picturebooks. A recent classroom-based research project with grade 2 students featured explicit instruction on…
NASA Astrophysics Data System (ADS)
Li, Tao; Deng, Fu-Guo
2015-10-01
Quantum repeater is one of the important building blocks for long distance quantum communication network. The previous quantum repeaters based on atomic ensembles and linear optical elements can only be performed with a maximal success probability of 1/2 during the entanglement creation and entanglement swapping procedures. Meanwhile, the polarization noise during the entanglement distribution process is harmful to the entangled channel created. Here we introduce a general interface between a polarized photon and an atomic ensemble trapped in a single-sided optical cavity, and with which we propose a high-efficiency quantum repeater protocol in which the robust entanglement distribution is accomplished by the stable spatial-temporal entanglement and it can in principle create the deterministic entanglement between neighboring atomic ensembles in a heralded way as a result of cavity quantum electrodynamics. Meanwhile, the simplified parity-check gate makes the entanglement swapping be completed with unity efficiency, other than 1/2 with linear optics. We detail the performance of our protocol with current experimental parameters and show its robustness to the imperfections, i.e., detuning and coupling variation, involved in the reflection process. These good features make it a useful building block in long distance quantum communication.
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
NASA Astrophysics Data System (ADS)
Natalello, Antonino; Santambrogio, Carlo; Grandori, Rita
2017-01-01
Native mass spectrometry (MS) has become a central tool of structural proteomics, but its applicability to the peculiar class of intrinsically disordered proteins (IDPs) is still object of debate. IDPs lack an ordered tridimensional structure and are characterized by high conformational plasticity. Since they represent valuable targets for cancer and neurodegeneration research, there is an urgent need of methodological advances for description of the conformational ensembles populated by these proteins in solution. However, structural rearrangements during electrospray-ionization (ESI) or after the transfer to the gas phase could affect data obtained by native ESI-MS. In particular, charge-state distributions (CSDs) are affected by protein conformation inside ESI droplets, while ion mobility (IM) reflects protein conformation in the gas phase. This review focuses on the available evidence relating IDP solution ensembles with CSDs, trying to summarize cases of apparent consistency or discrepancy. The protein-specificity of ionization patterns and their responses to ligands and buffer conditions suggests that CSDs are imprinted to protein structural features also in the case of IDPs. Nevertheless, it seems that these proteins are more easily affected by electrospray conditions, leading in some cases to rearrangements of the conformational ensembles.
Li, Tao; Deng, Fu-Guo
2015-10-27
Quantum repeater is one of the important building blocks for long distance quantum communication network. The previous quantum repeaters based on atomic ensembles and linear optical elements can only be performed with a maximal success probability of 1/2 during the entanglement creation and entanglement swapping procedures. Meanwhile, the polarization noise during the entanglement distribution process is harmful to the entangled channel created. Here we introduce a general interface between a polarized photon and an atomic ensemble trapped in a single-sided optical cavity, and with which we propose a high-efficiency quantum repeater protocol in which the robust entanglement distribution is accomplished by the stable spatial-temporal entanglement and it can in principle create the deterministic entanglement between neighboring atomic ensembles in a heralded way as a result of cavity quantum electrodynamics. Meanwhile, the simplified parity-check gate makes the entanglement swapping be completed with unity efficiency, other than 1/2 with linear optics. We detail the performance of our protocol with current experimental parameters and show its robustness to the imperfections, i.e., detuning and coupling variation, involved in the reflection process. These good features make it a useful building block in long distance quantum communication.
Li, Tao; Deng, Fu-Guo
2015-01-01
Quantum repeater is one of the important building blocks for long distance quantum communication network. The previous quantum repeaters based on atomic ensembles and linear optical elements can only be performed with a maximal success probability of 1/2 during the entanglement creation and entanglement swapping procedures. Meanwhile, the polarization noise during the entanglement distribution process is harmful to the entangled channel created. Here we introduce a general interface between a polarized photon and an atomic ensemble trapped in a single-sided optical cavity, and with which we propose a high-efficiency quantum repeater protocol in which the robust entanglement distribution is accomplished by the stable spatial-temporal entanglement and it can in principle create the deterministic entanglement between neighboring atomic ensembles in a heralded way as a result of cavity quantum electrodynamics. Meanwhile, the simplified parity-check gate makes the entanglement swapping be completed with unity efficiency, other than 1/2 with linear optics. We detail the performance of our protocol with current experimental parameters and show its robustness to the imperfections, i.e., detuning and coupling variation, involved in the reflection process. These good features make it a useful building block in long distance quantum communication. PMID:26502993
Natalello, Antonino; Santambrogio, Carlo; Grandori, Rita
2017-01-01
Native mass spectrometry (MS) has become a central tool of structural proteomics, but its applicability to the peculiar class of intrinsically disordered proteins (IDPs) is still object of debate. IDPs lack an ordered tridimensional structure and are characterized by high conformational plasticity. Since they represent valuable targets for cancer and neurodegeneration research, there is an urgent need of methodological advances for description of the conformational ensembles populated by these proteins in solution. However, structural rearrangements during electrospray-ionization (ESI) or after the transfer to the gas phase could affect data obtained by native ESI-MS. In particular, charge-state distributions (CSDs) are affected by protein conformation inside ESI droplets, while ion mobility (IM) reflects protein conformation in the gas phase. This review focuses on the available evidence relating IDP solution ensembles with CSDs, trying to summarize cases of apparent consistency or discrepancy. The protein-specificity of ionization patterns and their responses to ligands and buffer conditions suggests that CSDs are imprinted to protein structural features also in the case of IDPs. Nevertheless, it seems that these proteins are more easily affected by electrospray conditions, leading in some cases to rearrangements of the conformational ensembles. Graphical Abstract ᅟ.
Photodeposited Pd Nanoparticles with Disordered Structure for Phenylacetylene Semihydrogenation
Fan, Qining; He, Sha; Hao, Lin; Liu, Xin; Zhu, Yue; Xu, Sailong; Zhang, Fazhi
2017-01-01
Developing effective heterogeneous metal catalysts with high selectivity and satisfactory activity for chemoselective hydrogenation of alkyne to alkene is of great importance in the chemical industry. Herein, we report our efforts to fabricate TiO2-supported Pd catalysts by a photodeposition method at room temperature for phenylacetylene semihydrogenation to styrene. The resulting Pd/TiO2 catalyst, possessing smaller Pd ensembles with ambiguous lattice fringes and more low coordination Pd sites, exhibits higher styrene selectivity compared to two contrastive Pd/TiO2 samples with larger ensembles and well-organized crystal structure fabricated by deposition-precipitation or photodeposition with subsequent thermal treatment at 300 °C. The sample derived from photodeposition exhibits greatly slow styrene hydrogenation in kinetic evaluation because the disordered structure of Pd particles in photodeposited Pd/TiO2 may prevent the formation of β-hydride phases and probably produce more surface H atoms, which may favor high styrene selectivity. PMID:28176843
Photodeposited Pd Nanoparticles with Disordered Structure for Phenylacetylene Semihydrogenation
NASA Astrophysics Data System (ADS)
Fan, Qining; He, Sha; Hao, Lin; Liu, Xin; Zhu, Yue; Xu, Sailong; Zhang, Fazhi
2017-02-01
Developing effective heterogeneous metal catalysts with high selectivity and satisfactory activity for chemoselective hydrogenation of alkyne to alkene is of great importance in the chemical industry. Herein, we report our efforts to fabricate TiO2-supported Pd catalysts by a photodeposition method at room temperature for phenylacetylene semihydrogenation to styrene. The resulting Pd/TiO2 catalyst, possessing smaller Pd ensembles with ambiguous lattice fringes and more low coordination Pd sites, exhibits higher styrene selectivity compared to two contrastive Pd/TiO2 samples with larger ensembles and well-organized crystal structure fabricated by deposition-precipitation or photodeposition with subsequent thermal treatment at 300 °C. The sample derived from photodeposition exhibits greatly slow styrene hydrogenation in kinetic evaluation because the disordered structure of Pd particles in photodeposited Pd/TiO2 may prevent the formation of β-hydride phases and probably produce more surface H atoms, which may favor high styrene selectivity.
Zhang, Lijun; Huang, Xinyan; Cao, Yuan; Xin, Yunhong; Ding, Liping
2017-12-22
Enormous effort has been put to the detection and recognition of various heavy metal ions due to their involvement in serious environmental pollution and many major diseases. The present work has developed a single fluorescent sensor ensemble that can distinguish and identify a variety of heavy metal ions. A pyrene-based fluorophore (PB) containing a metal ion receptor group was specially designed and synthesized. Anionic surfactant sodium dodecyl sulfate (SDS) assemblies can effectively adjust its fluorescence behavior. The selected binary ensemble based on PB/SDS assemblies can exhibit multiple emission bands and provide wavelength-based cross-reactive responses to a series of metal ions to realize pattern recognition ability. The combination of surfactant assembly modulation and the receptor for metal ions empowers the present sensor ensemble with strong discrimination power, which could well differentiate 13 metal ions, including Cu 2+ , Co 2+ , Ni 2+ , Cr 3+ , Hg 2+ , Fe 3+ , Zn 2+ , Cd 2+ , Al 3+ , Pb 2+ , Ca 2+ , Mg 2+ , and Ba 2+ . Moreover, this single sensing ensemble could be further applied for identifying different brands of drinking water.
Ensemble: a web-based system for psychology survey and experiment management.
Tomic, Stefan T; Janata, Petr
2007-08-01
We provide a description of Ensemble, a suite of Web-integrated modules for managing and analyzing data associated with psychology experiments in a small research lab. The system delivers interfaces via a Web browser for creating and presenting simple surveys without the need to author Web pages and with little or no programming effort. The surveys may be extended by selecting and presenting auditory and/or visual stimuli with MATLAB and Flash to enable a wide range of psychophysical and cognitive experiments which do not require the recording of precise reaction times. Additionally, one is provided with the ability to administer and present experiments remotely. The software technologies employed by the various modules of Ensemble are MySQL, PHP, MATLAB, and Flash. The code for Ensemble is open source and available to the public, so that its functions can be readily extended by users. We describe the architecture of the system, the functionality of each module, and provide basic examples of the interfaces.
A robust activity marking system for exploring active neuronal ensembles
Sørensen, Andreas T; Cooper, Yonatan A; Baratta, Michael V; Weng, Feng-Ju; Zhang, Yuxiang; Ramamoorthi, Kartik; Fropf, Robin; LaVerriere, Emily; Xue, Jian; Young, Andrew; Schneider, Colleen; Gøtzsche, Casper René; Hemberg, Martin; Yin, Jerry CP; Maier, Steven F; Lin, Yingxi
2016-01-01
Understanding how the brain captures transient experience and converts it into long lasting changes in neural circuits requires the identification and investigation of the specific ensembles of neurons that are responsible for the encoding of each experience. We have developed a Robust Activity Marking (RAM) system that allows for the identification and interrogation of ensembles of neurons. The RAM system provides unprecedented high sensitivity and selectivity through the use of an optimized synthetic activity-regulated promoter that is strongly induced by neuronal activity and a modified Tet-Off system that achieves improved temporal control. Due to its compact design, RAM can be packaged into a single adeno-associated virus (AAV), providing great versatility and ease of use, including application to mice, rats, flies, and potentially many other species. Cre-dependent RAM, CRAM, allows for the study of active ensembles of a specific cell type and anatomical connectivity, further expanding the RAM system’s versatility. DOI: http://dx.doi.org/10.7554/eLife.13918.001 PMID:27661450
Ensembles of novelty detection classifiers for structural health monitoring using guided waves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias
Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions. To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations.We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a differentmore » segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using monte-carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate.We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all environmental and operating conditions, while the latter does not and leverages the fact that environmental and operating conditions vary slowly over time and can be modeled as a Gaussian process.« less