Curve Boxplot: Generalization of Boxplot for Ensembles of Curves.
Mirzargar, Mahsa; Whitaker, Ross T; Kirby, Robert M
2014-12-01
In simulation science, computational scientists often study the behavior of their simulations by repeated solutions with variations in parameters and/or boundary values or initial conditions. Through such simulation ensembles, one can try to understand or quantify the variability or uncertainty in a solution as a function of the various inputs or model assumptions. In response to a growing interest in simulation ensembles, the visualization community has developed a suite of methods for allowing users to observe and understand the properties of these ensembles in an efficient and effective manner. An important aspect of visualizing simulations is the analysis of derived features, often represented as points, surfaces, or curves. In this paper, we present a novel, nonparametric method for summarizing ensembles of 2D and 3D curves. We propose an extension of a method from descriptive statistics, data depth, to curves. We also demonstrate a set of rendering and visualization strategies for showing rank statistics of an ensemble of curves, which is a generalization of traditional whisker plots or boxplots to multidimensional curves. Results are presented for applications in neuroimaging, hurricane forecasting and fluid dynamics.
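The depth-based ordering this abstract relies on is easy to prototype. Below is a minimal Python sketch of band depth for curves sampled on a common grid, in the spirit of the functional band depth that curve boxplots build on; the J = 2 band definition and the synthetic ensemble are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from itertools import combinations

def band_depth(curves):
    """Band depth (J = 2) for curves sampled on a common grid.
    curves: (n_curves, n_points); higher depth = more central curve."""
    n = len(curves)
    depth = np.zeros(n)
    for i, j in combinations(range(n), 2):
        lo = np.minimum(curves[i], curves[j])
        hi = np.maximum(curves[i], curves[j])
        # count curves lying entirely inside the band spanned by curves i, j
        depth += np.all((curves >= lo) & (curves <= hi), axis=1)
    return depth / (n * (n - 1) / 2)

# Synthetic ensemble: the deepest curve plays the role of the boxplot median,
# and the 50% deepest curves span the analogue of the box.
curves = np.sin(np.linspace(0, 2 * np.pi, 100)) + 0.3 * np.random.randn(40, 100)
central_to_outlying = np.argsort(band_depth(curves))[::-1]
```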
Generalized ensemble method applied to study systems with strong first order transitions
NASA Astrophysics Data System (ADS)
Małolepsza, E.; Kim, J.; Keyes, T.
2015-09-01
At strong first-order phase transitions, the entropy versus energy or, at constant pressure, enthalpy, exhibits convex behavior, and the statistical temperature curve correspondingly exhibits an S-loop or back-bending. In the canonical and isothermal-isobaric ensembles, with temperature as the control variable, the probability density functions become bimodal with peaks localized outside of the S-loop region. Inside, states are unstable, and as a result simulation of equilibrium phase coexistence becomes impossible. To overcome this problem, a method was proposed by Kim, Keyes and Straub [1], where optimally designed generalized ensemble sampling was combined with replica exchange, and denoted generalized replica exchange method (gREM). This new technique uses parametrized effective sampling weights that lead to a unimodal energy distribution, transforming unstable states into stable ones. In the present study, the gREM, originally developed as a Monte Carlo algorithm, was implemented to work with molecular dynamics in an isobaric ensemble and coded into LAMMPS, a highly optimized open source molecular simulation package. The method is illustrated in a study of the very strong solid/liquid transition in water.
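Neither record spells out the sampling weights. As a hedged summary of the construction in the Kim, Keyes, and Straub line of work, the effective temperature is taken linear in the enthalpy, which fixes the generalized ensemble weight; the symbols below are illustrative, not quoted from the paper.

```latex
% gREM: each replica \alpha samples with an effective statistical temperature
% chosen linear in the enthalpy H (covering the S-loop region):
T_\alpha(H) = \lambda_\alpha + \gamma\,(H - H_0)
% The generalized sampling weight follows from dS_\mathrm{eff}/dH = 1/T_\alpha:
W_\alpha(H) = \exp\!\left(-\int^{H} \frac{dH'}{T_\alpha(H')}\right)
            = \left[\lambda_\alpha + \gamma\,(H - H_0)\right]^{-1/\gamma}
% A suitably chosen slope \gamma cuts through the S-loop and renders the
% sampled enthalpy distribution unimodal.
```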
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered putative evidence of a regulatory link between the two genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression, and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that it outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
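The subsampling construction described above, which turns any feature ranker into an ensemble importance score, can be sketched compactly. The Python fragment below is an illustration: the `rank_fn` interface, the rank-to-score conversion, and the use of scikit-learn's ElasticNet are my assumptions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def ensemble_importance(X, y, rank_fn, n_rounds=100, frac=0.7, seed=None):
    """Cast any feature-ranking routine into an ensemble importance score by
    subsampling experiments (rows) and averaging rankwise scores."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(p)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        ranking = rank_fn(X[idx], y[idx])       # feature indices, best first
        scores[ranking] += np.arange(p, 0, -1)  # best feature scores p points
    return scores / n_rounds

def elastic_net_ranking(X, y):
    coef = ElasticNet(alpha=0.01, max_iter=5000).fit(X, y).coef_
    return np.argsort(-np.abs(coef))

# Per target gene: y = its expression, X = all other genes; NIMEFI then
# rank-averages such scores across several different ensemble algorithms.
```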
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
2017-02-14
We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.
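The Wang-Landau half of the combined scheme is the easier half to sketch. Below is a generic flat-histogram Wang-Landau loop in Python; the reaction-ensemble protonation/deprotonation moves the paper couples to it are hidden inside the `propose` callback, and the flatness threshold and stopping criterion are conventional choices, not the paper's.

```python
import numpy as np

def wang_landau(propose, n_bins, flatness=0.8, ln_f_final=1e-8, seed=None):
    """Generic flat-histogram Wang-Landau loop estimating ln g(E) over energy
    bins. propose(state) must return (new_state, bin_index); in the paper the
    moves include explicit reaction-ensemble (de)protonation steps."""
    rng = np.random.default_rng(seed)
    ln_g = np.zeros(n_bins)
    hist = np.zeros(n_bins)
    ln_f = 1.0                                   # initial modification factor
    state, e = propose(None)                     # initial state and its bin
    while ln_f > ln_f_final:
        new_state, e_new = propose(state)
        # accept with min(1, g(E)/g(E_new)) -> flat histogram over bins
        if np.log(rng.random()) < ln_g[e] - ln_g[e_new]:
            state, e = new_state, e_new
        ln_g[e] += ln_f
        hist[e] += 1
        if hist.min() > flatness * hist.mean():  # histogram flat enough
            hist[:] = 0
            ln_f *= 0.5                          # refine: f -> sqrt(f)
    return ln_g                                  # up to an additive constant
```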
Korean Percussion Ensemble ("Samulnori") in the General Music Classroom
ERIC Educational Resources Information Center
Kang, Sangmi; Yoo, Hyesoo
2016-01-01
This article introduces "samulnori" (Korean percussion ensemble), its cultural background, and instructional methods as parts of a classroom approach to teaching upper-level general music. We introduce five of eight sections from "youngnam nong-ak" (a style of samulnori) as a repertoire for teaching Korean percussion music to…
Huisman, J.A.; Breuer, L.; Bormann, H.; Bronstert, A.; Croke, B.F.W.; Frede, H.-G.; Graff, T.; Hubrechts, L.; Jakeman, A.J.; Kite, G.; Lanini, J.; Leavesley, G.; Lettenmaier, D.P.; Lindstrom, G.; Seibert, J.; Sivapalan, M.; Viney, N.R.; Willems, P.
2009-01-01
An ensemble of 10 hydrological models was applied to the same set of land use change scenarios. There was general agreement about the direction of changes in the mean annual discharge and 90% discharge percentile predicted by the ensemble members, although a considerable range in the magnitude of predictions for the scenarios and catchments under consideration was obvious. Differences in the magnitude of the increase were attributed to the different mean annual actual evapotranspiration rates for each land use type. The ensemble of model runs was further analyzed with deterministic and probabilistic ensemble methods. The deterministic ensemble method based on a trimmed mean resulted in a single somewhat more reliable scenario prediction. The probabilistic reliability ensemble averaging (REA) method allowed a quantification of the model structure uncertainty in the scenario predictions. It was concluded that the use of a model ensemble has greatly increased our confidence in the reliability of the model predictions.
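The deterministic trimmed-mean combination mentioned above is a one-liner with SciPy; the 20% trim fraction here is an arbitrary illustration rather than the study's setting.

```python
import numpy as np
from scipy import stats

# Trimmed-mean ensemble: at each time step, drop the most extreme model
# predictions before averaging (the trim fraction is an assumption).
runoff = np.random.rand(10, 365)   # 10 models x daily discharge, placeholder
trimmed = stats.trim_mean(runoff, proportiontocut=0.2, axis=0)
```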
Ensemble Deep Learning for Biomedical Time Series Classification
2016-01-01
Ensemble learning has been shown, in both theory and practice, to improve generalization ability effectively. In this paper, we first briefly outline the current status of research on ensemble learning. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database, which contains a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost. PMID:27725828
Malolepsza, Edyta; Secor, Maxim; Keyes, Tom
2015-09-23
A prescription for sampling isobaric generalized ensembles with molecular dynamics is presented and applied to the generalized replica exchange method (gREM), which was designed for simulating first-order phase transitions. The properties of the isobaric gREM ensemble are discussed, and a study is presented of the liquid-vapor equilibrium of guest molecules relevant to gas hydrate formation, using the mW water model. As a result, phase diagrams, critical parameters, and a law of corresponding states are obtained.
NASA Astrophysics Data System (ADS)
Watanabe, S.; Kim, H.; Utsumi, N.
2017-12-01
This study aims to develop a new approach that projects hydrology under climate change using super-ensemble experiments. The use of multiple ensembles is essential for the estimation of extremes, which is a major issue in the impact assessment of climate change; hence, super-ensemble experiments have recently been conducted by several research programs. While using multiple ensembles is necessary, running a hydrological simulation for each output of the ensemble simulations entails considerable computational cost. To use super-ensemble experiments effectively, we adopt a strategy of using the runoff projected by climate models directly. The general approach to hydrological projection is to conduct hydrological model simulations, including land-surface and river routing processes, using atmospheric boundary conditions projected by climate models as inputs. This study, on the other hand, runs only a river routing model, using runoff projected by climate models. In general, climate model output is systematically biased, so a preprocessing step that corrects such bias is necessary for impact assessments. Various bias correction methods have been proposed but, to the best of our knowledge, none has been proposed for variables other than surface meteorology. Here, we newly propose a method for utilizing the projected future runoff directly. The developed method estimates and corrects the bias based on pseudo-observations, namely the result of a retrospective offline simulation. We show an application of this approach to the super-ensemble experiments conducted under the program Half a degree Additional warming, Prognosis and Projected Impacts (HAPPI). More than 400 ensemble experiments from multiple climate models are available. The validation using historical simulations by HAPPI indicates that the output of this approach can effectively reproduce retrospective runoff variability. Likewise, the bias of runoff from super-ensemble climate projections is corrected, and the impact of climate change on hydrologic extremes is assessed in a cost-efficient way.
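The abstract does not name the bias correction algorithm, only that biases are estimated against pseudo-observations from a retrospective offline simulation. Quantile mapping is one standard choice and is sketched below purely as an illustration of that pattern.

```python
import numpy as np

def quantile_map(model_hist, pseudo_obs, model_fut):
    """Map each future runoff value through the historical model CDF onto the
    pseudo-observation (retrospective offline simulation) CDF."""
    qs = np.linspace(0, 1, 101)
    src = np.quantile(model_hist, qs)    # model climatology
    dst = np.quantile(pseudo_obs, qs)    # pseudo-observation climatology
    # empirical CDF of the raw value, then inverse CDF of the reference
    ranks = np.interp(model_fut, src, qs)
    return np.interp(ranks, qs, dst)
```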
Multiple-instance ensemble learning for hyperspectral images
NASA Astrophysics Data System (ADS)
Ergul, Ugur; Bilgin, Gokhan
2017-10-01
An ensemble framework for multiple-instance (MI) learning (MIL) in hyperspectral images (HSIs) is introduced, inspired by the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed with a small percentage of the training samples, and MI bags are formed by a local windowing process with variable window sizes on the selected instances. In addition to bootstrap aggregation, random subspace selection is used to further diversify the base classifiers. The proposed method is implemented with four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed on single test instances. In the experimental part of the study, two different HSIs with ground-truth information are used, and comparative results against state-of-the-art classification methods are presented. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equipollent non-MIL algorithms.
On the statistical equivalence of restrained-ensemble simulations with the maximum entropy method
Roux, Benoît; Weare, Jonathan
2013-01-01
An issue of general interest in computer simulations is to incorporate information from experiments into a structural model. An important caveat in pursuing this goal is to avoid corrupting the resulting model with spurious and arbitrary biases. While the problem of biasing thermodynamic ensembles can be formulated rigorously using the maximum entropy method introduced by Jaynes, the approach can be cumbersome in practical applications with the need to determine multiple unknown coefficients iteratively. A popular alternative strategy to incorporate the information from experiments is to rely on restrained-ensemble molecular dynamics simulations. However, the fundamental validity of this computational strategy remains in question. Here, it is demonstrated that the statistical distribution produced by restrained-ensemble simulations is formally consistent with the maximum entropy method of Jaynes. This clarifies the underlying conditions under which restrained-ensemble simulations will yield results that are consistent with the maximum entropy method. PMID:23464140
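For reference, the maximum-entropy construction invoked here has the standard Jaynes form; the notation below is generic rather than the paper's.

```latex
% Jaynes' maximum-entropy update of a prior ensemble p_0(x), constrained to
% reproduce experimental averages f_i^{exp}:
p(x) \;\propto\; p_0(x)\,
      \exp\!\Big(-\textstyle\sum_i \lambda_i f_i(x)\Big),
\qquad \langle f_i \rangle_p = f_i^{\mathrm{exp}}
% The multipliers \lambda_i are the unknown coefficients that must be
% determined iteratively, which is the practical burden the abstract notes.
```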
A Maximum Entropy Method for Particle Filtering
NASA Astrophysics Data System (ADS)
Eyink, Gregory L.; Kim, Sangil
2006-06-01
Standard ensemble or particle filtering schemes do not properly represent states of low prior probability when the number of available samples is too small, as is often the case in practical applications. We introduce here a set of parametric resampling methods to solve this problem. Motivated by a general H-theorem for relative entropy, we construct parametric models for the filter distributions as maximum-entropy/minimum-information models consistent with moments of the particle ensemble. When the prior distributions are modeled as mixtures of Gaussians, our method naturally generalizes the ensemble Kalman filter to systems with highly non-Gaussian statistics. We apply the new particle filters presented here to two simple test cases: a one-dimensional diffusion process in a double-well potential and the three-dimensional chaotic dynamical system of Lorenz.
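A minimal version of the parametric resampling idea uses a single Gaussian, the maximum-entropy density matching the first two weighted moments of the particle ensemble; mixtures of Gaussians, as in the paper, generalize this per component. Interfaces below are assumptions.

```python
import numpy as np

def maxent_gaussian_resample(particles, weights, n_out, rng=None):
    """Parametric resampling with a single Gaussian: the maximum-entropy
    density consistent with the weighted mean and covariance of the particle
    ensemble. particles: (n, d); weights: (n,) summing to one."""
    rng = np.random.default_rng(rng)
    m = weights @ particles                  # weighted mean
    d = particles - m
    C = (weights[:, None] * d).T @ d         # weighted covariance
    return rng.multivariate_normal(m, C, size=n_out)
```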
Zheng, Lianqing; Chen, Mengen; Yang, Wei
2009-06-21
To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures. As usually observed, in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction, and these residual free energy barriers can greatly reduce sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimensional subset of the target system; then the "Hamiltonian lagging" problem, in which necessary structural relaxation falls behind the move of the collective variable, is likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW-based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.
Tatinati, Sivanagaraja; Nazarpour, Kianoush; Tech Ang, Wei; Veluvolu, Kalyana C
2016-08-01
Successful treatment of tumors with motion-adaptive radiotherapy requires accurate prediction of respiratory motion, ideally with a prediction horizon larger than the latency of the radiotherapy system. Accurate prediction of respiratory motion is, however, a non-trivial task due to the presence of irregularities and intra-trace variabilities, such as baseline drift and temporal changes in the fundamental frequency pattern. In this paper, to enhance the accuracy of respiratory motion prediction, we propose a stacked regression ensemble framework that integrates heterogeneous respiratory motion prediction algorithms. We further address two crucial issues for developing a successful ensemble framework: (1) selection of appropriate prediction methods to ensemble (level-0 methods) among the best existing prediction methods; and (2) finding a suitable generalization approach that can successfully exploit the relative advantages of the chosen level-0 methods. The efficacy of the developed ensemble framework is assessed with real respiratory motion traces acquired from 31 patients undergoing treatment. Results show that the developed ensemble framework improves the prediction performance significantly compared to the best existing methods.
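The level-1 "generalization" step of such a stacked ensemble is compact to sketch; ridge regression as the combiner below is my assumption, since the abstract leaves the generalizer unspecified.

```python
import numpy as np
from sklearn.linear_model import Ridge

def stacked_predict(level0_train, y_train, level0_new):
    """Level-1 generalizer of a stacked regression ensemble: each column holds
    the predictions of one heterogeneous level-0 motion predictor, built on a
    held-out part of the trace to avoid leakage; Ridge is an assumed choice."""
    combiner = Ridge(alpha=1.0).fit(level0_train, y_train)
    return combiner.predict(level0_new)
```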
Generalized Ensemble Sampling of Enzyme Reaction Free Energy Pathways
Wu, Dongsheng; Fajer, Mikolai I.; Cao, Liaoran; Cheng, Xiaolin; Yang, Wei
2016-01-01
Free energy path sampling plays an essential role in computational understanding of chemical reactions, particularly those occurring in enzymatic environments. Among a variety of molecular dynamics simulation approaches, the generalized ensemble sampling strategy is uniquely attractive for the fact that it not only can enhance the sampling of rare chemical events but also can naturally ensure consistent exploration of environmental degrees of freedom. In this review, we plan to provide a tutorial-like tour on an emerging topic: generalized ensemble sampling of enzyme reaction free energy path. The discussion is largely focused on our own studies, particularly ones based on the metadynamics free energy sampling method and the on-the-path random walk path sampling method. We hope that this mini presentation will provide interested practitioners some meaningful guidance for future algorithm formulation and application study. PMID:27498634
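Of the two strategies named, the metadynamics component admits a very short sketch: a history-dependent bias built from Gaussian hills along a collective variable. Hill height and width below are illustrative values, not the review's settings.

```python
import numpy as np

def metadynamics_bias(s_history, s_grid, w=0.1, sigma=0.05):
    """History-dependent metadynamics bias along one collective variable s:
    deposit a Gaussian hill at each visited value, progressively filling the
    free-energy wells along the reaction path."""
    V = np.zeros_like(s_grid)
    for s0 in s_history:
        V += w * np.exp(-((s_grid - s0) ** 2) / (2 * sigma ** 2))
    return V  # at convergence V(s) approximates -F(s) up to a constant
```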
Crossover ensembles of random matrices and skew-orthogonal polynomials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, Santosh, E-mail: skumar.physics@gmail.com; Pandey, Akhilesh, E-mail: ap0700@mail.jnu.ac.in
2011-08-15
Highlights:
- We study crossover ensembles of the Jacobi family of random matrices.
- We consider correlations for orthogonal-unitary and symplectic-unitary crossovers.
- We use the method of skew-orthogonal polynomials and quaternion determinants.
- We prove universality of spectral correlations in crossover ensembles.
- We discuss applications to quantum conductance and communication theory problems.
Abstract: In a recent paper (S. Kumar, A. Pandey, Phys. Rev. E 79, 2009, p. 026211) we considered the Jacobi family (including Laguerre and Gaussian cases) of random matrix ensembles and reported exact solutions of crossover problems involving time-reversal symmetry breaking. In the present paper we give details of the work. We start with Dyson's Brownian motion description of random matrix ensembles and obtain universal hierarchic relations among the unfolded correlation functions. For arbitrary dimensions we derive the joint probability density (jpd) of eigenvalues for all transitions leading to unitary ensembles as equilibrium ensembles. We focus on the orthogonal-unitary and symplectic-unitary crossovers and give generic expressions for the jpd of eigenvalues, two-point kernels, and n-level correlation functions. This involves generalization of the theory of skew-orthogonal polynomials to crossover ensembles. We also consider crossovers in the circular ensembles to show the generality of our method. In the large dimensionality limit, correlations in spectra with arbitrary initial density are shown to be universal when expressed in terms of a rescaled symmetry breaking parameter. Applications of our crossover results to communication theory and quantum conductance problems are also briefly discussed.
Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation
Iida, Shinji; Nakamura, Haruki; Higo, Junichi
2016-01-01
We introduce various, recently developed, generalized ensemble methods, which are useful to sample various molecular configurations emerging in the process of protein–protein or protein–ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks. PMID:27288028
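The projection-to-landscape step described here is generic enough to sketch. The Python fragment below assumes snapshots already reweighted to canonical room-temperature probabilities and projected onto two abstract coordinates; those assumptions, and the kcal/mol value of kT, are mine.

```python
import numpy as np

def free_energy_landscape(c1, c2, kT=0.593, bins=60):
    """F = -kT ln P over a 2D projection of the sampled conformations.
    Assumes snapshots are already reweighted to room-temperature
    probabilities; kT = 0.593 kcal/mol at ~298 K. Returns F and bin edges."""
    P, xe, ye = np.histogram2d(c1, c2, bins=bins, density=True)
    with np.errstate(divide="ignore"):
        F = -kT * np.log(P)                        # +inf where no samples fell
    return F - F[np.isfinite(F)].min(), xe, ye     # deepest cluster at F = 0
```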
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
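The two ingredients named, majority voting and a GA fitness over candidate ensembles, fit in a few lines. The sketch below scores a candidate by plain accuracy of its majority vote; the paper uses 10-fold cross-validation instead, so treat this as a simplified stand-in.

```python
import numpy as np

def majority_vote(preds):
    """preds: (n_classifiers, n_samples) integer class labels."""
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)

def fitness(mask, preds, y):
    """GA fitness of a candidate ensemble encoded as a 0/1 mask over the
    classifier pool: accuracy of its majority vote on held-out labels y."""
    return np.mean(majority_vote(preds[mask.astype(bool)]) == y)
```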
Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang
2016-01-01
For ensemble learning, how to select and combine the candidate classifiers are two key issues which influence the performance of the ensemble system dramatically. Random vector functional link networks (RVFL) without direct input-to-output links are suitable base classifiers for ensemble systems because of their fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with a double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. When using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum-norm least-squares method and then further optimized by ARPSO. Finally, a few redundant RVFL are pruned, and thus a more compact ensemble of RVFL is obtained. Moreover, in this paper, theoretical analysis and justification of how to prune the base classifiers on classification problems is presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single optimization methods. Experimental results on function approximation and classification problems verify that the proposed method can improve convergence accuracy as well as reduce the complexity of the ensemble system. PMID:27835638
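A minimal RVFL base learner (without direct input-to-output links, as in the paper) is short enough to sketch, including the minimum-norm least-squares output weights mentioned above; the random-weight ranges and the tanh activation are conventional assumptions.

```python
import numpy as np

def train_rvfl(X, Y, n_hidden=100, rng=None):
    """RVFL without direct input-output links: fixed random hidden layer,
    closed-form minimum-norm least-squares output weights via pseudo-inverse."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))
    b = rng.uniform(-1, 1, n_hidden)
    beta = np.linalg.pinv(np.tanh(X @ W + b)) @ Y
    return W, b, beta

def predict_rvfl(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```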
Knowledge-Based Methods To Train and Optimize Virtual Screening Ensembles
2016-01-01
Ensemble docking can be a successful virtual screening technique that addresses the innate conformational heterogeneity of macromolecular drug targets. Yet, lacking a method to identify a subset of conformational states that effectively segregates active and inactive small molecules, ensemble docking may result in the recommendation of a large number of false positives. Here, three knowledge-based methods that construct structural ensembles for virtual screening are presented. Each method selects ensembles by optimizing an objective function calculated using the receiver operating characteristic (ROC) curve: either the area under the ROC curve (AUC) or a ROC enrichment factor (EF). As the number of receptor conformations, N, becomes large, the methods differ in their asymptotic scaling. Given a set of small molecules with known activities and a collection of target conformations, the most resource intense method is guaranteed to find the optimal ensemble but scales as O(2N). A recursive approximation to the optimal solution scales as O(N2), and a more severe approximation leads to a faster method that scales linearly, O(N). The techniques are generally applicable to any system, and we demonstrate their effectiveness on the androgen nuclear hormone receptor (AR), cyclin-dependent kinase 2 (CDK2), and the peroxisome proliferator-activated receptor δ (PPAR-δ) drug targets. Conformations that consisted of a crystal structure and molecular dynamics simulation cluster centroids were used to form AR and CDK2 ensembles. Multiple available crystal structures were used to form PPAR-δ ensembles. For each target, we show that the three methods perform similarly to one another on both the training and test sets. PMID:27097522
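A hedged sketch of the greedy flavor of these selection methods: grow the ensemble one receptor conformation at a time, keeping the addition that most improves the AUC. The min-score aggregation rule and interfaces below are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def greedy_ensemble(dock_scores, active, max_size=5):
    """Greedy approximation to ensemble selection. dock_scores:
    (n_receptors, n_ligands), lower = better, so a ligand's ensemble score is
    its minimum over the chosen conformations (an assumed aggregation rule);
    active: binary activity labels per ligand."""
    chosen, best_auc = [], -np.inf
    remaining = list(range(len(dock_scores)))
    while remaining and len(chosen) < max_size:
        auc, r = max((roc_auc_score(active,
                                    -dock_scores[chosen + [rr]].min(axis=0)), rr)
                     for rr in remaining)
        if auc <= best_auc:
            break
        chosen.append(r); remaining.remove(r); best_auc = auc
    return chosen, best_auc
```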
NASA Astrophysics Data System (ADS)
Oh, Seok-Geun; Suh, Myoung-Seok
2017-07-01
The projection skills of five ensemble methods were analyzed according to simulation skills, training period, and ensemble members, using 198 sets of pseudo-simulation data (PSD) produced by random number generation assuming the simulated temperature of regional climate models. The PSD sets were classified into 18 categories according to the relative magnitude of bias, variance ratio, and correlation coefficient, where each category had 11 sets (including 1 truth set) with 50 samples. The ensemble methods used were as follows: equal weighted averaging without bias correction (EWA_NBC), EWA with bias correction (EWA_WBC), weighted ensemble averaging based on root mean square errors and correlation (WEA_RAC), WEA based on the Taylor score (WEA_Tay), and multivariate linear regression (Mul_Reg). The projection skills of the ensemble methods generally improved compared with the best member of each category. However, their projection skills are significantly affected by the simulation skills of the ensemble members. The weighted ensemble methods showed better projection skills than the non-weighted methods, in particular for the PSD categories having systematic biases and various correlation coefficients. The EWA_NBC showed considerably lower projection skills than the other methods, in particular for the PSD categories with systematic biases. Although Mul_Reg showed relatively good skills, it showed strong sensitivity to the PSD categories, training periods, and number of members. On the other hand, WEA_Tay and WEA_RAC showed relatively superior skills in both accuracy and reliability for all the sensitivity experiments. This indicates that WEA_Tay and WEA_RAC are applicable even for simulation data with systematic biases, a short training period, and a small number of ensemble members.
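As a concrete reading of WEA_Tay, weight each member by its Taylor skill score over the training period. The sketch below uses the common S = 4(1+R)/[(s + 1/s)^2 (1+R0)] form of the score; the paper's exact variant and its bias pre-correction step are not reproduced here, so treat this as an assumption.

```python
import numpy as np

def taylor_score(sim, obs, r0=1.0):
    """Taylor skill score S = 4(1+R) / [(s + 1/s)^2 (1+R0)], s = sigma ratio."""
    r = np.corrcoef(sim, obs)[0, 1]
    s = sim.std() / obs.std()
    return 4 * (1 + r) / ((s + 1 / s) ** 2 * (1 + r0))

def wea_tay(members_train, obs_train, members_proj):
    """WEA_Tay-style combination: Taylor-score weights fit on the training
    period, applied to the projection period. members_*: (n_members, n_times)."""
    w = np.array([taylor_score(m, obs_train) for m in members_train])
    w /= w.sum()
    return w @ members_proj
```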
Generalized canonical ensembles and ensemble equivalence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Costeniuc, M.; Ellis, R.S.; Turkington, B.
2006-02-15
This paper is a companion piece to our previous work [J. Stat. Phys. 119, 1283 (2005)], which introduced a generalized canonical ensemble obtained by multiplying the usual Boltzmann weight factor e^{-βH} of the canonical ensemble with an exponential factor involving a continuous function g of the Hamiltonian H. We provide here a simplified introduction to our previous work, focusing now on a number of physical rather than mathematical aspects of the generalized canonical ensemble. The main result discussed is that, for suitable choices of g, the generalized canonical ensemble reproduces, in the thermodynamic limit, all the microcanonical equilibrium properties of the many-body system represented by H even if this system has a nonconcave microcanonical entropy function. This is something that in general the standard (g=0) canonical ensemble cannot achieve. Thus a virtue of the generalized canonical ensemble is that it can often be made equivalent to the microcanonical ensemble in cases in which the canonical ensemble cannot. The case of quadratic g functions is discussed in detail; it leads to the so-called Gaussian ensemble.
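For reference, the construction the abstract describes can be written in one line; the notation below follows the abstract's own definitions.

```latex
% Generalized canonical weight: the Boltzmann factor times an extra
% exponential involving a continuous function g of the Hamiltonian
% (g = 0 recovers the standard canonical ensemble):
P_{\beta,g}(x) \;\propto\; e^{-\beta H(x) - g(H(x))}
% The quadratic choice g(H) = (\gamma/2)\,H^2 yields the Gaussian ensemble
% discussed at the end of the abstract.
```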
Pairwise Classifier Ensemble with Adaptive Sub-Classifiers for fMRI Pattern Analysis.
Kim, Eunwoo; Park, HyunWook
2017-02-01
The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timofeev, Andrey V.; Egorov, Dmitry V.
This paper presents new results concerning the selection of an optimal information fusion formula for an ensemble of Lipschitz classifiers. The goal of information fusion is to create an integral classifier which provides better generalization ability of the ensemble while achieving a practically acceptable level of effectiveness. The problem of information fusion is very relevant for data processing in multi-channel C-OTDR monitoring systems. In this case we have to effectively classify targeted events which appear in the vicinity of the monitored object. The solution of this problem is based on the usage of an ensemble of Lipschitz classifiers, each of which corresponds to a respective channel. We suggest a brand new method for information fusion in the case of an ensemble of Lipschitz classifiers. This method is called "Weighing of Inversely as Lipschitz Constants" (WILC). Results of practical usage of the WILC method in multichannel C-OTDR monitoring systems are presented.
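A minimal reading of WILC in code: fuse per-channel decision scores with weights proportional to the inverse Lipschitz constants of the channel classifiers. The normalization and the score interface are assumptions; the abstract does not give these details.

```python
import numpy as np

def wilc_fuse(channel_scores, lipschitz_consts):
    """Fuse per-channel classifier scores with weights proportional to the
    inverse Lipschitz constants, so smoother channel classifiers count more.
    channel_scores: (n_channels, n_classes) decision scores for one event."""
    w = 1.0 / np.asarray(lipschitz_consts, dtype=float)
    w /= w.sum()                                   # assumed normalization
    return int(np.argmax(w @ np.asarray(channel_scores)))
```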
Post-processing method for wind speed ensemble forecast using wind speed and direction
NASA Astrophysics Data System (ADS)
Eide, Siri Sofie; Bremnes, John Bjørnar; Steinsland, Ingelin
2017-04-01
Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, such as wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly because it is generally not straightforward to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method to estimate the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85% of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as a predictor compared to only using wind speed. On average the improvements were about 5%, but mainly for moderate to strong wind situations. For weak wind speeds adding wind direction had a more or less neutral impact.
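One hedged way to realize a flexible conditional mean that uses wind direction is to enter the direction through low-order harmonics, as sketched below; the exact regression the authors used is not specified in this abstract.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def conditional_mean_model(speed, direction_deg, obs):
    """Conditional mean for one ensemble member: wind speed plus direction
    entered through first-order harmonics (a stand-in for the paper's
    flexible regression; the feature set is an assumption)."""
    th = np.deg2rad(direction_deg)
    X = np.column_stack([speed,
                         speed * np.sin(th), speed * np.cos(th),
                         np.sin(th), np.cos(th)])
    return LinearRegression().fit(X, obs)

# In BMA each member's predictive density is then centered on this corrected
# mean, with mixture weights and spreads fit by maximum likelihood (EM).
```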
Decimated Input Ensembles for Improved Generalization
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)
1999-01-01
Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with a finite number of patterns, this causes a reduction in the training patterns each classifier sees, often resulting in considerably worsened generalization performance for each individual classifier (particularly for high-dimensional data domains). Generally, this drop in individual classifier accuracy more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.
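One way to realize decimated inputs with off-the-shelf tools: give each ensemble member a random subset of features, which both decorrelates members and reduces dimensionality. The scikit-learn configuration below is an illustration, not the authors' procedure.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

# Each member trains on a random 50% subset of the input features, so members
# decorrelate and each sees a lower-dimensional problem. Illustrative only.
decimated_ensemble = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
    n_estimators=10,
    max_features=0.5,   # feature decimation
    bootstrap=False,    # keep all training patterns; decimate features only
)
```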
NASA Astrophysics Data System (ADS)
Liu, Li; Xu, Yue-Ping
2017-04-01
Ensemble flood forecasting driven by numerical weather prediction products is becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system based on the Variable Infiltration Capacity (VIC) model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel-programmed ε-NSGA II multi-objective algorithm, and two separately parameterized models are determined to simulate daily flows and peak flows, coupled through a modular approach. The results indicate that the ε-NSGA II algorithm permits more efficient optimization and a rational determination of parameter settings. It is demonstrated that the multimodel ensemble streamflow mean has better skill than the best single-model ensemble mean (ECMWF), and that multimodel ensembles weighted on members and skill scores outperform other multimodel ensembles. For a typical flood event, it is shown that the flood can be predicted 3-4 days in advance, but the flows in the rising limb can be captured only 1-2 days ahead due to their flash character. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from either single models or multimodels are generally underestimated, as the extreme values are smoothed out by the ensemble process.
Generalized ensemble theory with non-extensive statistics
NASA Astrophysics Data System (ADS)
Shen, Ke-Ming; Zhang, Ben-Wei; Wang, En-Ke
2017-12-01
The non-extensive canonical ensemble theory is reconsidered with the method of Lagrange multipliers by maximizing the Tsallis entropy, with the constraint that the normalized term of Tsallis' q-average of physical quantities, the sum Σ_j p_j^q, is independent of the probability p_i for Tsallis parameter q. The self-referential problem in the deduced probability and thermal quantities in non-extensive statistics is thus avoided, and thermodynamical relationships are obtained in a consistent and natural way. We also extend the study to the non-extensive grand canonical ensemble theory and obtain the q-deformed Bose-Einstein distribution as well as the q-deformed Fermi-Dirac distribution. The theory is further applied to the generalized Planck law to demonstrate the distinct behaviors of the various generalized q-distribution functions discussed in the literature.
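For orientation, the quantities the abstract refers to have the following standard forms in one common sign convention; the paper's own conventions may differ, so treat the signs as assumptions.

```latex
% Tsallis entropy and the resulting power-law weights:
S_q = \frac{1 - \sum_j p_j^{\,q}}{q - 1}, \qquad
p_i \;\propto\; \left[1 - (1-q)\,\beta\,\varepsilon_i\right]^{\frac{1}{1-q}}
% q-deformed quantum distributions (- : Bose-Einstein, + : Fermi-Dirac):
\langle n \rangle
  = \frac{1}{\left[1 + (q-1)\,\beta\,(\varepsilon-\mu)\right]^{1/(q-1)} \mp 1}
% Both reduce to the usual exponential forms in the limit q -> 1.
```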
Mathematical foundations of hybrid data assimilation from a synchronization perspective
NASA Astrophysics Data System (ADS)
Penny, Stephen G.
2017-12-01
The state-of-the-art data assimilation methods used today in operational weather prediction centers around the world can be classified as generalized one-way coupled impulsive synchronization. This classification permits the investigation of hybrid data assimilation methods, which combine dynamic error estimates of the system state with long time-averaged (climatological) error estimates, from a synchronization perspective. Illustrative results show how dynamically informed formulations of the coupling matrix (via an Ensemble Kalman Filter, EnKF) can lead to synchronization when observing networks are sparse and how hybrid methods can lead to synchronization when those dynamic formulations are inadequate (due to small ensemble sizes). A large-scale application with a global ocean general circulation model is also presented. Results indicate that the hybrid methods also have useful applications in generalized synchronization, in particular, for correcting systematic model errors.
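The hybrid idea, blending a static climatological error covariance with the flow-dependent ensemble covariance inside the gain (coupling) matrix, can be sketched in a few lines; the blending weight and the matrix shapes below are generic assumptions, not details from the paper.

```python
import numpy as np

def hybrid_kalman_gain(H, R, B_clim, ens_perturbs, alpha=0.5):
    """Coupling (gain) matrix built from a hybrid background covariance
    alpha*B_clim + (1-alpha)*B_ens. H: (m, n) observation operator,
    R: (m, m) obs-error covariance, ens_perturbs: (n, n_members)."""
    B_ens = np.cov(ens_perturbs)            # flow-dependent sample covariance
    B = alpha * B_clim + (1 - alpha) * B_ens
    S = H @ B @ H.T + R
    return B @ H.T @ np.linalg.inv(S)

# Analysis step / impulsive coupling term: x_a = x_b + K (y - H x_b)
```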
NASA Astrophysics Data System (ADS)
Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping
2017-11-01
Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprising an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel-programmed ε-NSGA II multi-objective algorithm. According to the solutions found by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. A simple yet effective modular approach is then proposed to combine the daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination of parameter estimates, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than the other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single-model ensembles, and the multimodel methods weighted on members and skill scores outperform the other methods. Furthermore, the overall performance at the three stations can be satisfactory up to ten days; however, hydrological errors can degrade the skill score by approximately 2 days, and the influence persists, with a weakening trend, until a lead time of 10 days. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean mainly brings overall improvement in the forecasting of flows; for peak values, taking flood forecasts from each individual member into account is more appropriate.
Reliable probabilities through statistical post-processing of ensemble predictions
NASA Astrophysics Data System (ADS)
Van Schaeybroeck, Bert; Vannitsem, Stéphane
2013-04-01
We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. First, we enforce climatological reliability, in the sense that the total variability of the prediction is equal to the variability of the observations. Second, we impose ensemble reliability, such that the spread of the observation around the ensemble mean coincides with that of the ensemble members. In general the attractors of the model and reality are inhomogeneous; therefore ensemble spread displays a variability not taken into account in standard post-processing methods. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz, 1996). The forecasts become more reliable at short lead times, as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References: [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability, ECMWF, 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression. Nonlin. Processes Geophys., 18, 147.
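A rough sketch, loosely in the spirit of the two reliability constraints described (not the authors' exact estimator): regress a corrected ensemble mean, then rescale member deviations so the spread matches the error of the corrected mean.

```python
import numpy as np

def calibrate_ensemble(ens, obs):
    """Reliability-enforcing linear post-processing (rough sketch): fit a
    corrected ensemble mean by regression, then rescale member deviations so
    the spread matches the error of that mean. ens: (n_members, n_times)."""
    mean = ens.mean(axis=0)
    a, b = np.polyfit(mean, obs, 1)
    corrected_mean = a * mean + b
    dev = ens - mean
    target_spread = np.sqrt(np.mean((obs - corrected_mean) ** 2))
    return corrected_mean + (target_spread / dev.std()) * dev
```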
NASA Astrophysics Data System (ADS)
Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim
2017-08-01
Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean models, especially in the framework of parameter estimation. Based on Bayes' rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained, conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large-dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without a significant increase in computational requirements. However, only approximate estimates are generally obtained by this approach, due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean modeling, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and uncertainties of Manning's n coefficients compared to the full posterior distributions inferred by MCMC.
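For concreteness, a bare-bones stochastic EnKF parameter update of the kind such studies employ is sketched below; the shapes, the inflation treatment, and the perturbation of observations are generic textbook choices, not details from this paper.

```python
import numpy as np

def enkf_param_update(params, pred_obs, y, obs_err_var, inflation=1.05):
    """Bare-bones stochastic EnKF update for parameter estimation.

    params:   (n_ens, n_param) ensemble of Manning's n vectors
    pred_obs: (n_ens, n_obs) model-predicted observations per member
    y:        (n_obs,) observed data; obs_err_var: scalar error variance
    """
    # multiplicative inflation of the parameter spread (a tuning knob)
    params = params.mean(0) + inflation * (params - params.mean(0))
    A = params - params.mean(0)
    D = pred_obs - pred_obs.mean(0)
    n_e = params.shape[0]
    C_ph = A.T @ D / (n_e - 1)                          # param-obs covariance
    C_hh = D.T @ D / (n_e - 1) + obs_err_var * np.eye(y.size)
    K = C_ph @ np.linalg.inv(C_hh)                      # Kalman gain
    y_pert = y + np.sqrt(obs_err_var) * np.random.randn(n_e, y.size)
    return params + (y_pert - pred_obs) @ K.T
```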
NASA Astrophysics Data System (ADS)
Re, Matteo; Valentini, Giorgio
2012-03-01
Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall "ensemble" decision is rooted in our culture at least since the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of the data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of "votes" (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community in ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [4,177], Kleinberg in the context of stochastic discrimination theory [112], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [21,70]. Empirical studies showed that in both classification and regression problems ensembles improve on single learning machines, and moreover large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [10,11,49,188]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost that allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms. However, as explained in Section 26.2, there are deeper reasons to use ensembles of learning machines, motivated by the intrinsic characteristics of the ensemble methods. The main aim of this chapter is to introduce ensemble methods and to provide an overview and a bibliography of the main areas of research, without pretending to be exhaustive or to explain the detailed characteristics of each ensemble method. The paper is organized as follows.
In the next section, the main theoretical and practical reasons for combining multiple learners are introduced. Section 26.3 presents the main taxonomies of ensemble methods proposed in the literature. In Sections 26.4 and 26.5, we present an overview of the main supervised ensemble methods reported in the literature, adopting a simple taxonomy originally proposed in Ref. [201]. Applications of ensemble methods are only marginally considered, but a specific section on some relevant applications of ensemble methods in astronomy and astrophysics has been added (Section 26.6). The conclusion (Section 26.7) ends this paper and lists some issues not covered in this work.
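To illustrate the majority-vote combination described in the abstract above, a minimal sketch with hypothetical toy data:

```python
import numpy as np
from collections import Counter

def majority_vote(predictions):
    """predictions: list of 1-D label arrays, one per base learner, same length.
    Returns the per-sample class receiving the most votes (ties -> first seen)."""
    stacked = np.vstack(predictions)  # (n_learners, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in stacked.T])

# Three hypothetical base learners voting on five samples
votes = [np.array([0, 1, 1, 0, 2]),
         np.array([0, 1, 0, 0, 2]),
         np.array([1, 1, 1, 0, 0])]
print(majority_vote(votes))  # -> [0 1 1 0 2]
```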
Hierarchical Ensemble Methods for Protein Function Prediction
2014-01-01
Protein function prediction is a complex multiclass, multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of unambiguously determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term, and then the resulting predictions are assembled into a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954
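As one concrete, hedged illustration of assembling per-term predictions into a hierarchy-consistent decision, the sketch below applies the true-path rule (a child term's score may not exceed its parent's). This is a generic consistency correction, not any specific method reviewed in the paper, and all names are illustrative:

```python
def depth(term, parents):
    """Number of edges from `term` up to its root (parents maps term -> parent)."""
    d = 0
    while parents.get(term) is not None:
        term = parents[term]
        d += 1
    return d

def topdown_consistency(scores, parents):
    """Cap each term's flat-classifier score by its parent's corrected score,
    so predictions respect the hierarchy: P(child) <= P(parent).
    Assumes the hierarchy is a tree; parents of roots are None."""
    consistent = {}
    for term in sorted(scores, key=lambda t: depth(t, parents)):  # parents first
        p = parents.get(term)
        cap = consistent[p] if p is not None else 1.0
        consistent[term] = min(scores[term], cap)
    return consistent

# Toy example: term "b" sits below "a", which sits below the root
scores = {"root": 0.9, "a": 0.95, "b": 0.4}
parents = {"root": None, "a": "root", "b": "a"}
print(topdown_consistency(scores, parents))  # "a" is capped at 0.9
```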
Biased Metropolis Sampling for Rugged Free Energy Landscapes
NASA Astrophysics Data System (ADS)
Berg, Bernd A.
2003-11-01
Metropolis simulations of all-atom models of peptides (i.e., small proteins) are considered. Inspired by the funnel picture of Bryngelson and Wolynes, a transformation of the updating probabilities of the dihedral angles is defined, which uses probability densities from a higher temperature to improve the algorithmic performance at a lower temperature. The method is suitable for canonical as well as for generalized ensemble simulations. A simple approximation to the full transformation is tested at room temperature for Met-Enkephalin in vacuum. Integrated autocorrelation times are found to be reduced by factors close to two, and a similar improvement due to generalized ensemble methods enters multiplicatively.
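The general mechanism, biasing proposals with a higher-temperature density while correcting the acceptance ratio so the low-temperature canonical ensemble is preserved, can be sketched as a Metropolis-Hastings independence proposal. This is a generic sketch, not Berg's exact transformation; all function arguments are placeholders:

```python
import numpy as np

def biased_metropolis_step(x, energy, beta_low, propose_from_high_T, pdf_high_T):
    """One Metropolis-Hastings step for a dihedral angle x: the proposal is
    drawn from an (estimated) higher-temperature density.  Including the
    proposal densities in the acceptance ratio keeps detailed balance with
    respect to the low-temperature canonical distribution."""
    x_new = propose_from_high_T()
    dE = energy(x_new) - energy(x)
    # target ratio at low T, times the proposal-density correction q(x)/q(x')
    ratio = np.exp(-beta_low * dE) * pdf_high_T(x) / pdf_high_T(x_new)
    if np.random.rand() < min(1.0, ratio):
        return x_new
    return x
```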
Evolutionary Ensemble for In Silico Prediction of Ames Test Mutagenicity
NASA Astrophysics Data System (ADS)
Chen, Huanhuan; Yao, Xin
Driven by new regulations and animal welfare, the need to develop in silico models has increased recently as alternative approaches to the safety assessment of chemicals without animal testing. This paper describes a novel machine learning ensemble approach to building an in silico model for the prediction of Ames test mutagenicity, one of a battery of the most commonly used experimental in vitro and in vivo genotoxicity tests for the safety evaluation of chemicals. Evolutionary random neural ensemble with negative correlation learning (ERNE) [1] was developed based on neural networks and evolutionary algorithms. ERNE combines bootstrap sampling of the training data with random subspace feature selection to ensure diversity when creating individuals within an initial ensemble. Furthermore, while evolving individuals within the ensemble, it makes use of negative correlation learning, enabling individual NNs to be trained to be as accurate as possible while keeping them as diverse as possible. Therefore, the resulting individuals in the final ensemble are capable of cooperating collectively to achieve better generalization of prediction. The empirical experiments suggest that ERNE is an effective ensemble approach for predicting the Ames test mutagenicity of chemicals.
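For reference, the negative correlation learning penalty that ERNE builds on can be written as a per-member loss (following Liu and Yao's formulation; this is a hedged sketch, not ERNE itself, and `lam` is the accuracy-diversity trade-off):

```python
import numpy as np

def ncl_loss(outputs, target, lam=0.5):
    """Negative correlation learning loss for each ensemble member on one
    sample: squared error minus lam * (deviation from the ensemble mean)^2,
    which rewards members for disagreeing with the ensemble consensus.

    outputs : (n_members,) array of member predictions for one sample
    target  : scalar ground truth
    """
    f_bar = outputs.mean()
    return 0.5 * (outputs - target) ** 2 - lam * (outputs - f_bar) ** 2
```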
NASA Astrophysics Data System (ADS)
Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.
2017-08-01
Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with random forest models having the best overall performance measured by the area under the receiver-operating-characteristic curve (AUC). The models also performed well on the test data for presence and absence, with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models) and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance (~50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. We conclude that where data conform well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated and non-normal distributions, such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
Effects of ensemble and summary displays on interpretations of geospatial uncertainty data.
Padilla, Lace M; Ruginski, Ian T; Creem-Regehr, Sarah H
2017-01-01
Ensemble and summary displays are two widely used methods to represent visual-spatial uncertainty; however, there is disagreement about which is the most effective technique to communicate uncertainty to the general public. Visualization scientists create ensemble displays by plotting multiple data points on the same Cartesian coordinate plane. Despite their use in scientific practice, it is more common in public presentations to use visualizations of summary displays, which scientists create by plotting statistical parameters of the ensemble members. While prior work has demonstrated that viewers make different decisions when viewing summary and ensemble displays, it is unclear what components of the displays lead to diverging judgments. This study aims to compare the salience of visual features - or visual elements that attract bottom-up attention - as one possible source of diverging judgments made with ensemble and summary displays in the context of hurricane track forecasts. We report that salient visual features of both ensemble and summary displays influence participant judgment. Specifically, we find that salient features of summary displays of geospatial uncertainty can be misunderstood as displaying size information. Further, salient features of ensemble displays evoke judgments that are indicative of accurate interpretations of the underlying probability distribution of the ensemble data. However, when participants use ensemble displays to make point-based judgments, they may overweight individual ensemble members in their decision-making process. We propose that ensemble displays are a promising alternative to summary displays in a geospatial context but that decisions about visualization methods should be informed by the viewer's task.
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a potentially better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite this significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an indivisible whole and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
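One plausible reading of the entropic uncertainty criterion, sketched under the assumption that a cluster is more reliable when its members stay together across the other base clusterings. Names and the exact normalization are illustrative, not the paper's definitions:

```python
import numpy as np

def cluster_entropy(cluster_members, other_labelings):
    """Average entropy of a cluster with respect to the other base clusterings:
    for each other clustering, measure how the cluster's members scatter across
    that clustering's clusters.  Low entropy -> corroborated (more reliable)."""
    H = 0.0
    for labels in other_labelings:  # one label array per other base clustering
        member_labels = labels[cluster_members]
        _, counts = np.unique(member_labels, return_counts=True)
        p = counts / counts.sum()
        H += -(p * np.log2(p)).sum()
    return H / len(other_labelings)

# Toy example: a 6-object cluster checked against two other base clusterings
members = np.array([0, 1, 2, 3, 4, 5])
others = [np.array([0, 0, 0, 1, 1, 1, 2, 2]),
          np.array([0, 0, 0, 0, 0, 0, 1, 1])]
print(cluster_entropy(members, others))  # lower is more reliable
```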
Ren, Fulong; Cao, Peng; Li, Wei; Zhao, Dazhe; Zaiane, Osmar
2017-01-01
Diabetic retinopathy (DR) is a progressive disease, and its detection at an early stage is crucial for saving a patient's vision. An automated screening system for DR can help reduce the chance of complete blindness due to DR while lowering the workload on ophthalmologists. Among the earliest signs of DR are microaneurysms (MAs). However, current schemes for MA detection tend to report many false positives because the detection algorithms have high sensitivity; inevitably, some non-MA structures are labeled as MAs in the initial MA identification step. This is a typical "class imbalance problem". Class-imbalanced data have detrimental effects on the performance of conventional classifiers. In this work, we propose an ensemble-based adaptive over-sampling algorithm for overcoming the class imbalance problem in false positive reduction, and we use boosting, bagging, and random subspace as the ensemble frameworks to improve microaneurysm detection. The proposed ensemble-based over-sampling methods combine the strengths of adaptive over-sampling and ensembles. The objective of this amalgamation of ensembles and adaptive over-sampling is to reduce the induction biases introduced by imbalanced data and to enhance the generalization classification performance of extreme learning machines (ELM). Experimental results show that our ASOBoost method achieves higher area under the ROC curve (AUC) and G-mean values than many existing class imbalance learning methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
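A hedged sketch of the bagging-plus-over-sampling idea, using plain random minority over-sampling as a stand-in for the paper's adaptive scheme; labels are assumed to be 0 (majority, non-MA) and 1 (minority, MA):

```python
import numpy as np

def oversampled_bags(X, y, n_bags=10, seed=0):
    """Yield balanced bootstrap training sets: each bag pairs a bootstrap of
    the majority class with the minority class up-sampled to the same size,
    so every base learner (e.g. an ELM) trains on balanced data."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    for _ in range(n_bags):
        maj = rng.choice(majority, size=majority.size, replace=True)
        mino = rng.choice(minority, size=majority.size, replace=True)  # up-sample
        idx = np.concatenate([maj, mino])
        yield X[idx], y[idx]
```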
The total probabilities from high-resolution ensemble forecasting of floods
NASA Astrophysics Data System (ADS)
Skøien, Jon Olav; Bogner, Konrad; Salamon, Peter; Smith, Paul; Pappenberger, Florian
2015-04-01
Ensemble forecasting has long been used in meteorological modelling to give an indication of the uncertainty of the forecasts. As meteorological ensemble forecasts often show bias and dispersion errors, there is a need for calibration and post-processing of the ensembles. Typical methods for this are Bayesian Model Averaging (Raftery et al., 2005) and Ensemble Model Output Statistics (EMOS) (Gneiting et al., 2005). There are also methods for regionalizing these methods (Berrocal et al., 2007) and for incorporating the correlation between lead times (Hemri et al., 2013). To make optimal predictions of floods along the stream network in hydrology, we can easily use the ensemble members as input to the hydrological models. However, some of the post-processing methods need modifications when regionalizing the forecasts outside the calibration locations, as done by Hemri et al. (2013). We present a method for spatial regionalization of the post-processed forecasts based on EMOS and top-kriging (Skøien et al., 2006). We also look into different methods for handling the non-normality of runoff and the effect on forecast skill in general and for floods in particular. Berrocal, V. J., Raftery, A. E. and Gneiting, T.: Combining Spatial Statistical and Ensemble Information in Probabilistic Weather Forecasts, Mon. Weather Rev., 135(4), 1386-1402, doi:10.1175/MWR3341.1, 2007. Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T.: Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133(5), 1098-1118, doi:10.1175/MWR2904.1, 2005. Hemri, S., Fundel, F. and Zappa, M.: Simultaneous calibration of ensemble river flow predictions over an entire range of lead times, Water Resour. Res., 49(10), 6744-6755, doi:10.1002/wrcr.20542, 2013. Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133(5), 1155-1174, doi:10.1175/MWR2906.1, 2005. Skøien, J. O., Merz, R. and Blöschl, G.: Top-kriging - Geostatistics on stream networks, Hydrol. Earth Syst. Sci., 10(2), 277-287, 2006.
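For concreteness, the Gaussian EMOS model of Gneiting et al. (2005) cited above fits mu = a + b*(ensemble mean) and sigma^2 = c + d*(ensemble variance) by minimizing the mean CRPS, which has a closed form for a normal predictive distribution. A sketch with assumed variable names:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast against observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(ens_mean, ens_var, y):
    """Fit mu = a + b*ens_mean, var = c + d*ens_var by mean-CRPS minimization
    over a training set of past forecasts and verifying observations."""
    def loss(theta):
        a, b, c, d = theta
        mu = a + b * ens_mean
        sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-8))  # keep variance positive
        return crps_normal(mu, sigma, y).mean()
    res = minimize(loss, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead")
    return res.x  # (a, b, c, d)
```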
Estimating Convection Parameters in the GFDL CM2.1 Model Using Ensemble Data Assimilation
NASA Astrophysics Data System (ADS)
Li, Shan; Zhang, Shaoqing; Liu, Zhengyu; Lu, Lv; Zhu, Jiang; Zhang, Xuefeng; Wu, Xinrong; Zhao, Ming; Vecchi, Gabriel A.; Zhang, Rong-Hua; Lin, Xiaopei
2018-04-01
Parametric uncertainty in convection parameterization is one major source of model errors that cause model climate drift. Convection parameter tuning has been widely studied in atmospheric models to help mitigate the problem. However, in a fully coupled general circulation model (CGCM), convection parameters which impact the ocean as well as the climate simulation may have different optimal values. This study explores the possibility of estimating convection parameters with an ensemble coupled data assimilation method in a CGCM. Impacts of the convection parameter estimation on climate analysis and forecast are analyzed. In a twin experiment framework, five convection parameters in the GFDL coupled model CM2.1 are estimated individually and simultaneously under both perfect and imperfect model regimes. Results show that the ensemble data assimilation method can help reduce the bias in convection parameters. With estimated convection parameters, the analyses and forecasts for both the atmosphere and the ocean are generally improved. It is also found that information in low latitudes is relatively more important for estimating convection parameters. This study further suggests that when important parameters in appropriate physical parameterizations are identified, incorporating their estimation into traditional ensemble data assimilation procedure could improve the final analysis and climate prediction.
Dynamical predictive power of the generalized Gibbs ensemble revealed in a second quench.
Zhang, J M; Cui, F C; Hu, Jiangping
2012-04-01
We show that a quenched and relaxed completely integrable system is hardly distinguishable from the corresponding generalized Gibbs ensemble in a dynamical sense. To be specific, the response of the quenched and relaxed system to a second quench can be accurately reproduced by using the generalized Gibbs ensemble as a substitute. Remarkably, as demonstrated with the transverse Ising model and the hard-core bosons in one dimension, not only the steady values but even the transient, relaxation dynamics of the physical variables can be accurately reproduced by using the generalized Gibbs ensemble as a pseudoinitial state. This result is an important complement to the previously established result that a quenched and relaxed system is hardly distinguishable from the generalized Gibbs ensemble in a static sense. The relevance of the generalized Gibbs ensemble in the nonequilibrium dynamics of completely integrable systems is then greatly strengthened.
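For reference, the generalized Gibbs ensemble referred to here is the standard construction over the conserved charges \(\hat{I}_m\) of the integrable model:

```latex
\hat{\rho}_{\mathrm{GGE}}
  = \frac{1}{Z_{\mathrm{GGE}}}\,
    \exp\!\Big(-\sum_{m}\lambda_{m}\hat{I}_{m}\Big),
\qquad
Z_{\mathrm{GGE}} = \operatorname{Tr}\,\exp\!\Big(-\sum_{m}\lambda_{m}\hat{I}_{m}\Big),
```

with each Lagrange multiplier \(\lambda_m\) fixed by matching \(\operatorname{Tr}(\hat{\rho}_{\mathrm{GGE}}\hat{I}_m)\) to the charge's expectation value in the initial (pre-quench) state.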
Ocean state and uncertainty forecasts using HYCOM with the Local Ensemble Transform Kalman Filter (LETKF)
NASA Astrophysics Data System (ADS)
Wei, Mozheng; Hogan, Pat; Rowley, Clark; Smedstad, Ole-Martin; Wallcraft, Alan; Penny, Steve
2017-04-01
An ensemble forecast system based on the US Navy's operational HYCOM using Local Ensemble Transform Kalman Filter (LETKF) technology has been developed for ocean state and uncertainty forecasts. One of the advantages is that the best possible initial analysis states for the HYCOM forecasts are provided by the LETKF, which assimilates the operational observations using an ensemble method. The background covariance during this assimilation process is supplied by the ensemble, which avoids the difficulty of developing tangent linear and adjoint models for 4D-Var from the complicated hybrid isopycnal vertical coordinate in HYCOM. Another advantage is that the ensemble system provides a valuable uncertainty estimate corresponding to every state forecast from HYCOM. Uncertainty forecasts have proven to be critical for downstream users and managers to make more scientifically sound decisions in the numerical prediction community. In addition, the ensemble mean is generally more accurate and skilful than a single traditional deterministic forecast at the same resolution. We will introduce the ensemble system design and setup, present some results from a 30-member ensemble experiment, and discuss scientific, technical and computational issues and challenges, such as covariance localization, inflation, model-related uncertainties and sensitivity to the ensemble size.
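As a hedged illustration of the ensemble-transform analysis underlying the LETKF, shown here as a single global ETKF step without the localization that the "L" adds; shapes and names are assumptions:

```python
import numpy as np

def etkf_update(X, HX, y, R_diag, rho=1.05):
    """Ensemble transform Kalman filter analysis step (no localization).

    X      : (n, Ne) forecast state ensemble
    HX     : (m, Ne) forecast ensemble mapped into observation space
    y      : (m,)    observations
    R_diag : (m,)    observation error variances (diagonal R)
    rho    : multiplicative covariance inflation factor
    """
    n, Ne = X.shape
    xm = X.mean(axis=1, keepdims=True)
    ym = HX.mean(axis=1, keepdims=True)
    Xp = rho * (X - xm)   # inflated state anomalies
    Yp = rho * (HX - ym)  # inflated observation-space anomalies

    Rinv_Y = Yp / R_diag[:, None]
    # Analysis error covariance in the Ne-dimensional ensemble space
    Pa_tilde = np.linalg.inv((Ne - 1) * np.eye(Ne) + Yp.T @ Rinv_Y)
    w_mean = Pa_tilde @ Rinv_Y.T @ (y - ym.ravel())
    # Symmetric square root gives the deterministic anomaly transform
    evals, evecs = np.linalg.eigh((Ne - 1) * Pa_tilde)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

    return xm + Xp @ (w_mean[:, None] + W)  # analysis ensemble, (n, Ne)
```

The LETKF applies this same update independently in overlapping local patches of grid points, which is what makes it parallelize well.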
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)
2001-01-01
Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
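A toy sketch of input decimation, using absolute feature-class correlation as the discrimination measure and logistic regression as a stand-in base classifier; the article's exact criterion and learners may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def input_decimation_ensemble(X, y, k):
    """Train one base classifier per class, each restricted to the k features
    most correlated with that class's one-vs-rest indicator."""
    models = []
    for c in np.unique(y):
        target = (y == c).astype(float)
        corr = np.abs(np.corrcoef(X.T, target)[-1, :-1])  # |corr(feature, class)|
        feats = np.argsort(corr)[-k:]                     # top-k features
        clf = LogisticRegression(max_iter=1000).fit(X[:, feats], y)
        models.append((feats, clf))
    return models

def predict(models, X):
    # Average the base classifiers' probabilities, then take the argmax.
    # Assumes class labels are 0..K-1 so the argmax index is the label.
    probs = np.mean([clf.predict_proba(X[:, f]) for f, clf in models], axis=0)
    return probs.argmax(axis=1)
```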
Viney, N.R.; Bormann, H.; Breuer, L.; Bronstert, A.; Croke, B.F.W.; Frede, H.; Graff, T.; Hubrechts, L.; Huisman, J.A.; Jakeman, A.J.; Kite, G.W.; Lanini, J.; Leavesley, G.; Lettenmaier, D.P.; Lindstrom, G.; Seibert, J.; Sivapalan, M.; Willems, P.
2009-01-01
This paper reports on a project to compare predictions from a range of catchment models applied to a mesoscale river basin in central Germany and to assess various ensemble predictions of catchment streamflow. The models encompass a large range in inherent complexity and input requirements. In approximate order of decreasing complexity, they are DHSVM, MIKE-SHE, TOPLATS, WASIM-ETH, SWAT, PRMS, SLURP, HBV, LASCAM and IHACRES. The models are calibrated twice using different sets of input data. The two predictions from each model are then combined by simple averaging to produce a single-model ensemble. The 10 resulting single-model ensembles are combined in various ways to produce multi-model ensemble predictions. Both the single-model ensembles and the multi-model ensembles are shown to give predictions that are generally superior to those of their respective constituent models, both during a 7-year calibration period and a 9-year validation period. This occurs despite a considerable disparity in performance of the individual models. Even the weakest of models is shown to contribute useful information to the ensembles they are part of. The best model combination methods are a trimmed mean (constructed using the central four or six predictions each day) and a weighted mean ensemble (with weights calculated from calibration performance) that places relatively large weights on the better performing models. Conditional ensembles, in which separate model weights are used in different system states (e.g. summer and winter, high and low flows), generally yield little improvement over the weighted mean ensemble. However, a conditional ensemble that discriminates between rising and receding flows shows moderate improvement. An analysis of ensemble predictions shows that the best ensembles are not necessarily those containing the best individual models. Conversely, it appears that some models that predict well individually do not necessarily combine well with other models in multi-model ensembles. The reasons behind these observations may relate to the effects of the weighting schemes, non-stationarity of the climate series and possible cross-correlations between models. Crown Copyright © 2008.
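The two best-performing combination rules reported above can be sketched directly; `preds` holds one row of daily predictions per single-model ensemble, and all names are illustrative:

```python
import numpy as np

def trimmed_mean_ensemble(preds, keep=4):
    """Sort the single-model predictions for each day and average only the
    central `keep` values (a trimmed mean over models)."""
    s = np.sort(preds, axis=0)  # (n_models, n_days), sorted per day
    lo = (preds.shape[0] - keep) // 2
    return s[lo:lo + keep].mean(axis=0)

def weighted_mean_ensemble(preds, calib_scores):
    """Weight each model by its (non-negative) calibration performance,
    e.g. Nash-Sutcliffe efficiency, normalized to sum to one."""
    w = np.clip(calib_scores, 0, None)
    w = w / w.sum()
    return (w[:, None] * preds).sum(axis=0)
```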
Predicting protein function and other biomedical characteristics with heterogeneous ensembles
Whalen, Sean; Pandey, Om Prakash
2015-01-01
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink. PMID:26342255
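A minimal stacking sketch in the spirit described above, not the DataSink implementation, assuming scikit-learn base models and binary labels: out-of-fold base-model probabilities become the features of a meta-learner, which also calibrates the combined output:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def stack(base_models, X, y, meta=None):
    """Heterogeneous stacking: out-of-fold predictions of the (diverse) base
    models are the meta-features; the meta-learner combines and calibrates."""
    Z = np.column_stack([
        cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
        for m in base_models
    ])
    meta = meta or LogisticRegression()
    meta.fit(Z, y)
    fitted = [m.fit(X, y) for m in base_models]  # refit base models on all data
    return fitted, meta

def stack_predict(fitted, meta, X):
    Z = np.column_stack([m.predict_proba(X)[:, 1] for m in fitted])
    return meta.predict_proba(Z)[:, 1]
```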
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Numerical weather prediction model tuning via ensemble prediction system
NASA Astrophysics Data System (ADS)
Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.
2011-12-01
This paper discusses a novel approach to tuning the predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in parameterization schemes of sub-grid scale physical processes. Currently, numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of the NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infrastructure, and it seems very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision-making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model, which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameter values, and leads to improved forecast skill. Second, results with an atmospheric general circulation model based ensemble prediction system show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, preliminary results from a global top-end NWP model tuning exercise are presented.
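A simplified reading of one EPPES-style cycle (the operational algorithm differs in details such as hierarchical hyperparameters; `run_member_forecast` and `score` are placeholders for the NWP forecast and the likelihood evaluation against verifying observations):

```python
import numpy as np

def eppes_cycle(mu, Sigma, run_member_forecast, score, n_members=50):
    """One cycle: each ensemble member runs with its own parameter draw from
    the proposal N(mu, Sigma); the verifying scores re-weight the draws and
    the weighted moments update the proposal for the next cycle."""
    theta = np.random.multivariate_normal(mu, Sigma, size=n_members)
    # score() returns a likelihood-like weight per member (larger = better fit)
    w = np.array([score(run_member_forecast(t)) for t in theta])
    w = w / w.sum()
    mu_new = w @ theta
    diff = theta - mu_new
    Sigma_new = (w[:, None] * diff).T @ diff
    return mu_new, Sigma_new
```

Because the parameter draws ride along with the ordinary ensemble members, the extra cost over the existing ensemble prediction system is essentially zero, which is the point the abstract emphasizes.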
Ensemble-Based Parameter Estimation in a Coupled GCM Using the Adaptive Spatial Average Method
Liu, Y.; Liu, Z.; Zhang, S.; ...
2014-05-29
Ensemble-based parameter estimation for a climate model is emerging as an important topic in climate research. For a complex system such as a coupled ocean–atmosphere general circulation model, the sensitivity and response of a model variable to a model parameter can vary spatially and temporally. An adaptive spatial average (ASA) algorithm is proposed to increase the efficiency of parameter estimation. Refined from a previous spatial average method, the ASA uses the ensemble spread as the criterion for selecting “good” values from the spatially varying posterior estimated parameter values; these good values are then averaged to give the final global uniform posterior parameter. In comparison with existing methods, the ASA parameter estimation has a superior performance: faster convergence and an enhanced signal-to-noise ratio.
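A hedged sketch of the ASA selection-then-average step, assuming "good" means the grid points with the smallest ensemble spread (the paper's exact criterion may differ; names are illustrative):

```python
import numpy as np

def adaptive_spatial_average(param_field, spread_field, quantile=0.25):
    """Keep the grid points whose ensemble spread falls in the lowest quantile
    (the locally best-constrained estimates) and average them into a single
    global uniform posterior parameter value."""
    cut = np.quantile(spread_field, quantile)
    good = param_field[spread_field <= cut]
    return good.mean()
```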
A benchmark for reaction coordinates in the transition path ensemble
2016-01-01
The molecular mechanism of a reaction is embedded in its transition path ensemble, the complete collection of reactive trajectories. Utilizing the information in the transition path ensemble alone, we developed a novel metric, which we termed the emergent potential energy, for distinguishing reaction coordinates from the bath modes. The emergent potential energy can be understood as the average energy cost for making a displacement of a coordinate in the transition path ensemble. Whereas displacing a bath mode incurs essentially no cost, moving the reaction coordinate costs significantly. Based on some general assumptions about the behavior of reaction and bath coordinates in the transition path ensemble, we proved theoretically with statistical mechanics that the emergent potential energy can serve as a benchmark of reaction coordinates, and we demonstrated its effectiveness by applying it to a prototypical system of biomolecular dynamics. Using the emergent potential energy as guidance, we developed a committor-free and intuition-independent method for identifying reaction coordinates in complex systems. We expect this method to be applicable to a wide range of reaction processes in complex biomolecular systems. PMID:27059559
A second-order unconstrained optimization method for canonical-ensemble density-functional methods
NASA Astrophysics Data System (ADS)
Nygaard, Cecilie R.; Olsen, Jeppe
2013-03-01
A second-order converging method of ensemble optimization (SOEO) in the framework of Kohn-Sham density-functional theory is presented, in which the energy is minimized with respect to an ensemble density matrix. It is general in the sense that the number of fractionally occupied orbitals is not predefined but rather is optimized by the algorithm. SOEO is a second-order Newton-Raphson method of optimization, in which both the form of the orbitals and the occupation numbers are optimized simultaneously. To keep the occupation numbers between zero and two, a set of occupation angles is defined, from which the occupation numbers are expressed as trigonometric functions. The total number of electrons is controlled by a built-in second-order restriction of the Newton-Raphson equations, which can be deactivated in the case of a grand-canonical ensemble (where the total number of electrons is allowed to change). To test the optimization method, dissociation curves for diatomic carbon are produced using different functionals for the exchange-correlation energy. These curves show that SOEO favors symmetry-broken pure-state solutions when using functionals with exact exchange, such as Hartree-Fock and Becke three-parameter Lee-Yang-Parr. This is explained by an unphysical contribution to the exact exchange energy from interactions between fractional occupations. For functionals without exact exchange, such as the local density approximation or Becke Lee-Yang-Parr, ensemble solutions are favored at interatomic distances larger than the equilibrium distance. Calculations on the chromium dimer are also discussed; they show that SOEO is able to converge to ensemble solutions for systems that are more complicated than diatomic carbon.
van Diedenhoven, Bastiaan; Ackerman, Andrew S.; Fridlind, Ann M.; Cairns, Brian
2017-01-01
The use of ensemble-average values of the aspect ratio and distortion parameter of hexagonal ice prisms for the estimation of ensemble-average scattering asymmetry parameters is evaluated. Using crystal aspect ratios greater than unity generally leads to ensemble-average values of aspect ratio that are inconsistent with the ensemble-average asymmetry parameters. When a definition of aspect ratio is used that limits the aspect ratio to below unity (α≤1) for both hexagonal plates and columns, the effective asymmetry parameters calculated using ensemble-average aspect ratios are generally consistent with ensemble-average asymmetry parameters, especially if the aspect ratios are geometrically averaged. Ensemble-average distortion parameters generally also yield effective asymmetry parameters that are largely consistent with ensemble-average asymmetry parameters. In the case of mixtures of plates and columns, it is recommended to geometrically average the α≤1 aspect ratios and to subsequently calculate the effective asymmetry parameter using a column or plate geometry when the contribution by columns to a given mixture's total projected area is greater or less than 50%, respectively. In addition, we show that ensemble-average aspect ratios, distortion parameters and asymmetry parameters can generally be retrieved accurately from simulated multi-directional polarization measurements based on mixtures of varying columns and plates. However, such retrievals tend to be somewhat biased toward yielding column-like aspect ratios. Furthermore, generally large retrieval errors can occur for mixtures with approximately equal contributions of columns and plates and for ensembles with strong contributions of thin plates. PMID:28983127
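The recommended averaging can be sketched as follows, under the assumption that the geometric mean is weighted by each crystal's projected-area contribution (the weighting choice is an assumption, and variable names are illustrative):

```python
import numpy as np

def effective_aspect_ratio(alpha, proj_area, is_column):
    """Projected-area-weighted geometric mean of alpha<=1 aspect ratios, plus
    the recommended geometry for the effective asymmetry parameter: 'column'
    if columns contribute more than half the total projected area, else 'plate'.

    alpha     : (N,) aspect ratios, all defined <= 1
    proj_area : (N,) projected areas of the crystals
    is_column : (N,) boolean, True where the crystal is a column
    """
    w = proj_area / proj_area.sum()
    alpha_eff = float(np.exp(np.sum(w * np.log(alpha))))
    geometry = "column" if proj_area[is_column].sum() > 0.5 * proj_area.sum() else "plate"
    return alpha_eff, geometry
```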
A stochastic diffusion process for Lochner's generalized Dirichlet distribution
Bakosi, J.; Ristorcelli, J. R.
2013-10-01
The method of potential solutions of Fokker-Planck equations is used to develop a transport equation for the joint probability of N stochastic variables with Lochner's generalized Dirichlet distribution as its asymptotic solution. Individual samples of a discrete ensemble, obtained from the system of stochastic differential equations equivalent to the Fokker-Planck equation developed here, satisfy a unit-sum constraint at all times and ensure a bounded sample space, similarly to the process developed previously for the Dirichlet distribution. Consequently, the generalized Dirichlet diffusion process may be used to represent realizations of a fluctuating ensemble of N variables subject to a conservation principle. Compared to the Dirichlet distribution and process, the additional parameters of the generalized Dirichlet distribution allow a more general class of physical processes to be modeled with a more general covariance matrix.
Residue-level global and local ensemble-ensemble comparisons of protein domains.
Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew
2015-09-01
Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. © 2015 The Protein Society.
Fluctuating observation time ensembles in the thermodynamics of trajectories
NASA Astrophysics Data System (ADS)
Budini, Adrián A.; Turner, Robert M.; Garrahan, Juan P.
2014-03-01
The dynamics of stochastic systems, both classical and quantum, can be studied by analysing the statistical properties of dynamical trajectories. The properties of ensembles of such trajectories for long, but fixed, times are described by large-deviation (LD) rate functions. These LD functions play the role of dynamical free energies: they are cumulant generating functions for time-integrated observables, and their analytic structure encodes dynamical phase behaviour. This ‘thermodynamics of trajectories’ approach is to trajectories and dynamics what the equilibrium ensemble method of statistical mechanics is to configurations and statics. Here we show that, just like in the static case, there are a variety of alternative ensembles of trajectories, each defined by their global constraints, with that of trajectories of fixed total time being just one of these. We show how the LD functions that describe an ensemble of trajectories where some time-extensive quantity is constant (and large) but where total observation time fluctuates can be mapped to those of the fixed-time ensemble. We discuss how the correspondence between generalized ensembles can be exploited in path sampling schemes for generating rare dynamical trajectories.
NASA Astrophysics Data System (ADS)
Saito, Kazuo; Hara, Masahiro; Kunii, Masaru; Seko, Hiromu; Yamaguchi, Munehiko
2011-05-01
Different initial perturbation methods for mesoscale ensemble prediction were compared by the Meteorological Research Institute (MRI) as a part of the intercomparison of mesoscale ensemble prediction systems (EPSs) of the World Weather Research Programme (WWRP) Beijing 2008 Olympics Research and Development Project (B08RDP). Five initial perturbation methods for mesoscale ensemble prediction were developed for B08RDP and compared at MRI: (1) a downscaling method of the Japan Meteorological Agency (JMA)'s operational one-week EPS (WEP), (2) a targeted global model singular vector (GSV) method, (3) a mesoscale model singular vector (MSV) method based on the adjoint model of the JMA non-hydrostatic model (NHM), (4) a mesoscale breeding of growing modes (MBD) method based on the NHM forecast and (5) a local ensemble transform (LET) method based on the local ensemble transform Kalman filter (LETKF) using NHM. These perturbation methods were applied to the preliminary experiments of the B08RDP Tier-1 mesoscale ensemble prediction with a horizontal resolution of 15 km. To make the comparison easier, the same horizontal resolution (40 km) was employed for the three mesoscale model-based initial perturbation methods (MSV, MBD and LET). The GSV method completely outperformed the WEP method, confirming the advantage of targeting in mesoscale EPS. The GSV method generally performed well with regard to root mean square errors of the ensemble mean, large growth rates of ensemble spreads throughout the 36-h forecast period, and high detection rates and high Brier skill scores (BSSs) for weak rains. On the other hand, the mesoscale model-based initial perturbation methods showed good detection rates and BSSs for intense rains. The MSV method showed a rapid growth in the ensemble spread of precipitation up to a forecast time of 6 h, which suggests the suitability of mesoscale SVs for short-range EPSs, but the initial large growth of the perturbations did not last long. The performance of the MBD method was good for ensemble prediction of intense rain, with a relatively small computing cost. The LET method showed similar characteristics to the MBD method, but the spread and growth rate were slightly smaller, and the relative operating characteristic area skill score and BSS did not surpass those of MBD. These characteristic features of the five methods were confirmed by checking the evolution of the total energy norms and their growth rates. Characteristics of the initial perturbations obtained by four methods (GSV, MSV, MBD and LET) were examined for the case of a synoptic low-pressure system passing over eastern China. With GSV and MSV, the regions of large spread were near the low-pressure system, but with MSV, the distribution was more concentrated on the mesoscale disturbance. On the other hand, large-spread areas were observed southwest of the disturbance with MBD and LET. The horizontal pattern of the LET perturbation was similar to that of MBD, but the amplitude of the LET perturbation reflected the observation density.
New Aspects of Probabilistic Forecast Verification Using Information Theory
NASA Astrophysics Data System (ADS)
Tödter, Julian; Ahrens, Bodo
2013-04-01
This work deals with information-theoretical methods in probabilistic forecast verification, particularly concerning ensemble forecasts. Recent findings concerning the "Ignorance Score" are briefly reviewed, and a consistent generalization to continuous forecasts is motivated. For ensemble-generated forecasts, the presented measures can be calculated exactly. The Brier Score (BS) and its generalizations to the multi-categorical Ranked Probability Score (RPS) and to the Continuous Ranked Probability Score (CRPS) are prominent verification measures for probabilistic forecasts. In particular, their decompositions into measures quantifying the reliability, resolution and uncertainty of the forecasts are attractive. Information theory sets up a natural framework for forecast verification. Recently, it has been shown that the BS is a second-order approximation of the information-based Ignorance Score (IGN), which also contains easily interpretable components and can likewise be generalized to a ranked version (RIGN). Here, the IGN, its generalizations and decompositions are systematically discussed in analogy to the variants of the BS. Additionally, a Continuous Ranked IGN (CRIGN) is introduced in analogy to the CRPS. The useful properties of the conceptually appealing CRIGN are illustrated, together with an algorithm to evaluate its components (reliability, resolution, and uncertainty) for ensemble-generated forecasts. This algorithm can also be used to calculate the decomposition of the more traditional CRPS exactly. The applicability of the "new" measures is demonstrated in a small evaluation study of ensemble-based precipitation forecasts.
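For an ensemble-generated categorical forecast, the Ignorance Score can be evaluated directly; a sketch using binned relative frequencies with a small probability floor (the floor choice is an assumption so that a zero-count category is not infinitely penalized):

```python
import numpy as np

def ignorance_score(ens, y, bins):
    """IGN = -log2 of the probability the ensemble assigned to the category
    the observation fell into; probabilities are the ensemble's binned
    relative frequencies (add-half smoothing as a floor)."""
    counts, _ = np.histogram(ens, bins=bins)
    p = (counts + 0.5) / (counts.sum() + 0.5 * len(counts))
    k = np.clip(np.digitize(y, bins) - 1, 0, len(counts) - 1)
    return -np.log2(p[k])

ens = np.random.normal(1.0, 2.0, size=50)  # a 50-member forecast
print(ignorance_score(ens, y=0.3, bins=np.linspace(-6, 8, 15)))
```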
Identifying Optimal Measurement Subspace for the Ensemble Kalman Filter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ning; Huang, Zhenyu; Welch, Greg
2012-05-24
To reduce the computational load of the ensemble Kalman filter while maintaining its efficacy, an optimization algorithm based on the generalized eigenvalue decomposition method is proposed for identifying the most informative measurement subspace. When the number of measurements is large, the proposed algorithm can be used to make an effective tradeoff between computational complexity and estimation accuracy. The algorithm can also be extended to other Kalman filters for measurement subspace selection.
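The core computation can be sketched with SciPy's generalized symmetric eigensolver, ranking measurement-space directions by signal-to-noise ratio; the matrix names are assumptions, and the published selection rule may differ in detail:

```python
import numpy as np
from scipy.linalg import eigh

def informative_subspace(HPHt, R, k):
    """Solve the generalized eigenproblem  HPH' v = lambda R v  and keep the
    k leading eigenpairs: the directions in measurement space with the
    largest (ensemble) signal variance relative to observation noise,
    i.e. the most informative measurement combinations.

    HPHt : (m, m) ensemble-estimated covariance in observation space
    R    : (m, m) observation error covariance (symmetric positive definite)
    """
    evals, evecs = eigh(HPHt, R)          # ascending generalized eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order[:k]], evecs[:, order[:k]]
```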
Design of an Evolutionary Approach for Intrusion Detection
2013-01-01
A novel evolutionary approach is proposed for effective intrusion detection based on benchmark datasets. The proposed approach can generate a pool of noninferior individual solutions and ensemble solutions thereof; the generated ensembles can be used to detect intrusions accurately. For the intrusion detection problem, the proposed approach can consider conflicting objectives simultaneously, such as the detection rate of each attack class, error rate, accuracy, and diversity, and it generates noninferior solutions and ensembles with optimized trade-offs among these objectives. A three-phase approach is proposed, as sketched after this abstract. In the first phase, a Pareto front of noninferior individual solutions, based on a simple chromosome design, is approximated. In the second phase, the entire solution set is further refined to determine effective ensemble solutions considering solution interaction; in this phase, another improved Pareto front of ensemble solutions over that of individual solutions is approximated, and the ensemble solutions in the improved Pareto front give improved detection results on benchmark datasets for intrusion detection. In the third phase, a combination method such as majority voting is used to fuse the predictions of individual solutions into the prediction of an ensemble solution. Benchmark datasets, namely the KDD Cup 1999 and ISCX 2012 datasets, are used to demonstrate and validate the performance of the proposed approach for intrusion detection. The proposed approach can discover individual solutions and ensemble solutions thereof with good support and detection rates from benchmark datasets (in comparison with well-known ensemble methods like bagging and boosting). In addition, the proposed approach is a generalized classification approach applicable to any problem with multiple conflicting objectives whose dataset can be represented as labelled instances in terms of its features. PMID:24376390
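The noninferior (Pareto) filtering used in the first two phases can be sketched generically; this is the standard dominance test over objectives to be maximized, not the paper's specific evolutionary operators:

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of nondominated (noninferior) solutions.

    objectives : (n_solutions, n_objectives) array, all objectives maximized.
    Solution j dominates i if it is >= on every objective and > on at least one.
    """
    n = objectives.shape[0]
    keep = []
    for i in range(n):
        dominated = np.any(
            np.all(objectives >= objectives[i], axis=1)
            & np.any(objectives > objectives[i], axis=1))
        if not dominated:
            keep.append(i)
    return keep
```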
Cao, Lushuai; Krönke, Sven; Vendrell, Oriol; Schmelcher, Peter
2013-10-07
We develop the multi-layer multi-configuration time-dependent Hartree method for bosons (ML-MCTDHB), a variational numerically exact ab initio method for studying the quantum dynamics and stationary properties of general bosonic systems. ML-MCTDHB takes advantage of the permutation symmetry of identical bosons, which allows for investigations of the quantum dynamics from few- to many-body systems. Moreover, the multi-layer feature enables ML-MCTDHB to describe mixed bosonic systems consisting of arbitrarily many species. Multi-dimensional as well as mixed-dimensional systems can be accurately and efficiently simulated via the multi-layer expansion scheme. We provide a detailed account of the underlying theory and the corresponding implementation. We also demonstrate the superior performance by applying the method to the tunneling dynamics of bosonic ensembles in a one-dimensional double well potential, where a single-species bosonic ensemble of various correlation strengths and a weakly interacting two-species bosonic ensemble are considered.
Wu, Xiongwu; Damjanovic, Ana; Brooks, Bernard R.
2013-01-01
This review provides a comprehensive description of the self-guided Langevin dynamics (SGLD) and self-guided molecular dynamics (SGMD) methods and their applications. Example systems are included to provide guidance on the optimal application of these methods in simulation studies. SGMD/SGLD has an enhanced ability to overcome energy barriers and accelerate rare events to affordable time scales. It has been demonstrated that with moderate parameters, SGLD can routinely cross energy barriers of 20 kT at the rate at which molecular dynamics (MD) or Langevin dynamics (LD) crosses 10 kT barriers. The core of these methods is the use of local averages of forces and momenta in a direct manner that can preserve the canonical ensemble. The use of such local averages results in methods where low-frequency motion “borrows” energy from high-frequency degrees of freedom when a barrier is approached and then returns that excess energy after the barrier is crossed. This self-guiding effect also results in accelerated diffusion that enhances conformational sampling efficiency. The resulting ensemble with SGLD deviates in a small way from the canonical ensemble, and that deviation can be corrected with either an on-the-fly or a post-processing reweighting procedure that provides an excellent canonical ensemble for systems with a limited number of accelerated degrees of freedom. Since reweighting procedures are generally not size-extensive, a newer method, SGLDfp, uses local averages of both momenta and forces to preserve the ensemble without reweighting. The SGLDfp approach is size-extensive and can be used to accelerate low-frequency motion in large systems, or in systems with explicit solvent where solvent diffusion is also to be enhanced. Since these methods are direct and straightforward, they can be used in conjunction with many other sampling methods or free energy methods by simply replacing the integration of degrees of freedom that are normally sampled by MD or LD. PMID:23913991
From a structural average to the conformational ensemble of a DNA bulge
Shi, Xuesong; Beauchamp, Kyle A.; Harbury, Pehr B.; Herschlag, Daniel
2014-01-01
Direct experimental measurements of conformational ensembles are critical for understanding macromolecular function, but traditional biophysical methods do not directly report the solution ensemble of a macromolecule. Small-angle X-ray scattering interferometry has the potential to overcome this limitation by providing the instantaneous distance distribution between pairs of gold-nanocrystal probes conjugated to a macromolecule in solution. Our X-ray interferometry experiments reveal an increasing bend angle of DNA duplexes with bulges of one, three, and five adenosine residues, consistent with previous FRET measurements, and further reveal an increasingly broad conformational ensemble with increasing bulge length. The distance distributions for the AAA bulge duplex (3A-DNA) with six different Au-Au pairs provide strong evidence against a simple elastic model in which fluctuations occur about a single conformational state. Instead, the measured distance distributions suggest a 3A-DNA ensemble with multiple conformational states predominantly across a region of conformational space with bend angles between 24 and 85 degrees and characteristic bend directions and helical twists and displacements. Additional X-ray interferometry experiments revealed perturbations to the ensemble from changes in ionic conditions and the bulge sequence, effects that can be understood in terms of electrostatic and stacking contributions to the ensemble and that demonstrate the sensitivity of X-ray interferometry. Combining X-ray interferometry ensemble data with molecular dynamics simulations gave atomic-level models of representative conformational states and of the molecular interactions that may shape the ensemble, and fluorescence measurements with 2-aminopurine-substituted 3A-DNA provided initial tests of these atomistic models. More generally, X-ray interferometry will provide powerful benchmarks for testing and developing computational methods. PMID:24706812
Abuassba, Adnan O M; Zhang, Dezheng; Luo, Xiong; Shaheryar, Ahmad; Ali, Hazrat
2017-01-01
Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of the training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function that increases diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using a majority vote approach. Splitting the training data into subsets and incorporating heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
Improving database enrichment through ensemble docking
NASA Astrophysics Data System (ADS)
Rao, Shashidhar; Sanschagrin, Paul C.; Greenwood, Jeremy R.; Repasky, Matthew P.; Sherman, Woody; Farid, Ramy
2008-09-01
While it may seem intuitive that using an ensemble of multiple conformations of a receptor in structure-based virtual screening experiments would necessarily yield improved enrichment of actives relative to using just a single receptor, it turns out that at least in the p38 MAP kinase model system studied here, a very large majority of all possible ensembles do not yield improved enrichment of actives. However, there are combinations of receptor structures that do lead to improved enrichment results. We present here a method to select the ensembles that produce the best enrichments that does not rely on knowledge of active compounds or sophisticated analyses of the 3D receptor structures. In the system studied here, the small fraction of ensembles of up to 3 receptors that do yield good enrichments of actives were identified by selecting ensembles that have the best mean GlideScore for the top 1% of the docked ligands in a database screen of actives and drug-like "decoy" ligands. Ensembles of two receptors identified using this mean GlideScore metric generally outperform single receptors, while ensembles of three receptors identified using this metric consistently give optimal enrichment factors in which, for example, 40% of the known actives outrank all the other ligands in the database.
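The selection metric can be sketched directly: exhaustively score every ensemble of up to three receptors by the mean docking score of its top 1% of ligands. A synthetic score matrix is assumed, and with GlideScore more negative is better:

```python
import numpy as np
from itertools import combinations

def best_ensembles(scores, max_size=3, top_frac=0.01, n_keep=5):
    """Rank receptor ensembles by the mean docking score of the top 1% of
    database ligands.

    scores : (n_receptors, n_ligands) docking scores, lower (more negative)
             is better; each ligand's ensemble score is its best member score.
    """
    n_rec, n_lig = scores.shape
    n_top = max(1, int(top_frac * n_lig))
    ranked = []
    for size in range(1, max_size + 1):
        for combo in combinations(range(n_rec), size):
            ens = scores[list(combo)].min(axis=0)  # best score per ligand
            metric = np.sort(ens)[:n_top].mean()   # mean score of the top 1%
            ranked.append((metric, combo))
    return sorted(ranked)[:n_keep]                 # most negative metric first
```

Note that this selection needs no knowledge of which ligands are active, which is the practical appeal the abstract describes.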
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wan, Hui; Rasch, Philip J.; Zhang, Kai
2014-09-08
This paper explores the feasibility of an experimentation strategy for investigating sensitivities in fast components of atmospheric general circulation models. The basic idea is to replace the traditional serial-in-time long-term climate integrations by representative ensembles of shorter simulations. The key advantage of the proposed method lies in its efficiency: since fewer days of simulation are needed, the computational cost is lower, and because individual realizations are independent and can be integrated simultaneously, the new dimension of parallelism can dramatically reduce the turnaround time in benchmark tests, sensitivity studies, and model tuning exercises. The strategy is not appropriate for exploring the sensitivity of all model features, but it is very effective in many situations. Two examples are presented using the Community Atmosphere Model version 5. The first example demonstrates that the method is capable of characterizing the model cloud and precipitation sensitivity to time step length. A nudging technique is also applied to an additional set of simulations to help understand the contribution of physics-dynamics interaction to the detected time step sensitivity. In the second example, multiple empirical parameters related to cloud microphysics and aerosol lifecycle are perturbed simultaneously in order to explore which parameters have the largest impact on the simulated global mean top-of-atmosphere radiation balance. Results show that in both examples, short ensembles are able to correctly reproduce the main signals of model sensitivities revealed by traditional long-term climate simulations for fast processes in the climate system. The efficiency of the ensemble method makes it particularly useful for the development of high-resolution, costly and complex climate models.
NASA Astrophysics Data System (ADS)
Fillion, Anthony; Bocquet, Marc; Gratton, Serge
2018-04-01
The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a Gauss-Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It accomplishes this by gradually injecting the observations in the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.
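The quasi-static idea can be made concrete in a few lines: minimize the variational cost with observations introduced one batch at a time, warm-starting each solve from the previous minimizer so the iterate tracks the relevant basin as the window grows. A schematic sketch under strong simplifications — a toy scalar state, scalar error variances, a hypothetical oscillatory observation operator mimicking chaotic sensitivity, and scipy's BFGS in place of Gauss-Newton:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def h(x0, k):
    # toy nonlinear forecast/observation operator; longer leads k make the
    # cost more oscillatory (more local minima), loosely mimicking chaos
    return np.sin(1.5 * k * x0)

truth0 = 0.8
leads = np.arange(1, 9)
obs = h(truth0, leads) + 0.05 * rng.normal(size=leads.size)

def cost(x, n_obs):
    # strong-constraint cost: background term plus the first n_obs observations
    xb, b_var, r_var = 0.3, 1.0, 0.05**2
    misfit = np.sum((obs[:n_obs] - h(x[0], leads[:n_obs])) ** 2)
    return (x[0] - xb) ** 2 / b_var + misfit / r_var

# quasi-static minimization: inject observations gradually, warm-starting
# each quasi-Newton solve from the previous analysis
x = np.array([0.3])
for n in range(1, leads.size + 1):
    x = minimize(cost, x, args=(n,), method="BFGS").x

# one-shot minimization over the full window, for contrast
direct = minimize(cost, np.array([0.3]), args=(leads.size,), method="BFGS").x
print("QS analysis:", x[0], " one-shot analysis:", direct[0], " truth:", truth0)
```

The one-shot solve over the full window often lands in a spurious local minimum, while the quasi-static sequence stays near the global one — the behaviour the abstract exploits.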
NASA Astrophysics Data System (ADS)
Rigosa, J.; Weber, D. J.; Prochazka, A.; Stein, R. B.; Micera, S.
2011-08-01
Functional electrical stimulation (FES) is used to improve motor function after injury to the central nervous system. Some FES systems use artificial sensors to switch between finite control states. To optimize FES control of the complex behavior of the musculo-skeletal system in activities of daily life, it is highly desirable to implement feedback control. In theory, sensory neural signals could provide the required control signals. Recent studies have demonstrated the feasibility of deriving limb-state estimates from the firing rates of primary afferent neurons recorded in dorsal root ganglia (DRG). These studies used multiple linear regression (MLR) methods to generate estimates of limb position and velocity based on a weighted sum of firing rates in an ensemble of simultaneously recorded DRG neurons. The aim of this study was to test whether the use of a neuro-fuzzy (NF) algorithm (the generalized dynamic fuzzy neural networks (GD-FNN)) could improve the performance, robustness and ability to generalize from training to test sets compared to the MLR technique. NF and MLR decoding methods were applied to ensemble DRG recordings obtained during passive and active limb movements in anesthetized and freely moving cats. The GD-FNN model provided more accurate estimates of limb state and generalized better to novel movement patterns. Future efforts will focus on implementing these neural recording and decoding methods in real time to provide closed-loop control of FES using the information extracted from sensory neurons.
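The MLR baseline against which GD-FNN is compared is just ordinary least squares on binned firing rates; a minimal sketch with synthetic data (real inputs would be DRG firing rates paired with measured limb kinematics):

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_neurons = 2000, 40                       # time bins x recorded DRG units

rates = rng.poisson(5.0, size=(T, n_neurons)).astype(float)   # firing rates
true_w = rng.normal(size=n_neurons)
limb_pos = rates @ true_w + rng.normal(scale=2.0, size=T)     # synthetic limb state

# multiple linear regression: limb state ~ weighted sum of firing rates + bias
A = np.column_stack([rates, np.ones(T)])
w, *_ = np.linalg.lstsq(A, limb_pos, rcond=None)

estimate = A @ w
print("decoding R^2:", 1 - np.var(limb_pos - estimate) / np.var(limb_pos))
```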
Time-dependent generalized Gibbs ensembles in open quantum systems
NASA Astrophysics Data System (ADS)
Lange, Florian; Lenarčič, Zala; Rosch, Achim
2018-04-01
Generalized Gibbs ensembles have been used as powerful tools to describe the steady state of integrable many-particle quantum systems after a sudden change of the Hamiltonian. Here, we demonstrate numerically that they can be used for a much broader class of problems. We consider integrable systems in the presence of weak perturbations which both break integrability and drive the system to a state far from equilibrium. Under these conditions, we show that the steady state and the time evolution on long timescales can be accurately described by a (truncated) generalized Gibbs ensemble with time-dependent Lagrange parameters, determined from simple rate equations. We compare the numerically exact time evolutions of density matrices for small systems with a theory based on block-diagonal density matrices (diagonal ensemble) and a time-dependent generalized Gibbs ensemble containing only a small number of approximately conserved quantities, using the one-dimensional Heisenberg model with perturbations described by Lindblad operators as an example.
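In generic notation (not copied from the paper), the ansatz is a truncated generalized Gibbs ensemble whose Lagrange parameters acquire time dependence:

```latex
\rho_{\mathrm{GGE}}(t) \;=\; \frac{1}{Z(t)}\,
  \exp\!\Big(-\sum_{i} \lambda_i(t)\,\hat{Q}_i\Big),
\qquad
Z(t) \;=\; \operatorname{Tr}\exp\!\Big(-\sum_{i} \lambda_i(t)\,\hat{Q}_i\Big),
```

with the $\lambda_i(t)$ fixed by rate equations for the expectation values $\langle \hat{Q}_i \rangle$ of the approximately conserved quantities, driven by the weak (Lindblad) perturbation.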
Using Support Vector Machine Ensembles for Target Audience Classification on Twitter
Lo, Siaw Ling; Chiong, Raymond; Cornforth, David
2015-01-01
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space. PMID:25874768
Using support vector machine ensembles for target audience classification on Twitter.
Lo, Siaw Ling; Chiong, Raymond; Cornforth, David
2015-01-01
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
Yu, Hualong; Ni, Jun
2014-01-01
Training classifiers on skewed data is a technically challenging task, and it becomes more difficult when the data is also high-dimensional. Skewed data of this kind often appears in the biomedical field. In this study, we address the problem by combining the asymmetric bagging ensemble classifier (asBagging) presented in previous work with an improved random subspace (RS) generation strategy called feature subspace (FSS). Specifically, FSS is a novel method to promote the balance between accuracy and diversity of the base classifiers in asBagging. In view of the strong generalization capability of the support vector machine (SVM), we adopt it as the base classifier. Extensive experiments on four benchmark biomedicine data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of the Accuracy, F-measure, G-mean and AUC evaluation criteria, and hence it can be regarded as an effective and efficient tool for dealing with high-dimensional and imbalanced biomedical data.
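The random-subspace half of such a scheme is easy to sketch: each base SVM sees a balanced resample of the training set and a random subset of the features. The snippet below shows only this generic subspace/asymmetric-bagging skeleton with scikit-learn; the authors' FSS criterion for balancing accuracy against diversity is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# imbalanced, high-dimensional toy data (stand-in for biomedical data)
X, y = make_classification(n_samples=600, n_features=200,
                           weights=[0.9, 0.1], random_state=4)

members = []
for _ in range(25):
    feats = rng.choice(X.shape[1], size=30, replace=False)   # random feature subspace
    pos = np.flatnonzero(y == 1)                             # keep all minority samples
    neg = rng.choice(np.flatnonzero(y == 0), size=pos.size)  # undersample the majority
    idx = np.concatenate([pos, neg])                         # asymmetric-bagging style
    clf = SVC(kernel="rbf").fit(X[np.ix_(idx, feats)], y[idx])
    members.append((clf, feats))

votes = np.stack([clf.predict(X[:, feats]) for clf, feats in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)
print("recall on minority class:", (pred[y == 1] == 1).mean())
```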
Hyper-Parallel Tempering Monte Carlo Method and Its Applications
NASA Astrophysics Data System (ADS)
Yan, Qiliang; de Pablo, Juan
2000-03-01
A new generalized hyper-parallel tempering Monte Carlo molecular simulation method is presented for study of complex fluids. The method is particularly useful for simulation of many-molecule complex systems, where rough energy landscapes and inherently long characteristic relaxation times can pose formidable obstacles to effective sampling of relevant regions of configuration space. The method combines several key elements from expanded ensemble formalisms, parallel-tempering, open ensemble simulations, configurational bias techniques, and histogram reweighting analysis of results. It is found to accelerate significantly the diffusion of a complex system through phase-space. In this presentation, we demonstrate the effectiveness of the new method by implementing it in grand canonical ensembles for a Lennard-Jones fluid, for the restricted primitive model of electrolyte solutions (RPM), and for polymer solutions and blends. Our results indicate that the new algorithm is capable of overcoming the large free energy barriers associated with phase transitions, thereby greatly facilitating the simulation of coexistence properties. It is also shown that the method can be orders of magnitude more efficient than previously available techniques. More importantly, the method is relatively simple and can be incorporated into existing simulation codes with minor efforts.
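The move shared by this and related tempering schemes is the replica swap, accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]) so the joint ensemble is preserved. A bare-bones sketch for a generic energy function; the open-ensemble, configurational-bias, and histogram-reweighting ingredients of the full method are omitted:

```python
import numpy as np

rng = np.random.default_rng(5)

def energy(x):
    return (x**2 - 1.0) ** 2          # toy double well; a fluid model would go here

betas = np.linspace(0.2, 2.0, 8)      # ladder of inverse temperatures (hot -> cold)
x = rng.normal(size=betas.size)       # one configuration per replica

for sweep in range(20000):
    # local Metropolis move in every replica
    prop = x + 0.3 * rng.normal(size=x.size)
    acc = rng.random(x.size) < np.exp(-betas * (energy(prop) - energy(x)))
    x = np.where(acc, prop, x)
    # attempt a swap between a random neighbouring pair of replicas
    i = rng.integers(betas.size - 1)
    delta = (betas[i] - betas[i + 1]) * (energy(x[i]) - energy(x[i + 1]))
    if rng.random() < np.exp(delta):  # min(1, e^delta) swap acceptance
        x[i], x[i + 1] = x[i + 1], x[i]

print("coldest replica:", x[-1])      # should sit near a well at x = +/-1
```

Hot replicas cross the barrier freely and feed decorrelated configurations down the ladder, which is what accelerates diffusion through phase space.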
Multidimensional generalized-ensemble algorithms for complex systems.
Mitsutake, Ayori; Okamoto, Yuko
2009-06-07
We give general formulations of the multidimensional multicanonical algorithm, simulated tempering, and replica-exchange method. We generalize the original potential energy function E(0) by adding any physical quantity V of interest as a new energy term. These multidimensional generalized-ensemble algorithms then perform a random walk not only in E(0) space but also in V space. Among the three algorithms, the replica-exchange method is the easiest to perform because the weight factor is just a product of regular Boltzmann-like factors, while the weight factors for the multicanonical algorithm and simulated tempering are not a priori known. We give a simple procedure for obtaining the weight factors for these two latter algorithms, which uses a short replica-exchange simulation and the multiple-histogram reweighting techniques. As an example of applications of these algorithms, we have performed a two-dimensional replica-exchange simulation and a two-dimensional simulated-tempering simulation using an alpha-helical peptide system. From these simulations, we study the helix-coil transitions of the peptide in gas phase and in aqueous solution.
Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method
NASA Astrophysics Data System (ADS)
Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.
2008-06-01
An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over a small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near equilibrium flows (DREAM-I) or output instantaneous particle data obtained by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single processor code, and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges. The results of these simulations are compared to experimental data and simulations from the literature where these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.
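The scatter reduction quoted at the end is essentially the 1/√R behaviour of an average over R independent runs; a toy check of that bookkeeping, with synthetic noise standing in for DSMC statistical scatter:

```python
import numpy as np

rng = np.random.default_rng(6)
true_field = np.sin(np.linspace(0, np.pi, 50))       # stand-in flow property

def one_run():
    # one unsteady run: the true field plus particle-method statistical scatter
    return true_field + 0.2 * rng.normal(size=true_field.size)

raw = one_run()
dream = np.mean([one_run() for _ in range(10)], axis=0)   # 10 ensembled runs

print("raw scatter:     ", np.std(raw - true_field))
print("ensembled (x10): ", np.std(dream - true_field))    # ~ sqrt(10) = 3.2x smaller
```

The sqrt(10) ≈ 3.2 factor for independent runs brackets the 2.5-3.3× reduction reported above.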
Generalized thermalization for integrable system under quantum quench.
Muralidharan, Sushruth; Lochan, Kinjalk; Shankaranarayanan, S
2018-01-01
We investigate equilibration and generalized thermalization of the quantum harmonic chain under a local quantum quench. The quench action we consider is connecting two disjoint harmonic chains of different sizes, so that the system jumps between two integrable settings. We verify the validity of the generalized Gibbs ensemble description for this infinite-dimensional Hilbert space system and also identify equilibration between the subsystems as in classical systems. Using Bogoliubov transformations, we show that the eigenstates of the system prior to the quench evolve toward the generalized Gibbs ensemble description. Eigenstates that are more delocalized (in the sense of inverse participation ratio) prior to the quench tend to equilibrate more rapidly. Further, through the phase space properties of a generalized Gibbs ensemble and the strength of stimulated emission, we identify the necessary criterion on the initial states for such relaxation at late times and also identify the states that would potentially not be described by the generalized Gibbs ensemble description.
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the highest Area Under the Receiver Operating Characteristic curve (AUROC) belonged to boosted regression trees (0.975) and the lowest was recorded for the generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some individual models, variability among their predictions was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular the EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
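Given the stacked susceptibility maps of the individual models, several of the listed ensemble models are one-line reductions across the model axis. A sketch of a few of them; the AUROC-based weighting shown for EMwmean is an assumption on our part, not necessarily the weighting used in the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
n_models, n_cells = 8, 10000
# Hypothetical stacked output: predicted flood probability per model and map cell.
probs = rng.beta(2, 5, size=(n_models, n_cells))
auroc = rng.uniform(0.6, 0.98, size=n_models)        # per-model skill scores

em_ca     = probs.mean(axis=0)                       # EMca: committee average
em_median = np.median(probs, axis=0)                 # EMmedian
em_cv     = probs.std(axis=0) / probs.mean(axis=0)   # EMcv: coefficient of variation
w = auroc / auroc.sum()
em_wmean  = np.tensordot(w, probs, axes=1)           # EMwmean: skill-weighted mean

print("EMmedian:", em_median[:3], " EMwmean:", em_wmean[:3])
```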
Zhang, Weihong; Howell, Steven C; Wright, David W; Heindel, Andrew; Qiu, Xiangyun; Chen, Jianhan; Curtis, Joseph E
2017-05-01
We describe a general method to use Monte Carlo simulation followed by torsion-angle molecular dynamics simulations to create ensembles of structures to model a wide variety of soft-matter biological systems. Our particular emphasis is focused on modeling low-resolution small-angle scattering and reflectivity structural data. We provide examples of this method applied to HIV-1 Gag protein and derived fragment proteins, TraI protein, linear B-DNA, a nucleosome core particle, and a glycosylated monoclonal antibody. This procedure will enable a large community of researchers to model low-resolution experimental data with greater accuracy by using robust physics based simulation and sampling methods which are a significant improvement over traditional methods used to interpret such data. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Wei, Jiangfeng; Dirmeyer, Paul A.; Yang, Zong-Liang; Chen, Haishan
2017-10-01
Through a series of model simulations with an atmospheric general circulation model coupled to three different land surface models, this study investigates the impacts of land model ensembles and coupled model ensemble on precipitation simulation. It is found that coupling an ensemble of land models to an atmospheric model has a very minor impact on the improvement of precipitation climatology and variability, but a simple ensemble average of the precipitation from three individually coupled land-atmosphere models produces better results, especially for precipitation variability. The generally weak impact of land processes on precipitation should be the main reason that the land model ensembles do not improve precipitation simulation. However, if there are big biases in the land surface model or land surface data set, correcting them could improve the simulated climate, especially for well-constrained regional climate simulations.
Instanton approach to large N Harish-Chandra-Itzykson-Zuber integrals.
Bun, J; Bouchaud, J P; Majumdar, S N; Potters, M
2014-08-15
We reconsider the large N asymptotics of Harish-Chandra-Itzykson-Zuber integrals. We provide, using Dyson's Brownian motion and the method of instantons, an alternative, transparent derivation of the Matytsin formalism for the unitary case. Our method is easily generalized to the orthogonal and symplectic ensembles. We obtain an explicit solution of Matytsin's equations in the case of Wigner matrices, as well as a general expansion method in the dilute limit, when the spectrum of eigenvalues spreads over very wide regions.
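For reference, the object under study is the unitary-group integral (in one standard normalization, which may differ from the paper's):

```latex
I_N(A,B) \;=\; \int_{U(N)} \mathrm{d}U\;
  \exp\!\big( N \operatorname{Tr} A\, U B\, U^{\dagger} \big),
```

whose large-N free energy $\lim_{N\to\infty} N^{-2}\log I_N$ depends only on the limiting eigenvalue spectra of $A$ and $B$; the instanton calculation recovers Matytsin's equations for this limit.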
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jie; Draxl, Caroline; Hopson, Thomas
Numerical weather prediction (NWP) models have been widely used for wind resource assessment. Model runs with higher spatial resolution are generally more accurate, yet extremely computationally expensive. An alternative approach is to use data generated by a low resolution NWP model, in conjunction with statistical methods. In order to analyze the accuracy and computational efficiency of different types of NWP-based wind resource assessment methods, this paper performs a comparison of three deterministic and probabilistic NWP-based wind resource assessment methodologies: (i) a coarse resolution (0.5 degrees x 0.67 degrees) global reanalysis data set, the Modern-Era Retrospective Analysis for Research and Applications (MERRA); (ii) an analog ensemble methodology based on the MERRA, which provides both deterministic and probabilistic predictions; and (iii) a fine resolution (2-km) NWP data set, the Wind Integration National Dataset (WIND) Toolkit, based on the Weather Research and Forecasting model. Results show that: (i) as expected, the analog ensemble and WIND Toolkit perform significantly better than MERRA, confirming their ability to downscale coarse estimates; (ii) the analog ensemble provides the best estimate of the multi-year wind distribution at seven of the nine sites, while the WIND Toolkit is the best at one site; (iii) the WIND Toolkit is more accurate in estimating the distribution of hourly wind speed differences, which characterizes the wind variability, at five of the available sites, with the analog ensemble being best at the remaining four locations; and (iv) the analog ensemble computational cost is negligible, whereas the WIND Toolkit requires large computational resources. Future efforts could focus on the combination of the analog ensemble with intermediate resolution (e.g., 10-15 km) NWP estimates, to considerably reduce the computational burden, while providing accurate deterministic estimates and reliable probabilistic assessments.
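The analog ensemble step can be sketched independently of the data sets involved: for each new coarse-model estimate, find the k most similar past model states and use their verifying observations as an ensemble. A minimal one-predictor version with synthetic data (operational implementations match on several predictors over a time window; real inputs here would be MERRA wind estimates paired with site measurements):

```python
import numpy as np

rng = np.random.default_rng(8)
n_hist = 5000
# Hypothetical history: coarse-model wind speed and the matching site observation.
model_hist = rng.gamma(2.0, 4.0, size=n_hist)
obs_hist = 0.8 * model_hist + rng.normal(scale=1.0, size=n_hist)

def analog_ensemble(model_now, k=20):
    # k nearest historical forecasts in model space -> their observations
    nearest = np.argsort(np.abs(model_hist - model_now))[:k]
    return obs_hist[nearest]   # ensemble: mean = deterministic estimate,
                               # spread = probabilistic information

ens = analog_ensemble(7.5)
print("estimate:", ens.mean(), " 10-90% band:", np.percentile(ens, [10, 90]))
```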
Mori, Takaharu; Jung, Jaewoon; Sugita, Yuji
2013-12-10
Conformational sampling is fundamentally important for simulating complex biomolecular systems. The generalized-ensemble algorithm, especially the temperature replica-exchange molecular dynamics method (T-REMD), is one of the most powerful methods to explore structures of biomolecules such as proteins, nucleic acids, carbohydrates, and also of lipid membranes. T-REMD simulations have focused on soluble proteins rather than membrane proteins or lipid bilayers, because explicit membranes do not keep their structural integrity at high temperature. Here, we propose a new generalized-ensemble algorithm for membrane systems, which we call the surface-tension REMD method. Each replica is simulated in the NPγT ensemble, and surface tensions in a pair of replicas are exchanged at certain intervals to enhance conformational sampling of the target membrane system. We test the method on two biological membrane systems: a fully hydrated DPPC (1,2-dipalmitoyl-sn-glycero-3-phosphatidylcholine) lipid bilayer and a WALP23-POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) membrane system. During these simulations, a random walk in surface tension space is realized. Large-scale lateral deformation (shrinking and stretching) of the membranes takes place in all of the replicas without collapse of the lipid bilayer structure. There is accelerated lateral diffusion of DPPC lipid molecules compared with conventional MD simulation, and a much wider range of tilt angle of the WALP23 peptide is sampled due to large deformation of the POPC lipid bilayer and through peptide-lipid interactions. Our method could be applicable to a wide variety of biological membrane systems.
A brief history of the introduction of generalized ensembles to Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Berg, Bernd A.
2017-03-01
The most efficient weights for Markov chain Monte Carlo calculations of physical observables are not necessarily those of the canonical ensemble. Generalized ensembles, which do not exist in nature but can be simulated on computers, often lead to much faster convergence. In particular, they have been used for simulations of first order phase transitions and for simulations of complex systems in which conflicting constraints lead to a rugged free energy landscape. Starting off with the Metropolis algorithm and Hastings' extension, I present a minireview which focuses on the explosive use of generalized ensembles in the early 1990s. Illustrations are given, which range from spin models to peptides.
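The whole family fits a single acceptance rule: replace the canonical Boltzmann weight with an arbitrary positive weight w(E) and accept a move x → x' with probability min(1, w(E(x'))/w(E(x))). A compact sketch on a toy double well, with an illustrative barrier-flattening weight (no claim that this is an optimal multicanonical weight):

```python
import numpy as np

rng = np.random.default_rng(9)

def energy(x):
    return 20.0 * (x**2 - 1.0) ** 2    # double well with a high barrier

def log_weight(e):
    # generalized-ensemble weight, chosen only to soften the barrier;
    # log_weight(e) = -e would be the canonical Boltzmann weight at beta = 1
    return -0.1 * e

x, samples = 0.0, []
for step in range(200000):
    prop = x + 0.2 * rng.normal()
    if np.log(rng.random()) < log_weight(energy(prop)) - log_weight(energy(x)):
        x = prop
    samples.append(x)

# the generalized ensemble crosses the barrier freely; canonical averages are
# recovered afterwards by reweighting each sample with exp(-E + log-weight term)
s = np.array(samples)
print("fraction in each well:", np.mean(s > 0), np.mean(s < 0))
```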
Ensemble training to improve recognition using 2D ear
NASA Astrophysics Data System (ADS)
Middendorff, Christopher; Bowyer, Kevin W.
2009-05-01
The ear has gained popularity as a biometric feature due to the robustness of the shape over time and across emotional expression. Popular methods of ear biometrics analyze the ear as a whole, leaving these methods vulnerable to error due to occlusion. Many researchers explore ear recognition using an ensemble, but none present a method for designing the individual parts that comprise the ensemble. In this work, we introduce a method of modifying the ensemble shapes to improve performance. We determine how different properties of an ensemble training system can affect overall performance. We show that ensembles built from small parts will outperform ensembles built with larger parts, and that incorporating a large number of parts improves the performance of the ensemble.
NASA Technical Reports Server (NTRS)
Ancheta, T. C., Jr.
1976-01-01
A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A 'universal' generalization of syndrome-source-coding is formulated which provides robustly effective distortionless coding of source ensembles. Two examples are given, comparing the performance of noiseless universal syndrome-source-coding to (1) run-length coding and (2) Lynch-Davisson-Schalkwijk-Cover universal coding for an ensemble of binary memoryless sources.
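The scheme is concrete even in a toy setting: treat the sparse source block x as an "error pattern" and store only its syndrome s = Hx (mod 2) for a parity-check matrix H; the decoder outputs the minimum-weight block with that syndrome. A sketch with the (7,4) Hamming parity-check matrix, for which decompression is exact whenever the block contains at most one 1:

```python
import numpy as np
from itertools import product

# Parity-check matrix of the (7,4) Hamming code: columns are 1..7 in binary.
H = np.array([[int(b) for b in f"{c:03b}"] for c in range(1, 8)]).T   # 3 x 7

def compress(x):
    return H @ x % 2                    # syndrome: 3 bits for 7 source bits

def decompress(s):
    # minimum-weight source block consistent with the syndrome
    # (brute force here; a real decoder would exploit the code's structure)
    best = min((x for x in product([0, 1], repeat=7)
                if np.array_equal(H @ np.array(x) % 2, s)),
               key=sum)
    return np.array(best)

x = np.array([0, 0, 0, 0, 1, 0, 0])     # sparse block from a binary source
s = compress(x)
assert np.array_equal(decompress(s), x) # exact for blocks of weight <= 1
print("source bits:", x, "-> syndrome:", s)
```

Because the Hamming code is perfect, its 8 syndromes exactly cover the zero block and the 7 weight-one blocks, which is what makes the toy example distortionless for sufficiently sparse sources.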
Total probabilities of ensemble runoff forecasts
NASA Astrophysics Data System (ADS)
Olav Skøien, Jon; Bogner, Konrad; Salamon, Peter; Smith, Paul; Pappenberger, Florian
2017-04-01
Ensemble forecasting has a long history in meteorological modelling, as an indication of the uncertainty of the forecasts. However, it is necessary to calibrate and post-process the ensembles as they often exhibit both bias and dispersion errors. Two of the most common methods for this are Bayesian Model Averaging (Raftery et al., 2005) and Ensemble Model Output Statistics (EMOS) (Gneiting et al., 2005). There are also methods for regionalizing these methods (Berrocal et al., 2007) and for incorporating the correlation between lead times (Hemri et al., 2013). Engeland and Steinsland (2014) developed a framework which can estimate post-processing parameters varying in space and time, while giving a spatially and temporally consistent output. However, their method is computationally complex for our larger number of stations, which makes it unsuitable for our purpose. Our post-processing method of the ensembles is developed in the framework of the European Flood Awareness System (EFAS - http://www.efas.eu), where we are making forecasts for the whole of Europe, based on observations from around 700 catchments. As the target is flood forecasting, we are more interested in improving the forecast skill for high flows than in a good prediction of the entire flow regime. EFAS uses a combination of ensemble forecasts and deterministic forecasts from different meteorological forecasters to force a distributed hydrologic model and to compute runoff ensembles for each river pixel within the model domain. Instead of showing the mean and the variability of each forecast ensemble individually, we will now post-process all model outputs to estimate the total probability, the post-processed mean and uncertainty of all ensembles. The post-processing parameters are first calibrated for each calibration location, but we add a spatial penalty in the calibration process to force a spatial correlation of the parameters. The penalty takes distance, stream-connectivity and size of the catchment areas into account. This can in some cases have a slight negative impact on the calibration error, but avoids large differences between parameters of nearby locations, whether stream connected or not. The spatial calibration also makes it easier to interpolate the post-processing parameters to uncalibrated locations. We also look into different methods for handling the non-normal distributions of runoff data and the effect of different data transformations on forecast skills in general and for floods in particular. Berrocal, V. J., Raftery, A. E. and Gneiting, T.: Combining Spatial Statistical and Ensemble Information in Probabilistic Weather Forecasts, Mon. Weather Rev., 135(4), 1386-1402, doi:10.1175/MWR3341.1, 2007. Engeland, K. and Steinsland, I.: Probabilistic postprocessing models for flow forecasts for a system of catchments and several lead times, Water Resour. Res., 50(1), 182-197, doi:10.1002/2012WR012757, 2014. Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T.: Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133(5), 1098-1118, doi:10.1175/MWR2904.1, 2005. Hemri, S., Fundel, F. and Zappa, M.: Simultaneous calibration of ensemble river flow predictions over an entire range of lead times, Water Resour. Res., 49(10), 6744-6755, doi:10.1002/wrcr.20542, 2013. Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133(5), 1155-1174, doi:10.1175/MWR2906.1, 2005.
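The non-spatial core of an EMOS-type post-processor is a small optimization: fit a Gaussian predictive distribution N(a + b·m, c + d·s²), with m and s² the ensemble mean and variance, by minimizing a proper score over past forecast-observation pairs. A sketch of that kernel alone (synthetic data, log-score minimization rather than the CRPS of Gneiting et al., and none of the spatial penalty described above):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(10)
n, m = 1000, 11                                   # forecast cases x ensemble members
ens = rng.gamma(2.0, 3.0, size=(n, 1)) + rng.normal(scale=1.5, size=(n, m))
obs = 1.1 * ens.mean(axis=1) - 0.5 + rng.normal(scale=2.0, size=n)

emean, evar = ens.mean(axis=1), ens.var(axis=1)

def neg_log_score(p):
    a, b, c, d = p
    mu = a + b * emean                            # bias-corrected mean
    sigma = np.sqrt(np.maximum(c + d * evar, 1e-6))   # keep variance positive
    return -norm.logpdf(obs, mu, sigma).sum()

p = minimize(neg_log_score, x0=[0.0, 1.0, 1.0, 0.5], method="Nelder-Mead").x
print("EMOS coefficients a, b, c, d:", p)
```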
NASA Astrophysics Data System (ADS)
Lopez, Ana; Fung, Fai; New, Mark; Watts, Glenn; Weston, Alan; Wilby, Robert L.
2009-08-01
The majority of climate change impacts and adaptation studies so far have been based on at most a few deterministic realizations of future climate, usually representing different emissions scenarios. Large ensembles of climate models are increasingly available either as ensembles of opportunity or perturbed physics ensembles, providing a wealth of additional data that is potentially useful for improving adaptation strategies to climate change. Because of the novelty of this ensemble information, there is little previous experience of practical applications or of the added value of this information for impacts and adaptation decision making. This paper evaluates the value of perturbed physics ensembles of climate models for understanding and planning public water supply under climate change. We deliberately select water resource models that are already used by water supply companies and regulators on the assumption that uptake of information from large ensembles of climate models will be more likely if it does not involve significant investment in new modeling tools and methods. We illustrate the methods with a case study on the Wimbleball water resource zone in the southwest of England. This zone is sufficiently simple to demonstrate the utility of the approach but with enough complexity to allow a variety of different decisions to be made. Our research shows that the additional information contained in the climate model ensemble provides a better understanding of the possible ranges of future conditions, compared to the use of single-model scenarios. Furthermore, with careful presentation, decision makers will find the results from large ensembles of models more accessible and be able to more easily compare the merits of different management options and the timing of different adaptation options. The overhead in additional time and expertise for carrying out the impacts analysis will be justified by the increased quality of the decision-making process. We remark that even though we have focused our study on a water resource system in the United Kingdom, our conclusions about the added value of climate model ensembles in guiding adaptation decisions can be generalized to other sectors and geographical regions.
A new transform for the analysis of complex fractionated atrial electrograms
2011-01-01
Background Representation of independent biophysical sources using Fourier analysis can be inefficient because the basis is sinusoidal and general. When complex fractionated atrial electrograms (CFAE) are acquired during atrial fibrillation (AF), the electrogram morphology depends on the mix of distinct nonsinusoidal generators. Identification of these generators using efficient methods of representation and comparison would be useful for targeting catheter ablation sites to prevent arrhythmia reinduction. Method A data-driven basis and transform is described which utilizes the ensemble average of signal segments to identify and distinguish CFAE morphologic components and frequencies. Calculation of the dominant frequency (DF) of actual CFAE, and identification of simulated independent generator frequencies and morphologies embedded in CFAE, is done using a total of 216 recordings from 10 paroxysmal and 10 persistent AF patients. The transform is tested versus Fourier analysis to detect spectral components in the presence of phase noise and interference. Correspondence is shown between ensemble basis vectors of highest power and corresponding synthetic drivers embedded in CFAE. Results The ensemble basis is orthogonal, and efficient for representation of CFAE components as compared with Fourier analysis (p ≤ 0.002). When three synthetic drivers with additive phase noise and interference were decomposed, the top three peaks in the ensemble power spectrum corresponded to the driver frequencies more closely as compared with top Fourier power spectrum peaks (p ≤ 0.005). The synthesized drivers with phase noise and interference were extractable from their corresponding ensemble basis with a mean error of less than 10%. Conclusions The new transform is able to efficiently identify CFAE features using DF calculation and by discerning morphologic differences. Unlike the Fourier transform method, it does not distort CFAE signals prior to analysis, and is relatively robust to jitter in periodic events. Thus the ensemble method can provide a useful alternative for quantitative characterization of CFAE during clinical study. PMID:21569421
An information-theoretical perspective on weighted ensemble forecasts
NASA Astrophysics Data System (ADS)
Weijs, Steven V.; van de Giesen, Nick
2013-08-01
This paper presents an information-theoretical method for weighting ensemble forecasts with new information. Weighted ensemble forecasts can be used to adjust the distribution that an existing ensemble of time series represents, without modifying the values in the ensemble itself. The weighting can, for example, add new seasonal forecast information in an existing ensemble of historically measured time series that represents climatic uncertainty. A recent article in this journal compared several methods to determine the weights for the ensemble members and introduced the pdf-ratio method. In this article, a new method, the minimum relative entropy update (MRE-update), is presented. Based on the principle of minimum discrimination information, an extension of the principle of maximum entropy (POME), the method ensures that no more information is added to the ensemble than is present in the forecast. This is achieved by minimizing relative entropy, with the forecast information imposed as constraints. From this same perspective, an information-theoretical view on the various weighting methods is presented. The MRE-update is compared with the existing methods and the parallels with the pdf-ratio method are analysed. The paper provides a new, information-theoretical justification for one version of the pdf-ratio method that turns out to be equivalent to the MRE-update. All other methods result in sets of ensemble weights that, seen from the information-theoretical perspective, add either too little or too much (i.e. fictitious) information to the ensemble.
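For a single mean constraint the MRE-update has a closed form: the posterior weights are an exponential tilting of the prior ones, w_i ∝ q_i exp(λx_i), with λ chosen so the weighted ensemble mean matches the forecast. A sketch under exactly that assumption (one mean constraint on an equally weighted historical ensemble):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(11)
x = rng.gamma(2.0, 5.0, size=500)     # historical ensemble (e.g. seasonal flows)
q = np.full(x.size, 1.0 / x.size)     # prior weights: equally likely members
target_mean = 12.0                    # new forecast information: E[x] = 12

def tilted(lam):
    w = q * np.exp(lam * x)
    return w / w.sum()

# choose lambda so the weighted ensemble mean hits the forecast value
lam = brentq(lambda l: tilted(l) @ x - target_mean, -1.0, 1.0)
w = tilted(lam)

kl = np.sum(w * np.log(w / q))        # relative entropy actually added
print("weighted mean:", w @ x, " added information (nats):", kl)
```

By construction the Kullback-Leibler divergence from the prior weights is the smallest compatible with the constraint, which is the sense in which no fictitious information enters the ensemble.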
A new method for determining the optimal lagged ensemble
DelSole, T.; Tippett, M. K.; Pegion, K.
2017-01-01
We propose a general methodology for determining the lagged ensemble that minimizes the mean square forecast error. The MSE of a lagged ensemble is shown to depend only on a quantity called the cross-lead error covariance matrix, which can be estimated from a short hindcast data set and parameterized in terms of analytic functions of time. The resulting parameterization allows the skill of forecasts to be evaluated for an arbitrary ensemble size and initialization frequency. Remarkably, the parameterization can also estimate the MSE of a burst ensemble simply by taking the limit of an infinitely small interval between initialization times. This methodology is applied to forecasts of the Madden-Julian Oscillation (MJO) from version 2 of the Climate Forecast System (CFSv2). For leads greater than a week, little improvement is found in the MJO forecast skill when ensembles larger than 5 days are used or initializations greater than 4 times per day. We find that if the initialization frequency is too infrequent, important structures of the lagged error covariance matrix are lost. Lastly, we demonstrate that the forecast error at leads ≥10 days can be reduced by optimally weighting the lagged ensemble members. The weights are shown to depend only on the cross-lead error covariance matrix. While the methodology developed here is applied to CFSv2, the technique can be easily adapted to other forecast systems. PMID:28580050
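The optimal weighting in the last step is the standard minimum-variance solution: minimizing wᵀCw subject to the weights summing to one, with C the cross-lead error covariance matrix, gives w = C⁻¹1/(1ᵀC⁻¹1). A sketch with a synthetic C in place of one estimated from hindcasts:

```python
import numpy as np

n_lags = 6
# Hypothetical cross-lead error covariance: older members have larger errors
# and errors of nearby initializations are correlated.
std = 1.0 + 0.25 * np.arange(n_lags)
corr = 0.7 ** np.abs(np.subtract.outer(np.arange(n_lags), np.arange(n_lags)))
C = np.outer(std, std) * corr

ones = np.ones(n_lags)
w = np.linalg.solve(C, ones)
w /= ones @ w                         # w = C^{-1}1 / (1^T C^{-1} 1)

print("optimal lag weights:", np.round(w, 3))
print("MSE, optimal vs equal weights:",
      w @ C @ w, (ones / n_lags) @ C @ (ones / n_lags))
```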
NASA Astrophysics Data System (ADS)
Murray, S.; Guerra, J. A.
2017-12-01
One essential component of operational space weather forecasting is the prediction of solar flares. Early flare forecasting work focused on statistical methods based on historical flaring rates, but more complex machine learning methods have been developed in recent years. A multitude of flare forecasting methods are now available; however, it is still unclear which of these methods performs best, and none are substantially better than climatological forecasts. Current operational space weather centres cannot rely on automated methods and generally use statistical forecasts with a little human intervention. Space weather researchers are increasingly looking towards methods used in terrestrial weather to improve current forecasting techniques. Ensemble forecasting has been used in numerical weather prediction for many years as a way to combine different predictions in order to obtain a more accurate result. It has proved useful in areas such as magnetospheric modelling and coronal mass ejection arrival analysis, but has not yet been implemented in operational flare forecasting. Here we construct ensemble forecasts for major solar flares by linearly combining the full-disk probabilistic forecasts from a group of operational forecasting methods (ASSA, ASAP, MAG4, MOSWOC, NOAA, and Solar Monitor). Forecasts from each method are weighted by a factor that accounts for the method's ability to predict previous events, and several performance metrics (both probabilistic and categorical) are considered. The results provide space weather forecasters with a set of parameters (combination weights, thresholds) that allow them to select the most appropriate values for constructing the 'best' ensemble forecast probability value, according to the performance metric of their choice. In this way different forecasts can be made to fit different end-user needs.
Ensemble Methods for MiRNA Target Prediction from Expression Data.
Le, Thuc Duy; Zhang, Junpeng; Liu, Lin; Li, Jiuyong
2015-01-01
microRNAs (miRNAs) are short regulatory RNAs that are involved in several diseases, including cancers. Identifying miRNA functions is very important in understanding disease mechanisms and determining the efficacy of drugs. An increasing number of computational methods have been developed to explore miRNA functions by inferring the miRNA-mRNA regulatory relationships from data. Each of the methods is developed based on some assumptions and constraints, for instance, assuming linear relationships between variables. For such reasons, computational methods are often subject to the problem of inconsistent performance across different datasets. On the other hand, ensemble methods integrate the results from individual methods and have been proved to outperform each of their individual component methods in theory. In this paper, we investigate the performance of some ensemble methods over the commonly used miRNA target prediction methods. We apply eight different popular miRNA target prediction methods to three cancer datasets, and compare their performance with the ensemble methods which integrate the results from each combination of the individual methods. The validation results using experimentally confirmed databases show that the results of the ensemble methods complement those obtained by the individual methods and the ensemble methods perform better than the individual methods across different datasets. The ensemble method, Pearson+IDA+Lasso, which combines methods in different approaches, including a correlation method, a causal inference method, and a regression method, is the best-performing ensemble method in this study. Further analysis of the results of this ensemble method shows that the ensemble method can obtain more targets which could not be found by any of the single methods, and the discovered targets are more statistically significant and functionally enriched. The source codes, datasets, miRNA target predictions by all methods, and the ground truth for validation are available in the Supplementary materials.
Ensemble Methods for MiRNA Target Prediction from Expression Data
Le, Thuc Duy; Zhang, Junpeng; Liu, Lin; Li, Jiuyong
2015-01-01
Background microRNAs (miRNAs) are short regulatory RNAs that are involved in several diseases, including cancers. Identifying miRNA functions is very important in understanding disease mechanisms and determining the efficacy of drugs. An increasing number of computational methods have been developed to explore miRNA functions by inferring the miRNA-mRNA regulatory relationships from data. Each of the methods is developed based on some assumptions and constraints, for instance, assuming linear relationships between variables. For such reasons, computational methods are often subject to the problem of inconsistent performance across different datasets. On the other hand, ensemble methods integrate the results from individual methods and have been proved to outperform each of their individual component methods in theory. Results In this paper, we investigate the performance of some ensemble methods over the commonly used miRNA target prediction methods. We apply eight different popular miRNA target prediction methods to three cancer datasets, and compare their performance with the ensemble methods which integrate the results from each combination of the individual methods. The validation results using experimentally confirmed databases show that the results of the ensemble methods complement those obtained by the individual methods and the ensemble methods perform better than the individual methods across different datasets. The ensemble method, Pearson+IDA+Lasso, which combines methods in different approaches, including a correlation method, a causal inference method, and a regression method, is the best-performing ensemble method in this study. Further analysis of the results of this ensemble method shows that the ensemble method can obtain more targets which could not be found by any of the single methods, and the discovered targets are more statistically significant and functionally enriched. The source codes, datasets, miRNA target predictions by all methods, and the ground truth for validation are available in the Supplementary materials. PMID:26114448
NASA Astrophysics Data System (ADS)
Bianconi, Ginestra
2009-03-01
In this paper we generalize the concept of random networks to describe network ensembles with nontrivial features by a statistical mechanics approach. This framework is able to describe undirected and directed network ensembles as well as weighted network ensembles. These networks might have nontrivial community structure or, in the case of networks embedded in a given space, they might have a link probability with a nontrivial dependence on the distance between the nodes. These ensembles are characterized by their entropy, which evaluates the cardinality of networks in the ensemble. In particular, in this paper we define and evaluate the structural entropy, i.e., the entropy of the ensembles of undirected uncorrelated simple networks with given degree sequence. We stress the apparent paradox that scale-free degree distributions are characterized by having small structural entropy while they are so widely encountered in natural, social, and technological complex systems. We propose a solution to the paradox by proving that scale-free degree distributions are the most likely degree distribution with the corresponding value of the structural entropy. Finally, the general framework we present in this paper is able to describe microcanonical ensembles of networks as well as canonical or hidden-variable network ensembles with significant implications for the formulation of network-constructing algorithms.
The development of ensemble theory. A new glimpse at the history of statistical mechanics
NASA Astrophysics Data System (ADS)
Inaba, Hajime
2015-12-01
This paper investigates the history of statistical mechanics from the viewpoint of the development of the ensemble theory from 1871 to 1902. In 1871, Ludwig Boltzmann introduced a prototype model of an ensemble that represents a polyatomic gas. In 1879, James Clerk Maxwell defined an ensemble as copies of systems of the same energy. Inspired by H.W. Watson, he called his approach "statistical". Boltzmann and Maxwell regarded the ensemble theory as a much more general approach than the kinetic theory. In the 1880s, influenced by Hermann von Helmholtz, Boltzmann made use of ensembles to establish thermodynamic relations. In Elementary Principles in Statistical Mechanics of 1902, Josiah Willard Gibbs tried to get his ensemble theory to mirror thermodynamics, including thermodynamic operations in its scope. Thermodynamics played the role of a "blind guide". His theory of ensembles can be characterized as more mathematically oriented than Einstein's theory proposed in the same year. Mechanical, empirical, and statistical approaches to foundations of statistical mechanics are presented. Although it was formulated in classical terms, the ensemble theory provided an infrastructure still valuable in quantum statistics because of its generality.
NASA Astrophysics Data System (ADS)
Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.
2016-12-01
Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty estimates are typically not included, or are specific to an individual dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high resolution regional climate model simulations. We will also present results on the new high resolution products for Alaska and Hawaii (2 km and 250 m respectively), to complete the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.
NASA Astrophysics Data System (ADS)
Hollenberg, Sebastian; Päs, Heinrich
2012-01-01
The standard wave function approach for the treatment of neutrino oscillations fails in situations where quantum ensembles at a finite temperature with or without an interacting background plasma are encountered. As a first step to treat such phenomena in a novel way, we propose a unified approach to both adiabatic and nonadiabatic two-flavor oscillations in neutrino ensembles with finite temperature and generic (e.g., matter) potentials. Neglecting effects of ensemble decoherence for now, we study the evolution of a neutrino ensemble governed by the associated quantum kinetic equations, which apply to systems with finite temperature. The quantum kinetic equations are solved formally using the Magnus expansion and it is shown that a convenient choice of the quantum mechanical picture (e.g., the interaction picture) reveals suitable parameters to characterize the physics of the underlying system (e.g., an effective oscillation length). It is understood that this method also provides a promising starting point for the treatment of the more general case in which decoherence is taken into account.
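The formal device is standard: writing the quantum kinetic equations as $\dot{\rho} = \mathcal{A}(t)\rho$, the Magnus expansion represents the evolution as a true exponential, $\rho(t) = e^{\Omega(t)}\rho(0)$, with (in generic notation, not the paper's own)

```latex
\Omega(t) \;=\; \int_0^t \mathrm{d}t_1\, \mathcal{A}(t_1)
  \;+\; \tfrac{1}{2} \int_0^t \mathrm{d}t_1 \int_0^{t_1} \mathrm{d}t_2\,
        \big[\mathcal{A}(t_1), \mathcal{A}(t_2)\big] \;+\; \cdots
```

In treatments of this kind the leading term typically carries the adiabatic-like evolution, while the nested-commutator terms contribute the nonadiabatic corrections; the choice of quantum mechanical picture controls how quickly the series converges.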
Mazurowski, Maciej A; Zurada, Jacek M; Tourassi, Georgia D
2009-07-01
Ensemble classifiers have been shown to be effective in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC = 0.905 +/- 0.024) in performance as compared to the original IT-CAD system (AUC = 0.865 +/- 0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
Cerruela García, G; García-Pedrajas, N; Luque Ruiz, I; Gómez-Nieto, M Á
2018-03-01
This paper proposes a method for molecular activity prediction in QSAR studies using ensembles of classifiers constructed by means of two supervised subspace projection methods, namely nonparametric discriminant analysis (NDA) and hybrid discriminant analysis (HDA). We studied the performance of the proposed ensembles compared to classical ensemble methods using four molecular datasets and eight different models for the representation of the molecular structure. Using several measures and statistical tests for classifier comparison, we observe that our proposal improves the classification results with respect to classical ensemble methods. Therefore, we show that ensembles constructed using supervised subspace projections offer an effective way of creating classifiers in cheminformatics.
Exploring and Listening to Chinese Classical Ensembles in General Music
ERIC Educational Resources Information Center
Zhang, Wenzhuo
2017-01-01
Music diversity is valued in theory, but the extent to which it is efficiently presented in music class remains limited. Within this article, I aim to bridge this gap by introducing four genres of Chinese classical ensembles--Qin and Xiao duets, Jiang Nan bamboo and silk ensembles, Cantonese ensembles, and contemporary Chinese orchestras--into the…
Total probabilities of ensemble runoff forecasts
NASA Astrophysics Data System (ADS)
Olav Skøien, Jon; Bogner, Konrad; Salamon, Peter; Smith, Paul; Pappenberger, Florian
2016-04-01
Ensemble forecasting has for a long time been used as a method in meteorological modelling to indicate the uncertainty of the forecasts. However, as the ensembles often exhibit both bias and dispersion errors, it is necessary to calibrate and post-process them. Two of the most common methods for this are Bayesian Model Averaging (Raftery et al., 2005) and Ensemble Model Output Statistics (EMOS) (Gneiting et al., 2005). There are also methods for regionalizing these methods (Berrocal et al., 2007) and for incorporating the correlation between lead times (Hemri et al., 2013). Engeland and Steinsland (2014) developed a framework which can estimate post-processing parameters that differ in space and time but still give a spatially and temporally consistent output. However, their method is computationally complex for our larger number of stations, and cannot directly be regionalized in the way we would like, so we suggest a different path below. The target of our work is to create a mean forecast with uncertainty bounds for a large number of locations in the framework of the European Flood Awareness System (EFAS - http://www.efas.eu). We are therefore more interested in improving the forecast skill for high flows than the forecast skill at lower runoff levels. EFAS uses a combination of ensemble forecasts and deterministic forecasts from different forecasters to force a distributed hydrologic model and to compute runoff ensembles for each river pixel within the model domain. Instead of showing the mean and the variability of each forecast ensemble individually, we will now post-process all model outputs to find a total probability, the post-processed mean and uncertainty of all ensembles. The post-processing parameters are first calibrated for each calibration location, while ensuring that they have some spatial correlation, by adding a spatial penalty in the calibration process. This can in some cases have a slight negative impact on the calibration error, but makes it easier to interpolate the post-processing parameters to uncalibrated locations. We also look into different methods for handling the non-normal distributions of runoff data and the effect of different data transformations on forecast skills in general and for floods in particular. Berrocal, V. J., Raftery, A. E. and Gneiting, T.: Combining Spatial Statistical and Ensemble Information in Probabilistic Weather Forecasts, Mon. Weather Rev., 135(4), 1386-1402, doi:10.1175/MWR3341.1, 2007. Engeland, K. and Steinsland, I.: Probabilistic postprocessing models for flow forecasts for a system of catchments and several lead times, Water Resour. Res., 50(1), 182-197, doi:10.1002/2012WR012757, 2014. Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T.: Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133(5), 1098-1118, doi:10.1175/MWR2904.1, 2005. Hemri, S., Fundel, F. and Zappa, M.: Simultaneous calibration of ensemble river flow predictions over an entire range of lead times, Water Resour. Res., 49(10), 6744-6755, doi:10.1002/wrcr.20542, 2013. Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133(5), 1155-1174, doi:10.1175/MWR2906.1, 2005.
Caetano dos Santos, Florentino Luciano; Skottman, Heli; Juuti-Uusitalo, Kati; Hyttinen, Jari
2016-01-01
Aims A fast, non-invasive and observer-independent method to analyze the homogeneity and maturity of human pluripotent stem cell (hPSC) derived retinal pigment epithelial (RPE) cells is warranted to assess the suitability of hPSC-RPE cells for implantation or in vitro use. The aim of this work was to develop and validate methods to create ensembles of state-of-the-art texture descriptors and to provide a robust classification tool to separate three different maturation stages of RPE cells by using phase contrast microscopy images. The same methods were also validated on a wide variety of biological image classification problems, such as histological or virus image classification. Methods For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. In addition, three new methods were tested. The first approach was an ensemble of preprocessing methods, used to create an additional set of images. The second was a region-based approach, in which saliency detection and wavelet decomposition divide each image into two regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram, and the set of SVMs was combined by the sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level. Dataset and Results The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors, obtaining, for the RPE dataset, an area under the ROC curve (AUC) of 86.49% with 10-fold cross validation and 91.98% with the leave-one-image-out protocol. The generality of the three proposed approaches was ascertained with 10 more biological image datasets, obtaining an average AUC greater than 97%. Conclusions Here we showed that the developed ensembles of texture descriptors are able to classify the RPE cell maturation stage. Moreover, we proved that preprocessing and region-based decomposition improve many descriptors' accuracy in biological dataset classification. Finally, we built the first dataset of stem cell-derived RPE cells that is publicly available to the scientific community for classification studies. The proposed tool is available at https://www.dei.unipd.it/node/2357 and the RPE dataset at http://www.biomeditech.fi/data/RPE_dataset/. Both are available at https://figshare.com/s/d6fb591f1beb4f8efa6f. PMID:26895509
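The sum-rule fusion step described above lends itself to a short sketch. Below is a minimal, hypothetical Python example in which one SVM is trained per descriptor set and their class-posterior estimates are summed; the synthetic feature blocks merely stand in for texture-descriptor histograms, so this illustrates the fusion rule rather than reproducing the released tool.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Three synthetic "descriptor" feature sets standing in for texture histograms.
X, y = make_classification(n_samples=300, n_features=60, n_informative=20, random_state=0)
d0, d1, d2 = X[:, :20], X[:, 20:40], X[:, 40:]
(d0_tr, d0_te, d1_tr, d1_te,
 d2_tr, d2_te, y_tr, y_te) = train_test_split(d0, d1, d2, y,
                                              test_size=0.3, random_state=0)

# One SVM per descriptor; probability outputs allow fusion by the sum rule.
svms = [SVC(probability=True, random_state=0).fit(Xtr, y_tr)
        for Xtr in (d0_tr, d1_tr, d2_tr)]

# Sum rule: add the posterior estimates across descriptors, then take the argmax.
score = sum(clf.predict_proba(Xte) for clf, Xte in zip(svms, (d0_te, d1_te, d2_te)))
pred = score.argmax(axis=1)
print("fused accuracy:", (pred == y_te).mean())
```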
NASA Astrophysics Data System (ADS)
Ishizaki, N. N.; Dairaku, K.; Ueno, G.
2016-12-01
We have developed a statistical downscaling method for estimating probabilistic climate projections using multiple CMIP5 general circulation models (GCMs). A regression model was established so that the combination of GCM weights reflects the characteristics of the variation of observations at each grid point. Cross-validation was conducted to select GCMs and to evaluate the regression model while avoiding multicollinearity. Using a spatially high-resolution observation dataset, we produced statistically downscaled probabilistic climate projections with 20-km horizontal grid spacing. Root-mean-square errors for monthly mean surface air temperature and precipitation estimated by the regression method were smaller than those derived from a simple ensemble mean of GCMs and from a cumulative distribution function based bias correction method. Projected changes in mean temperature and precipitation were basically similar to those of the simple ensemble mean of GCMs. Mean precipitation was generally projected to increase, in association with higher temperatures and the consequent increase in atmospheric moisture content. Weakening of the winter monsoon may contribute to a precipitation decrease in some areas. A temperature increase in excess of 4 K was projected for most areas of Japan by the end of the 21st century under the RCP8.5 scenario. The estimated probability of monthly precipitation exceeding 300 mm would increase on the Pacific side during summer and on the Japan Sea side during winter. This probabilistic climate projection based on the statistical method can be expected to provide useful information for impact studies and risk assessments.
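As a rough illustration of the weighting idea (not the authors' exact regression model), the sketch below fits least-squares combination weights for several GCMs at a single grid point; the data and the true weights are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_gcm = 300, 5
gcms = rng.normal(size=(n_t, n_gcm))                  # GCM anomalies at one grid point
true_w = np.array([0.5, 0.3, 0.1, 0.05, 0.05])
obs = gcms @ true_w + rng.normal(0.0, 0.3, n_t)       # synthetic observations

# Least-squares weights for the GCM combination at this grid point; in practice
# the abstract's method also cross-validates the GCM subset to limit multicollinearity.
weights, *_ = np.linalg.lstsq(gcms, obs, rcond=None)
print(weights)
```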
NASA Astrophysics Data System (ADS)
Baker, Allison H.; Hu, Yong; Hammerling, Dorit M.; Tseng, Yu-heng; Xu, Haiying; Huang, Xiaomeng; Bryan, Frank O.; Yang, Guangwen
2016-07-01
The Parallel Ocean Program (POP), the ocean model component of the Community Earth System Model (CESM), is widely used in climate research. Most current work in CESM-POP focuses on improving the model's efficiency or accuracy, such as improving numerical methods, advancing parameterization, porting to new architectures, or increasing parallelism. Since ocean dynamics are chaotic in nature, achieving bit-for-bit (BFB) identical results in ocean solutions cannot be guaranteed for even tiny code modifications, and determining whether modifications are admissible (i.e., statistically consistent with the original results) is non-trivial. In recent work, an ensemble-based statistical approach was shown to work well for software verification (i.e., quality assurance) on atmospheric model data. The general idea of ensemble-based statistical consistency testing is to use a quantitative measurement of the variability of the ensemble of simulations as a metric with which to compare future simulations and make a determination of statistical distinguishability. The capability to determine consistency without BFB results boosts model confidence and provides the flexibility needed, for example, for more aggressive code optimizations and the use of heterogeneous execution environments. Since ocean and atmosphere models have differing characteristics in terms of dynamics, spatial variability, and timescales, we present a new statistical method to evaluate ocean model simulation data that requires the evaluation of ensemble means and deviations in a spatial manner. In particular, the statistical distribution from an ensemble of CESM-POP simulations is used to determine the standard score of any new model solution at each grid point. The percentage of points with scores greater than a specified threshold then indicates whether the new model simulation is statistically distinguishable from the ensemble simulations. Both ensemble size and composition are important. Our experiments indicate that the new POP ensemble consistency test (POP-ECT) tool is capable of distinguishing cases that should be statistically consistent with the ensemble from those that should not, as well as providing a simple, objective and systematic way to detect errors in CESM-POP due to the hardware or software stack, positively contributing to quality assurance for the CESM-POP code.
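The grid-point standard-score test can be pictured in a few lines. The toy version below (invented field sizes, threshold, and failure fraction; not the actual POP-ECT code) flags a run when too many grid points deviate strongly from the ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)
# ens[k, i, j]: an ensemble of 40 model fields on a small grid (stand-ins for POP output).
ens = rng.normal(15.0, 2.0, size=(40, 32, 64))
mean, std = ens.mean(axis=0), ens.std(axis=0, ddof=1)

def is_distinguishable(field, threshold=3.0, max_fraction=0.05):
    """Standard score at every grid point against the ensemble; flag the run when
    more than max_fraction of points exceed the threshold (illustrative rule)."""
    z = np.abs(field - mean) / std
    return (z > threshold).mean() > max_fraction

consistent_run = mean + rng.normal(0.0, 2.0, size=(32, 64))
biased_run = consistent_run + 4.0   # e.g. a bug introducing a systematic bias
print(is_distinguishable(consistent_run), is_distinguishable(biased_run))  # False True
```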
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pribram-Jones, Aurora; Grabowski, Paul E.; Burke, Kieron
2016-06-08
The van Leeuwen proof of linear-response time-dependent density functional theory (TDDFT) is generalized to thermal ensembles. This allows generalization to finite temperatures of the Gross-Kohn relation, the exchange-correlation kernel of TDDFT, and the fluctuation-dissipation theorem for DFT. Finally, this produces a natural method for generating new thermal exchange-correlation approximations.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single-classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high recognition accuracy; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
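Weighted majority voting, the fusion rule that performed best in two of the data sets, is easy to sketch. In the toy Python example below the vote matrix and weights are invented; in practice each weight would come from, e.g., a base classifier's validation accuracy.

```python
import numpy as np

def weighted_majority_vote(votes, weights, n_classes):
    """votes[i, j]: class label predicted by classifier j for sample i;
    weights[j]: trust assigned to classifier j. Returns the fused labels."""
    n = votes.shape[0]
    tally = np.zeros((n, n_classes))
    for j, w in enumerate(weights):
        tally[np.arange(n), votes[:, j]] += w
    return tally.argmax(axis=1)

# Four base learners (decision tree, kNN, SVM, neural net in the paper; here just labels).
votes = np.array([[0, 0, 1, 0],
                  [2, 1, 1, 1],
                  [2, 2, 0, 2]])
print(weighted_majority_vote(votes, weights=[0.7, 0.9, 0.6, 0.8], n_classes=3))
```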
Path planning in uncertain flow fields using ensemble method
NASA Astrophysics Data System (ADS)
Wang, Tong; Le Maître, Olivier P.; Hoteit, Ibrahim; Knio, Omar M.
2016-10-01
An ensemble-based approach is developed to conduct optimal path planning in unsteady ocean currents under uncertainty. We focus our attention on two-dimensional steady and unsteady uncertain flows, and adopt a sampling methodology that is well suited to operational forecasts, where an ensemble of deterministic predictions is used to model and quantify uncertainty. In an operational setting, much about dynamics, topography, and forcing of the ocean environment is uncertain. To address this uncertainty, the flow field is parametrized using a finite number of independent canonical random variables with known densities, and the ensemble is generated by sampling these variables. For each of the resulting realizations of the uncertain current field, we predict the path that minimizes the travel time by solving a boundary value problem (BVP), based on the Pontryagin maximum principle. A family of backward-in-time trajectories starting at the end position is used to generate suitable initial values for the BVP solver. This allows us to examine and analyze the performance of the sampling strategy and to develop insight into extensions dealing with general circulation ocean models. In particular, the ensemble method enables us to perform a statistical analysis of travel times and consequently develop a path planning approach that accounts for these statistics. The proposed methodology is tested for a number of scenarios. We first validate our algorithms by reproducing simple canonical solutions, and then demonstrate our approach in more complex flow fields, including idealized, steady and unsteady double-gyre flows.
Global Optimization Ensemble Model for Classification Methods
Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab
2014-01-01
Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, each with its own advantages and drawbacks. Some basic issues affect the accuracy of a classifier when solving a supervised learning problem, such as the bias-variance tradeoff, the dimensionality of the input space, and noise in the input data. All these problems affect the accuracy of a classifier and are the reason that there is no globally optimal method for classification. There is no generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models by 1% to 30%, depending upon algorithm complexity. PMID:24883382
Multi-Model Ensemble Wake Vortex Prediction
NASA Technical Reports Server (NTRS)
Koerner, Stephan; Holzaepfel, Frank; Ahmad, Nash'at N.
2015-01-01
Several multi-model ensemble methods are investigated for predicting wake vortex transport and decay. This study is a joint effort between the National Aeronautics and Space Administration and the Deutsches Zentrum fuer Luft- und Raumfahrt to develop a multi-model ensemble capability using their wake models. An overview of different multi-model ensemble methods and their feasibility for wake applications is presented. The methods include Reliability Ensemble Averaging, Bayesian Model Averaging, and Monte Carlo Simulations. The methodologies are evaluated using data from wake vortex field experiments.
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-01-01
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring, in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) network are selected as base classifiers, and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and the tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out, and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, a homogeneous ensemble learning model and a majority voting strategy are also adopted for comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
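A stacking ensemble of heterogeneous base learners can be sketched with off-the-shelf components. HMMs and RBF networks are not available in scikit-learn, so two substitute learners stand in for them below, and the data are synthetic stand-ins for force-signal features; the sketch illustrates the stacking strategy rather than reproducing the paper's monitoring system.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for force-signal features (e.g. after mRMR selection).
X, y = make_classification(n_samples=400, n_features=15, n_classes=3,
                           n_informative=8, random_state=0)

# Heterogeneous base level: the paper uses SVM, HMM and an RBF network; only the
# SVM is available here, so kNN and logistic regression stand in for the others.
base = [("svm", SVC(probability=True, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=1000))]

# Stacking: a meta-learner maps base-classifier outputs to the final wear state.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))
print("stacked CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```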
NASA Astrophysics Data System (ADS)
Perera, Kushan C.; Western, Andrew W.; Robertson, David E.; George, Biju; Nawarathna, Bandara
2016-06-01
Irrigation demands fluctuate in response to weather variations and a range of irrigation management decisions, which creates challenges for water supply system operators. This paper develops a method for real-time ensemble forecasting of irrigation demand and applies it to irrigation command areas of various sizes for lead times of 1 to 5 days. The ensemble forecasts are based on a deterministic time series model coupled with ensemble representations of the various inputs to that model. Forecast inputs include past flow, precipitation, and potential evapotranspiration. These inputs are derived from flow observations from a modernized irrigation delivery system, short-term weather forecasts from numerical weather prediction models, and observed weather data from automatic weather stations. The predictive performance for the ensemble spread of irrigation demand was quantified using rank histograms, the mean continuous ranked probability score (CRPS), the mean CRPS reliability and the temporal mean of the ensemble root mean squared error (MRMSE). The mean forecast was evaluated using root mean squared error (RMSE), Nash-Sutcliffe model efficiency (NSE) and bias. The NSE values for the evaluation periods ranged between 0.96 (1-day lead time, whole study area) and 0.42 (5-day lead time, smallest command area). Rank histograms and comparison of MRMSE, mean CRPS, mean CRPS reliability and RMSE indicated that the ensemble spread is generally a reliable representation of the forecast uncertainty for short lead times but underestimates the uncertainty for long lead times.
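The CRPS used above to verify the ensemble spread has a simple empirical estimator, CRPS = E|X - y| - (1/2)E|X - X'|, where X and X' are independent ensemble members and y is the observation. The sketch below evaluates it on invented forecasts.

```python
import numpy as np

def ensemble_crps(members, obs):
    """Empirical CRPS of one ensemble forecast against a scalar observation,
    using the standard estimator E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
forecasts = rng.normal(10.0, 2.0, size=(100, 20))     # 100 days, 20 members
observations = rng.normal(10.0, 2.0, size=100)
print("mean CRPS:", np.mean([ensemble_crps(f, o)
                             for f, o in zip(forecasts, observations)]))
```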
Marino, Ricardo; Majumdar, Satya N; Schehr, Grégory; Vivo, Pierpaolo
2016-09-01
Let P_{β}^{(V)}(N_{I}) be the probability that an N×N β-ensemble of random matrices with confining potential V(x) has N_{I} eigenvalues inside an interval I=[a,b] on the real line. We introduce a general formalism, based on the Coulomb gas technique and the resolvent method, to compute P_{β}^{(V)}(N_{I}) analytically for large N. We show that this probability scales for large N as P_{β}^{(V)}(N_{I})≈exp[-βN^{2}ψ^{(V)}(N_{I}/N)], where β is the Dyson index of the ensemble. The rate function ψ^{(V)}(k_{I}), independent of β, is computed in terms of single integrals that can be easily evaluated numerically. The general formalism is then applied to the classical β-Gaussian (I=[-L,L]), β-Wishart (I=[1,L]), and β-Cauchy (I=[-L,L]) ensembles. Expanding the rate function around its minimum, we find that generically the number variance var(N_{I}) exhibits a nonmonotonic behavior as a function of the size of the interval, with a maximum that can be precisely characterized. These analytical results, corroborated by numerical simulations, provide the full counting statistics of many systems where random matrix models apply. In particular, we present results for the full counting statistics of zero-temperature one-dimensional spinless fermions in a harmonic trap.
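A quick numerical check of such counting statistics is possible for the β=1 (Gaussian orthogonal) case by direct sampling; the matrix size, interval, and number of trials below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, a, b = 100, 500, -0.5, 0.5
counts = []
for _ in range(trials):
    A = rng.normal(size=(N, N))
    H = (A + A.T) / np.sqrt(2 * N)      # GOE scaled so the spectrum fills [-2, 2]
    eig = np.linalg.eigvalsh(H)
    counts.append(np.sum((eig > a) & (eig < b)))
counts = np.array(counts)
# The variance of N_I is strongly suppressed relative to Poisson statistics.
print("mean N_I:", counts.mean(), "var(N_I):", counts.var())
```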
Uncertainty in modeled upper ocean heat content change
NASA Astrophysics Data System (ADS)
Tokmakian, Robin; Challenor, Peter
2014-02-01
This paper examines the uncertainty in the change in the heat content of the ocean component of a general circulation model. We describe the design and implementation of our statistical methodology. Using an ensemble of model runs and an emulator, we produce an estimate of the full probability distribution function (PDF) for the change in upper ocean heat in an Atmosphere/Ocean General Circulation Model, the Community Climate System Model v. 3, across a multi-dimensional input space. We show how the emulator of the GCM's heat content change, and hence the PDF, can be validated, and how implausible outcomes from the emulator can be identified when compared to observational estimates of the metric. In addition, the paper describes how the emulator outcomes and related uncertainty information might inform estimates of the same metric from a multi-model Coupled Model Intercomparison Project phase 3 ensemble. We illustrate how to (1) construct an ensemble based on experiment design methods, (2) construct and evaluate an emulator for a particular metric of a complex model, (3) validate the emulator using observational estimates and explore the input space with respect to implausible outcomes and (4) contribute to the understanding of uncertainties within a multi-model ensemble. Finally, we estimate the most likely value for heat content change and its uncertainty for the model, with respect to both observations and the uncertainty in the values of the input parameters.
Method and apparatus for quantum information processing using entangled neutral-atom qubits
Jau, Yuan Yu; Biedermann, Grant; Deutsch, Ivan
2018-04-03
A method for preparing an entangled quantum state of an atomic ensemble is provided. The method includes loading each atom of the atomic ensemble into a respective optical trap; placing each atom of the atomic ensemble into a same first atomic quantum state by impingement of pump radiation; approaching the atoms of the atomic ensemble to within a dipole-dipole interaction length of each other; Rydberg-dressing the atomic ensemble; during the Rydberg-dressing operation, exciting the atomic ensemble with a Raman pulse tuned to stimulate a ground-state hyperfine transition from the first atomic quantum state to a second atomic quantum state; and separating the atoms of the atomic ensemble by more than a dipole-dipole interaction length.
NASA Astrophysics Data System (ADS)
Brochero, Darwin; Hajji, Islem; Pina, Jasson; Plana, Queralt; Sylvain, Jean-Daniel; Vergeynst, Jenna; Anctil, Francois
2015-04-01
Theories about generalization error with ensembles are mainly based on the diversity concept, which promotes resorting to many members of different properties to support mutually agreeable decisions. Kuncheva (2004) proposed the Multi Level Diversity Model (MLDM) to promote diversity in model ensembles, combining different data subsets, input subsets, models, parameters, and including a combiner level in order to optimize the final ensemble. This work tests the hypothesis that ensembles of neural network (NN) structures minimize the generalization error. We used the MLDM to evaluate two different scenarios: (i) ensembles from a single NN architecture, and (ii) a super-ensemble built by combining sub-ensembles of many NN architectures. The time series used correspond to the 12 basins of the MOdel Parameter Estimation eXperiment (MOPEX) project that were used by Duan et al. (2006) and Vos (2013) as a benchmark. Six architectures are evaluated: a feedforward NN (FFNN) trained with the Levenberg-Marquardt algorithm (Hagan et al., 1996), an FFNN trained with SCE (Duan et al., 1993), a recurrent NN trained with a complex method (Weins et al., 2008), a dynamic NARX NN (Leontaritis and Billings, 1985), an Echo State Network (ESN), and a leaky-integrator ESN (L-ESN) (Lukosevicius and Jaeger, 2009). Each architecture separately performs an input variable selection (IVS) according to a forward stepwise selection (Anctil et al., 2009) using mean square error as the objective function. Post-processing by predictor stepwise selection (PSS) of the super-ensemble has been done following the method proposed by Brochero et al. (2011). IVS results showed that lagged streamflow, lagged precipitation, and the Standardized Precipitation Index (SPI) (McKee et al., 1993) were the most relevant variables: they were selected among the first three variables in 66, 45, and 28 of the 72 scenarios, respectively. A relationship between the aridity index (Arora, 2002) and NN performance showed that wet basins are more easily modelled than dry basins. The Nash-Sutcliffe (NS) efficiency criterion was used to evaluate the performance of the models. Test results showed that in 9 of the 12 basins the mean sub-ensemble performance was better than that reported by Vos (2013). Furthermore, in 55 of 72 cases (6 NN structures x 12 basins) the mean sub-ensemble performance was better than the best individual performance, and in 10 basins the performance of the mean super-ensemble was better than that of the best individual super-ensemble member. Members of the ESN and L-ESN sub-ensembles were also found to have very similar and good performance values. Regarding the mean super-ensemble performance, we obtained an average performance gain of 17%, and found that PSS preserves sub-ensemble members from different NN structures, indicating the pertinence of diversity in the super-ensemble. Moreover, around 100 predictors from the different structures proved to be enough to optimize the super-ensemble. Although sub-ensembles of FFNN-SCE showed unstable performances, FFNN-SCE members were picked up several times in the final predictor selection. References Anctil, F., M. Filion, and J. Tournebize (2009). "A neural network experiment on the simulation of daily nitrate-nitrogen and suspended sediment fluxes from a small agricultural catchment". In: Ecol. Model. 220.6, pp. 879-887. Arora, V. K. (2002). "The use of the aridity index to assess climate change effect on annual runoff". In: J. Hydrol. 265, pp. 164-177. Brochero, D., F. Anctil, and C. Gagné (2011). "Simplifying a hydrological ensemble prediction system with a backward greedy selection of members Part 1: Optimization criteria". In: Hydrol. Earth Syst. Sci. 15.11, pp. 3307-3325. Duan, Q., J. Schaake, V. Andréassian, S. Franks, G. Goteti, H. Gupta, Y. Gusev, F. Habets, A. Hall, L. Hay, T. Hogue, M. Huang, G. Leavesley, X. Liang, O. Nasonova, J. Noilhan, L. Oudin, S. Sorooshian, T. Wagener, and E. Wood (2006). "Model Parameter Estimation Experiment (MOPEX): An overview of science strategy and major results from the second and third workshops". In: J. Hydrol. 320.12, pp. 3-17. Duan, Q., V. Gupta, and S. Sorooshian (1993). "Shuffled complex evolution approach for effective and efficient global minimization". In: J. Optimiz. Theory App. 76.3, pp. 501-521. Hagan, M. T., H. B. Demuth, and M. Beale (1996). Neural Network Design. 1st ed. PWS Publishing Co., p. 730. Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, p. 350. Leontaritis, I. and S. Billings (1985). "Input-output parametric models for non-linear systems Part I: deterministic non-linear systems". In: International Journal of Control 41.2, pp. 303-328. Lukosevicius, M. and H. Jaeger (2009). "Reservoir computing approaches to recurrent neural network training". In: Computer Science Review 3.3, pp. 127-149. McKee, T., N. Doesken, and J. Kleist (1993). The Relationship of Drought Frequency and Duration to Time Scales. In: Eighth Conference on Applied Climatology. Vos, N. J. de (2013). "Echo state networks as an alternative to traditional artificial neural networks in rainfall-runoff modelling". In: Hydrol. Earth Syst. Sci. 17.1, pp. 253-267. Weins, T., R. Burton, G. Schoenau, and D. Bitner (2008). Recursive Generalized Neural Networks (RGNN) for the Modeling of a Load Sensing Pump. In: ASME Joint Conference on Fluid Power, Transmission and Control.
NASA Astrophysics Data System (ADS)
Peishu, Zong; Jianping, Tang; Shuyu, Wang; Lingyun, Xie; Jianwei, Yu; Yunqian, Zhu; Xiaorui, Niu; Chao, Li
2017-08-01
The parameterization of physical processes is one of the critical elements in properly simulating the regional climate over eastern China. It is essential to conduct detailed analyses of the effect of physical parameterization schemes on regional climate simulation in order to provide more reliable regional climate change information. In this paper, we evaluate the 25-year (1983-2007) summer monsoon climate characteristics of precipitation and surface air temperature using the regional spectral model (RSM) with different physical schemes. The ensemble results using the reliability ensemble averaging (REA) method are also assessed. The results show that the RSM has the capacity to reproduce the spatial patterns, the variations, and the temporal tendency of surface air temperature and precipitation over eastern China, and it tends to predict the climatological characteristics better over the Yangtze River basin and South China. The impact of different physical schemes on the RSM simulations is also investigated. Generally, the CLD3 cloud water prediction scheme tends to produce more precipitation because of its overestimation of low-level moisture. The systematic biases derived from the KF2 cumulus scheme are larger than those from the RAS scheme. The scale-selective bias correction (SSBC) method improves the simulation of the temporal and spatial characteristics of surface air temperature and precipitation and improves the simulated circulation. The REA ensemble results show significant improvement in simulating the temperature and precipitation distributions, with much higher correlation coefficients and lower root mean square errors. The REA result of selected experiments is better than that of nonselected experiments, indicating the necessity of choosing good samples for the ensemble.
A comparison of breeding and ensemble transform vectors for global ensemble generation
NASA Astrophysics Data System (ADS)
Deng, Guo; Tian, Hua; Li, Xiaoli; Chen, Jing; Gong, Jiandong; Jiao, Meiyan
2012-02-01
To compare the initial perturbation techniques using breeding vectors and ensemble transform vectors, three ensemble prediction systems using both initial perturbation methods but with different ensemble member sizes, based on the spectral model T213/L31, are constructed at the National Meteorological Center, China Meteorological Administration (NMC/CMA). A series of ensemble verification scores, such as forecast skill of the ensemble mean, ensemble resolution, and ensemble reliability, are introduced to identify the most important attributes of ensemble forecast systems. The results indicate that the ensemble transform technique is superior to the breeding vector method in light of the anomaly correlation coefficient (ACC), a deterministic attribute of the ensemble mean; the root-mean-square error (RMSE) and spread, which are probabilistic attributes; and the continuous ranked probability score (CRPS) and its decomposition. The advantage of the ensemble transform approach is attributed to the orthogonality of its ensemble perturbations as well as its consistency with the data assimilation system. Therefore, this study may serve as a reference for configuring the best ensemble prediction system for operational use.
Argumentation Based Joint Learning: A Novel Ensemble Learning Approach
Xu, Junyi; Yao, Li; Li, Le
2015-01-01
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high-performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high-quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high-quality knowledge for an ensemble classifier and improve classification performance. PMID:25966359
Semantic labeling of high-resolution aerial images using an ensemble of fully convolutional networks
NASA Astrophysics Data System (ADS)
Sun, Xiaofeng; Shen, Shuhan; Lin, Xiangguo; Hu, Zhanyi
2017-10-01
High-resolution remote sensing data classification has been a challenging and promising research topic in the remote sensing community. In recent years, with the rapid advances of deep learning, remarkable progress has been made in this field, facilitating a transition from hand-crafted feature design to automatic end-to-end learning. An ensemble learning method based on deep fully convolutional networks (FCNs) is proposed to label high-resolution aerial images. To fully tap the potential of FCNs, both the Visual Geometry Group network and a deeper residual network, ResNet, are employed. Furthermore, to enlarge the training set with diversity and gain better generalization, in addition to the commonly used data augmentation methods in the literature (e.g., rotation, multiscale, and aspect ratio), aerial images from other datasets are also collected for cross-scene learning. Finally, we combine these learned models to form an effective FCN ensemble and refine the results using a fully connected conditional random field graph model. Experiments on the ISPRS 2-D Semantic Labeling Contest dataset show that our proposed end-to-end classification method achieves an overall accuracy of 90.7%, a state-of-the-art result in the field.
Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.
2009-01-01
Ensemble classifiers have been shown to be effective in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC=0.905±0.024) in performance as compared to the original IT-CAD system (AUC=0.865±0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters. PMID:19673196
Ocean Predictability and Uncertainty Forecasts Using Local Ensemble Transform Kalman Filter (LETKF)
NASA Astrophysics Data System (ADS)
Wei, M.; Hogan, P. J.; Rowley, C. D.; Smedstad, O. M.; Wallcraft, A. J.; Penny, S. G.
2017-12-01
Ocean predictability and uncertainty are studied with an ensemble system that has been developed based on the US Navy's operational HYCOM using the Local Ensemble Transform Kalman Filter (LETKF) technology. One of the advantages of this method is that the best possible initial analysis states for the HYCOM forecasts are provided by the LETKF, which assimilates operational observations using an ensemble method. The background covariance during this assimilation process is implicitly supplied by the ensemble, avoiding the difficult task of developing tangent linear and adjoint models out of HYCOM, with its complicated hybrid isopycnal vertical coordinate, for 4D-Var. The flow-dependent background covariance from the ensemble will be an indispensable part of the next-generation hybrid 4D-Var/ensemble data assimilation system. The predictability and uncertainty of the ocean forecasts are studied initially for the Gulf of Mexico. The results are compared with another ensemble system using the Ensemble Transform (ET) method, which has been used in the Navy's operational center. The advantages and disadvantages are discussed.
Multimodel Ensemble Methods for Prediction of Wake-Vortex Transport and Decay
NASA Technical Reports Server (NTRS)
Korner, Stephan; Ahmad, Nashat N.; Holzapfel, Frank; VanValkenburg, Randal L.
2017-01-01
Several multimodel ensemble methods are selected and further developed to improve the deterministic and probabilistic prediction skills of individual wake-vortex transport and decay models. The different multimodel ensemble methods are introduced, and their suitability for wake applications is demonstrated. The selected methods include direct ensemble averaging, Bayesian model averaging, and Monte Carlo simulation. The different methodologies are evaluated employing data from wake-vortex field measurement campaigns conducted in the United States and Germany.
NASA Astrophysics Data System (ADS)
Portegies Zwart, Simon; Boekholt, Tjarda
2014-04-01
The conservation of energy, linear momentum, and angular momentum are important drivers of our physical understanding of the evolution of the universe. These quantities are also conserved in Newton's laws of motion under gravity. Numerical integration of the associated equations of motion is extremely challenging, in particular due to the steady growth of numerical errors (from round-off and discrete time-stepping) and the exponential divergence between two nearby solutions. As a result, numerical solutions to the general N-body problem are intrinsically questionable. Using brute-force integrations to arbitrary numerical precision, we demonstrate empirically that ensembles of different realizations of resonant three-body interactions produce statistically indistinguishable results. Although individual solutions using common integration methods are notoriously unreliable, we conjecture that an ensemble of approximate three-body solutions accurately represents an ensemble of true solutions, so long as the energy during integration is conserved to better than 1/10. We therefore provide an independent confirmation that previous work on self-gravitating systems can actually be trusted, irrespective of the intrinsically chaotic nature of the N-body problem.
Development of probabilistic regional climate scenario in East Asia
NASA Astrophysics Data System (ADS)
Dairaku, K.; Ueno, G.; Ishizaki, N. N.
2015-12-01
Climate information and services for Impacts, Adaptation and Vulnerability (IAV) assessments are of great concern. In order to develop probabilistic regional climate information that represents the uncertainty in climate scenario experiments in East Asia (CORDEX-EA and Japan), the probability distribution of 2-m air temperature was estimated using a regression model developed for this purpose. The method is easily applicable to other regions and other physical quantities, and can also be used to downscale to finer scales depending on the availability of observation datasets. Probabilistic climate information for the present (1969-1998) and future (2069-2098) climate was developed using 21 models under the CMIP3 SRES A1b scenario and observation data (CRU_TS3.22 & University of Delaware in CORDEX-EA, NIAES AMeDAS mesh data in Japan). The prototype of probabilistic information for CORDEX-EA and Japan represents the quantified structural uncertainties of multi-model ensemble experiments of climate change scenarios. Appropriate combinations of statistical methods and optimization of climate ensemble experiments using multiple general circulation model (GCM) and regional climate model (RCM) ensemble downscaling experiments are investigated.
Multiensemble Markov models of molecular thermodynamics and kinetics.
Wu, Hao; Paul, Fabian; Wehmeyer, Christoph; Noé, Frank
2016-06-07
We introduce the general transition-based reweighting analysis method (TRAM), a statistically optimal approach to integrate both unbiased and biased molecular dynamics simulations, such as umbrella sampling or replica exchange. TRAM estimates a multiensemble Markov model (MEMM) with full thermodynamic and kinetic information at all ensembles. The approach combines the benefits of Markov state models (clustering of high-dimensional spaces and modeling of complex many-state systems) with those of the multistate Bennett acceptance ratio (exploiting biased or high-temperature ensembles to accelerate rare-event sampling). TRAM does not depend on any rate model in addition to the widely used Markov state model approximation, but uses only fundamental relations such as detailed balance and binless reweighting of configurations between ensembles. Previous methods, including the multistate Bennett acceptance ratio, discrete TRAM, and Markov state models are special cases and can be derived from the TRAM equations. TRAM is demonstrated by efficiently computing MEMMs in cases where other estimators break down, including the full thermodynamics and rare-event kinetics from high-dimensional simulation data of an all-atom protein-ligand binding model.
Geometric integrator for simulations in the canonical ensemble
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tapias, Diego; Sanders, David P.
2016-08-28
We introduce a geometric integrator for molecular dynamics simulations of physical systems in the canonical ensemble that preserves the invariant distribution in equations arising from the density dynamics algorithm, with any possible type of thermostat. Our integrator thus constitutes a unified framework that allows the study and comparison of different thermostats and of their influence on the equilibrium and non-equilibrium (thermo-)dynamic properties of a system. To show the validity and the generality of the integrator, we implement it with a second-order, time-reversible method and apply it to the simulation of a Lennard-Jones system with three different thermostats, obtaining good conservation of the geometrical properties and recovering the expected thermodynamic results. Moreover, to show the advantage of our geometric integrator over a non-geometric one, we compare the results with those obtained by using the non-geometric Gear integrator, which is frequently used to perform simulations in the canonical ensemble. The non-geometric integrator induces a drift in the invariant quantity, while our integrator has no such drift, thus ensuring that the system is effectively sampling the correct ensemble.
Snyder, David A; Montelione, Gaetano T
2005-06-01
An important open question in the field of NMR-based biomolecular structure determination is how best to characterize the precision of the resulting ensemble of structures. Typically, the RMSD, as minimized in superimposing the ensemble of structures, is the preferred measure of precision. However, the presence of poorly determined atomic coordinates and multiple "RMSD-stable domains"--locally well-defined regions that are not aligned in global superimpositions--complicate RMSD calculations. In this paper, we present a method, based on a novel, structurally defined order parameter, for identifying a set of core atoms to use in determining superimpositions for RMSD calculations. In addition we present a method for deciding whether to partition that core atom set into "RMSD-stable domains" and, if so, how to determine partitioning of the core atom set. We demonstrate our algorithm and its application in calculating statistically sound RMSD values by applying it to a set of NMR-derived structural ensembles, superimposing each RMSD-stable domain (or the entire core atom set, where appropriate) found in each protein structure under consideration. A parameter calculated by our algorithm using a novel, kurtosis-based criterion, the epsilon-value, is a measure of precision of the superimposition that complements the RMSD. In addition, we compare our algorithm with previously described algorithms for determining core atom sets. The methods presented in this paper for biomolecular structure superimposition are quite general, and have application in many areas of structural bioinformatics and structural biology.
Network-induced chaos in integrate-and-fire neuronal ensembles.
Zhou, Douglas; Rangan, Aaditya V; Sun, Yi; Cai, David
2009-09-01
It has been shown that a single standard linear integrate-and-fire (IF) neuron under a general time-dependent stimulus cannot possess chaotic dynamics despite the firing-reset discontinuity. Here we address the issue of whether conductance-based, pulse-coupled network interactions can induce chaos in an IF neuronal ensemble. Using numerical methods, we demonstrate that all-to-all, homogeneously pulse-coupled IF neuronal networks can indeed give rise to chaotic dynamics under an external periodic current drive. We also provide a precise characterization of the largest Lyapunov exponent for these high-dimensional nonsmooth dynamical systems. In addition, we present a stable and accurate numerical algorithm for evaluating the largest Lyapunov exponent, which can overcome difficulties encountered by traditional methods for these nonsmooth dynamical systems with degeneracy induced by, e.g., the refractoriness of neurons.
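For smooth systems, the standard recipe for the largest Lyapunov exponent is Benettin's two-trajectory renormalization, sketched below on the chaotic logistic map (exact exponent ln 2 ≈ 0.693). This is only the textbook smooth-system method; the nonsmooth IF networks above require the specialized algorithm the authors describe, which this sketch does not attempt.

```python
import numpy as np

def largest_lyapunov(step, x0, d0=1e-8, n_steps=20000, n_discard=1000):
    """Benettin's method: evolve a reference and a perturbed trajectory, renormalize
    their separation after every step, and average the log stretching factors."""
    x, y = np.array(x0, float), np.array(x0, float)
    y[0] += d0
    acc = 0.0
    for k in range(n_steps):
        x, y = step(x), step(y)
        d = np.linalg.norm(y - x)
        if k >= n_discard:
            acc += np.log(d / d0)
        y = x + (y - x) * (d0 / d)      # rescale the perturbation back to size d0
    return acc / (n_steps - n_discard)

step = lambda x: 4.0 * x * (1.0 - x)    # chaotic logistic map
print(largest_lyapunov(step, [0.3]))    # should approach ln 2 ~ 0.693
```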
Fast adaptive flat-histogram ensemble to enhance the sampling in large systems
NASA Astrophysics Data System (ADS)
Xu, Shun; Zhou, Xin; Jiang, Yi; Wang, YanTing
2015-09-01
An efficient novel algorithm was developed to estimate the Density of States (DOS) for large systems by calculating the ensemble means of an extensive physical variable, such as the potential energy U, in generalized canonical ensembles, in order to interpolate the interior reverse temperature curve β(U) = dS(U)/dU, where S(U) is the logarithm of the DOS. This curve is computed with different accuracies in different energy regions to capture the dependence of the reverse temperature on U without setting a prior grid in the U space. By combining with a U-compression transformation, we decrease the computational complexity from O(N^{3/2}) in the normal Wang-Landau-type method to O(N^{1/2}) in the current algorithm, where N is the number of degrees of freedom of the system. The efficiency of the algorithm is demonstrated by applying it to Lennard-Jones fluids with various N, along with its ability to find different macroscopic states, including metastable states.
An efficient ensemble learning method for gene microarray classification.
Osareh, Alireza; Shadgar, Bita
2013-01-01
Gene microarray analysis and classification have demonstrated an effective way to diagnose diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
Multi-model analysis in hydrological prediction
NASA Astrophysics Data System (ADS)
Lanthier, M.; Arsenault, R.; Brissette, F.
2017-12-01
Hydrologic modelling is, by nature, a simplification of the real-world hydrologic system. Ensemble hydrological predictions thus obtained do not present the full range of possible streamflow outcomes, producing ensembles that exhibit errors in variance such as under-dispersion. Past studies show that lumped models used in prediction mode can return satisfactory results, especially when there is not enough information available on the watershed to run a distributed model. But all lumped models greatly simplify the complex processes of the hydrologic cycle. To generate more spread in the hydrologic ensemble predictions, multi-model ensembles have been considered. In this study, the aim is to propose and analyse a method that gives an ensemble streamflow prediction that properly represents the forecast probabilities with reduced ensemble bias. To achieve this, three simple lumped models are used to generate an ensemble. They are also combined using multi-model averaging techniques, which generally generate a more accurate hydrograph than the best of the individual models in simulation mode. This new combined predictive hydrograph is added to the ensemble, thus creating a large ensemble which may improve the variability while also improving the ensemble mean bias. The quality of the predictions is then assessed over different periods: 2 weeks, 1 month, 3 months and 6 months, using a PIT histogram of the percentiles of the observed volumes with respect to the volumes of the ensemble members. Initially, the models were run using historical weather data to generate synthetic flows. This worked for the individual models, but not for the multi-model and the large ensemble. Consequently, by performing data assimilation at each prediction period, and thus adjusting the initial states of the models, the PIT histogram could be constructed using the observed flows while allowing the use of the multi-model predictions. The under-dispersion has been largely corrected for short-term predictions. For the longer term, the addition of the multi-model member has been beneficial to the quality of the predictions, although it is too early to determine whether the gain comes merely from adding a member or whether the multi-model member has added value in itself.
EFS: an ensemble feature selection tool implemented as R-package and web-application.
Neumann, Ursula; Genze, Nikita; Heider, Dominik
2017-01-01
Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage to alleviate and compensate for these biases. The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs to a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated in EFS, which can be used separately or combined in an ensemble. EFS identifies relevant features while compensating specific biases of single methods due to an ensemble approach. Thereby, EFS can improve the prediction accuracy and interpretability in subsequent binary classification models. EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
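EFS itself is an R package, but its core idea, normalizing the outputs of several feature-selection methods and combining them into one importance score, can be mimicked in a few lines of Python; the three importance measures below are illustrative substitutes, not the eight methods integrated in EFS.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)

# Three heterogeneous importance measures, each normalized to [0, 1].
scores = [
    f_classif(X, y)[0],                                # ANOVA F-statistic
    mutual_info_classif(X, y, random_state=0),         # mutual information
    RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,
]
normalized = [s / s.max() for s in scores]
ensemble_importance = np.mean(normalized, axis=0)  # combined, bias-compensating score
print(np.argsort(ensemble_importance)[::-1])       # feature ranking, best first
```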
Extreme Value Analysis of hydro meteorological extremes in the ClimEx Large-Ensemble
NASA Astrophysics Data System (ADS)
Wood, R. R.; Martel, J. L.; Willkofer, F.; von Trentini, F.; Schmid, F. J.; Leduc, M.; Frigon, A.; Ludwig, R.
2017-12-01
Many studies show an increase in the magnitude and frequency of hydrological extreme events in the course of climate change. However, the contribution of natural variability to the magnitude and frequency of hydrological extreme events is not yet settled. A reliable estimate of extreme events is of great interest for water management and public safety. In the course of the ClimEx Project (www.climex-project.org), a new single-model large ensemble was created by dynamically downscaling the CanESM2 large ensemble with the Canadian Regional Climate Model version 5 (CRCM5) for a European domain and a northeastern North American domain. The ClimEx 50-member large ensemble (CRCM5 driven by the CanESM2 large ensemble) permits a thorough analysis of natural variability in extreme events. Are the current extreme value statistical methods able to account for natural variability? How large is the natural variability for, e.g., a 1/100-year return period derived from a 50-member large ensemble for Europe and northeastern North America? These questions are addressed by applying various generalized extreme value (GEV) distributions to the ClimEx large ensemble. Various return levels (5-, 10-, 20-, 30-, 60- and 100-year) based on various lengths of time series (20, 30, 50, 100 and 1500 years) are analyzed for the maximum one-day precipitation (RX1d), the maximum three-hourly precipitation (RX3h) and the streamflow of selected catchments in Europe. The long time series of the ClimEx ensemble (7500 years) allow us to give a first reliable estimate of the magnitude and frequency of such extreme events.
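Fitting a GEV and reading off return levels of the kind listed above takes only a few lines. The sketch below uses synthetic annual maxima and scipy's genextreme, whose shape parameter c equals minus the GEV shape parameter in the usual climate convention.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic stand-in for annual maximum 1-day precipitation pooled over members.
annual_max = genextreme.rvs(c=-0.1, loc=40.0, scale=8.0, size=1500, random_state=0)

shape, loc, scale = genextreme.fit(annual_max)     # maximum-likelihood GEV fit
for T in (5, 10, 20, 30, 60, 100):
    level = genextreme.ppf(1.0 - 1.0 / T, shape, loc=loc, scale=scale)
    print(f"{T:>3}-year return level: {level:.1f} mm")
```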
Multi-objective optimization for generating a weighted multi-model ensemble
NASA Astrophysics Data System (ADS)
Lee, H.
2017-12-01
Many studies have demonstrated that multi-model ensembles generally show better skill than each ensemble member. When generating weighted multi-model ensembles, the first step is measuring the performance of individual model simulations using observations. There is a consensus on the assignment of weighting factors based on a single evaluation metric: the weighting factor for each model is proportional to a performance score or inversely proportional to an error for the model. While this conventional approach can provide appropriate combinations of multiple models, it confronts a big challenge when there are multiple metrics under consideration: a simple averaging of multiple performance scores or model ranks does not address the trade-off problem between conflicting metrics. So far, there seems to be no best method to generate weighted multi-model ensembles based on multiple performance metrics. The current study applies multi-objective optimization, a mathematical process that provides a set of optimal trade-off solutions based on a range of evaluation metrics, to combine multiple performance metrics for global climate models and their dynamically downscaled regional climate simulations over North America and to generate a weighted multi-model ensemble. NASA satellite data and the Regional Climate Model Evaluation System (RCMES) software toolkit are used for assessment of the climate simulations. Overall, the performance of each model differs markedly, with strong seasonal dependence. Because of the considerable variability across the climate simulations, it is important to evaluate models systematically and make future projections by assigning optimized weighting factors to the models with relatively good performance. Our results indicate that the optimally weighted multi-model ensemble always shows better performance than an arithmetic ensemble mean and may provide more reliable future projections.
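The trade-off between conflicting metrics can be made concrete with a Pareto-front filter. The sketch below, with invented error values and a deliberately simple weighting rule, shows one way to keep only non-dominated models before assigning weights; it illustrates the general idea rather than the study's actual optimization procedure.

```python
import numpy as np

# errors[i, k]: error of model i under metric k (lower is better); toy values.
errors = np.array([[0.9, 0.30],
                   [1.1, 0.20],
                   [1.0, 0.45],
                   [1.4, 0.50]])

def pareto_front(errors):
    """A model is kept if no other model is at least as good on every metric
    and strictly better on at least one (i.e. it is not dominated)."""
    n = errors.shape[0]
    return [i for i in range(n)
            if not any(np.all(errors[j] <= errors[i]) and np.any(errors[j] < errors[i])
                       for j in range(n) if j != i)]

front = pareto_front(errors)                       # here: models 0 and 1
w = 1.0 / errors[front].mean(axis=1)               # simple inverse-error weighting
print(front, w / w.sum())
```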
A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.
Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen
2016-05-10
Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie
2015-08-01
The business failure of numerous companies results in financial crises. The high social costs associated with such crises have driven the search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among the different modelling means. We reviewed research on forecasting business risk with support vector machines, proposed a general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machines, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with the support vector machine as the base predictive model, four specific predictive models were produced: a pure support vector machine, a hybrid support vector machine involving principal components analysis, a support vector machine ensemble involving random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate various hybrid support vector machines built on variables produced by principal components analysis and samples drawn by random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines produced dominating performance over the pure support vector machine and the support vector machine ensemble.
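A minimal sketch of the flavour of these model types using scikit-learn; the dataset, component counts, and hyperparameters are illustrative assumptions, not those of the study:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a real business-risk dataset (financial ratios -> failure).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           random_state=0)

models = {
    "pure SVM": make_pipeline(StandardScaler(), SVC()),
    "hybrid PCA+SVM": make_pipeline(StandardScaler(), PCA(10), SVC()),
    # ensemble of hybrid SVMs: random sampling + majority (group) decision
    "ensemble of hybrid SVMs": BaggingClassifier(
        make_pipeline(StandardScaler(), PCA(10), SVC()),
        n_estimators=25, max_samples=0.8, random_state=0),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```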
ENSO Bred Vectors in Coupled Ocean-Atmosphere General Circulation Models
NASA Technical Reports Server (NTRS)
Yang, S. C.; Cai, Ming; Kalnay, E.; Rienecker, M.; Yuan, G.; Toth, Z.
2004-01-01
The breeding method has been implemented in the NASA Seasonal-to-Interannual Prediction Project (NSIPP) Coupled General Circulation Model (CGCM) with the goal of improving operational seasonal to interannual climate predictions through ensemble forecasting and data assimilation. This is the first attempt to isolate the evolving ENSO instability, as captured by the breeding method, and its corresponding global atmospheric response in a fully coupled ocean-atmosphere GCM. Our results show that the growth rate of the coupled bred vectors (BV) peaks at about 3 months before a background ENSO event. The dominant growing BV modes are reminiscent of the background ENSO anomalies and show a strong tropical response with wind/SST/thermocline interrelated in a manner similar to the background ENSO mode. They exhibit larger amplitudes in the eastern tropical Pacific, reflecting the natural dynamical sensitivity associated with the presence of the shallow thermocline. Moreover, the extratropical perturbations associated with these coupled BV modes reveal the variations related to the atmospheric teleconnection patterns associated with background ENSO variability, e.g. over the North Pacific and North America. A similar experiment was carried out with the NCEP/CFS03 CGCM. Comparisons between bred vectors from the NSIPP CGCM and NCEP/CFS03 CGCM demonstrate the robustness of the results. Our results strongly suggest that the breeding method can serve as a natural filter to identify the slowly varying, coupled instabilities in a coupled GCM, which can be used to construct ensemble perturbations for ensemble forecasts and to estimate the coupled background error covariance for coupled data assimilation.
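The breeding cycle itself is simple to state: run a control and a perturbed forecast, difference them, rescale the difference back to a fixed amplitude, and repeat. A toy sketch on the Lorenz-63 system; the model, breeding interval, and amplitude are illustrative choices, not the NSIPP CGCM setup:

```python
import numpy as np

def lorenz63(s, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return np.array([sigma*(y - x), x*(rho - z) - y, x*y - beta*z])

def step(s, dt=0.01):
    # Classical 4th-order Runge-Kutta step.
    k1 = lorenz63(s)
    k2 = lorenz63(s + 0.5*dt*k1)
    k3 = lorenz63(s + 0.5*dt*k2)
    k4 = lorenz63(s + dt*k3)
    return s + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

rng = np.random.default_rng(1)
ctrl = np.array([1.0, 1.0, 20.0])
for _ in range(1000):                  # spin up onto the attractor
    ctrl = step(ctrl)

size = 1e-2                            # fixed breeding amplitude
pert = ctrl + size*rng.standard_normal(3)
for cycle in range(20):
    for _ in range(8):                 # breeding interval: 8 steps
        ctrl, pert = step(ctrl), step(pert)
    bv = pert - ctrl                   # bred vector
    growth = np.log(np.linalg.norm(bv)/size)/(8*0.01)
    pert = ctrl + bv*(size/np.linalg.norm(bv))   # rescale to initial size
    print(f"cycle {cycle:2d}  growth rate {growth:+.2f}")
```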
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
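A hedged sketch of this classifier design, bagging without replacement plus per-member feature subsampling around an SVM base learner, using scikit-learn on synthetic stand-in data; random feature subspaces stand in for the paper's explicit feature selection step:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in for the challenge data: MRI volumes plus cognitive scores,
# four diagnostic classes (NC, MCI, cMCI, AD).
X, y = make_classification(n_samples=400, n_features=100, n_informative=15,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Bagging WITHOUT replacement (bootstrap=False, max_samples < 1.0) combined
# with random feature subsets; each base learner is an RBF SVM.
ens = BaggingClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"),
                        n_estimators=50, bootstrap=False, max_samples=0.8,
                        max_features=0.3, random_state=0)
print("ensemble SVM:", cross_val_score(ens, X, y, cv=5).mean())
print("single SVM  :", cross_val_score(SVC(), X, y, cv=5).mean())
```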
NASA Astrophysics Data System (ADS)
Greenway, D. P.; Hackett, E.
2017-12-01
Under certain atmospheric refractivity conditions, propagating electromagnetic (EM) waves can become trapped between the surface and the bottom of the atmosphere's mixed layer, which is referred to as surface duct propagation. Predicting the presence of these surface ducts can greatly benefit users and developers of sensing and communication systems, because ducts significantly influence the performance of these systems. However, directly measuring or modeling a surface ducting layer is challenging due to the high spatial resolution and large spatial coverage needed to make accurate refractivity estimates for EM propagation; thus, inverse methods have become an increasingly popular way of determining atmospheric refractivity. This study uses data from the Coupled Ocean/Atmosphere Mesoscale Prediction System developed by the Naval Research Laboratory and instrumented helicopter (helo) measurements taken during the Wallops Island Field Experiment to evaluate the use of ensemble forecasts in refractivity inversions. Helo measurements and ensemble forecasts are fit to a parametric refractivity model, and three experiments are performed to evaluate whether incorporating ensemble forecast data yields more timely and accurate inverse solutions with genetic algorithms. The results suggest that using optimized ensemble members as an initial population for the genetic algorithms generally enhances the accuracy and speed of the inverse solution; however, using the ensemble data to restrict the parameter search space yields mixed results. Inaccurate results are related to the parameterization of the ensemble members' refractivity profiles and the subsequent extraction of the parameter ranges used to limit the search space.
Comparison of different deep learning approaches for parotid gland segmentation from CT images
NASA Astrophysics Data System (ADS)
Hänsch, Annika; Schwier, Michael; Gass, Tobias; Morgas, Tomasz; Haas, Benjamin; Klein, Jan; Hahn, Horst K.
2018-02-01
The segmentation of target structures and organs at risk is a crucial and very time-consuming step in radiotherapy planning. Good automatic methods can significantly reduce the time clinicians have to spend on this task. Due to its variability in shape and often low contrast to surrounding structures, segmentation of the parotid gland is especially challenging. Motivated by the recent success of deep learning, we study different deep learning approaches for parotid gland segmentation. Particularly, we compare 2D, 2D ensemble and 3D U-Net approaches and find that the 2D U-Net ensemble yields the best results with a mean Dice score of 0.817 on our test data. The ensemble approach reduces false positives without the need for an automatic region of interest detection. We also apply our trained 2D U-Net ensemble to segment the test data of the 2015 MICCAI head and neck auto-segmentation challenge. With a mean Dice score of 0.861, our classifier exceeds the highest mean score in the challenge. This shows that the method generalizes well onto data from independent sites. Since appropriate reference annotations are essential for training but often difficult and expensive to obtain, it is important to know how many samples are needed to properly train a neural network. We evaluate the classifier performance after training with differently sized training sets (50-450) and find that 250 cases (without using extensive data augmentation) are sufficient to obtain good results with the 2D ensemble. Adding more samples does not significantly improve the Dice score of the segmentations.
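A minimal illustration of why averaging member probability maps reduces false positives, together with the Dice score used above. Pure NumPy; the reference mask and the noisy probability maps are synthetic stand-ins for U-Net softmax outputs, not the study's data:

```python
import numpy as np

def dice(pred, ref):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(pred, ref).sum()
    return 2.0*inter/(pred.sum() + ref.sum())

rng = np.random.default_rng(0)
ref = np.zeros((64, 64), bool)
ref[20:44, 24:48] = True                  # toy "organ" reference mask

# Mock per-model foreground probability maps, stand-ins for the softmax
# outputs of independently trained U-Nets.
members = [np.clip(ref + 0.35*rng.standard_normal(ref.shape), 0, 1)
           for _ in range(5)]

single = members[0] > 0.5
ensemble = np.mean(members, axis=0) > 0.5  # average probabilities, then threshold
print("single model Dice:", round(dice(single, ref), 3))
print("ensemble Dice    :", round(dice(ensemble, ref), 3))
```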
Thermodynamic-ensemble independence of solvation free energy.
Chong, Song-Ho; Ham, Sihyun
2015-02-10
Solvation free energy is the fundamental thermodynamic quantity in solution chemistry. Recently, it has been suggested that the partial molar volume correction is necessary to convert the solvation free energy determined in different thermodynamic ensembles. Here, we demonstrate ensemble-independence of the solvation free energy on general thermodynamic grounds. Theoretical estimates of the solvation free energy based on the canonical or grand-canonical ensemble are pertinent to experiments carried out under constant pressure without any conversion.
Adaptive correction of ensemble forecasts
NASA Astrophysics Data System (ADS)
Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane
2017-04-01
Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS), and so-called "member-by-member" approaches (MBM). Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members, however the parameters of the regression equations are retrieved by exploiting the second order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy. We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO-LEPS ensemble forecasts. Deterministic verification scores (e.g., mean absolute error, bias) and probabilistic scores (e.g., CRPS) are used to evaluate the post-processing techniques. We conclude that the new adaptive method outperforms the simpler running bias-correction. The proposed adaptive method often outperforms the MBM method in removing bias. The MBM method has the advantage of correcting the ensemble spread, although it needs more training data.
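A minimal sketch of the sequential idea behind such adaptive corrections: a scalar Kalman filter that tracks a drifting forecast bias as ground observations arrive. NumPy with synthetic data; this is a generic bias filter, not the authors' ensemble-spread-aware scheme:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
truth = 15 + 5*np.sin(np.arange(T)/20.0)      # synthetic 2-m temperature
bias = 2.0 + 0.01*np.arange(T)                # slowly drifting model bias
ens_mean = truth + bias + 0.5*rng.standard_normal(T)

# Scalar Kalman filter for the forecast bias: state b_t, random-walk model.
b, P = 0.0, 1.0        # bias estimate and its variance
Q, R = 0.01, 0.25      # process and observation error variances
raw_err, cor_err = [], []
for t in range(T):
    P += Q                                    # predict step
    cor_err.append(abs(ens_mean[t] - b - truth[t]))  # correct with prior estimate
    raw_err.append(abs(ens_mean[t] - truth[t]))
    K = P/(P + R)                             # Kalman gain
    b += K*((ens_mean[t] - truth[t]) - b)     # update as the new obs arrives
    P *= (1 - K)
print("MAE raw      :", round(np.mean(raw_err), 2))
print("MAE corrected:", round(np.mean(cor_err), 2))
```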
NASA Astrophysics Data System (ADS)
Fyodorov, Yan V.
2018-06-01
We suggest a method of studying the joint probability density (JPD) of an eigenvalue and the associated 'non-orthogonality overlap factor' (also known as the 'eigenvalue condition number') of the left and right eigenvectors for non-selfadjoint Gaussian random matrices of size N × N. First we derive the general finite-N expression for the JPD of a real eigenvalue λ and the associated non-orthogonality factor in the real Ginibre ensemble, and then analyze its 'bulk' and 'edge' scaling limits. The ensuing distribution is maximally heavy-tailed, so that all integer moments beyond normalization are divergent. A similar calculation for a complex eigenvalue z and the associated non-orthogonality factor in the complex Ginibre ensemble is presented as well and yields a distribution with finite first moment. Its 'bulk' scaling limit yields a distribution whose first moment reproduces the well-known result of Chalker and Mehlig (Phys Rev Lett 81(16):3367-3370, 1998), and we provide the 'edge' scaling distribution for this case as well. Our method involves evaluating the ensemble average of products and ratios of integer and half-integer powers of characteristic polynomials for Ginibre matrices, which we perform in the framework of a supersymmetry approach. Our paper complements recent studies by Bourgade and Dubach (The distribution of overlaps between eigenvectors of Ginibre matrices, 2018. arXiv:1801.01219).
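The overlap factor in question can be probed numerically: sample Ginibre matrices, compute left and right eigenvectors, and form the diagonal Chalker-Mehlig overlaps. A minimal sketch with SciPy; the matrix size and sample count are arbitrary choices for illustration:

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
N, samples = 50, 200
overlaps = []
for _ in range(samples):
    A = rng.standard_normal((N, N))/np.sqrt(N)   # real Ginibre matrix
    w, vl, vr = eig(A, left=True, right=True)
    for n in range(N):
        if abs(w[n].imag) < 1e-10:               # keep real eigenvalues only
            l, r = vl[:, n], vr[:, n]
            # O_nn = (l^H l)(r^H r)/|l^H r|^2 >= 1 (eigenvalue condition number)
            O = (l.conj() @ l).real*(r.conj() @ r).real/abs(l.conj() @ r)**2
            overlaps.append(O)
# The distribution is heavy-tailed, so the sample mean is unstable while
# the median settles; this is consistent with divergent integer moments.
print("median:", np.median(overlaps), " mean:", np.mean(overlaps))
```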
Pandini, Alessandro; Fraccalvieri, Domenico; Bonati, Laura
2013-01-01
The biological function of proteins is strictly related to their molecular flexibility and dynamics: enzymatic activity, protein-protein interactions, ligand binding and allosteric regulation are important mechanisms involving protein motions. Computational approaches, such as Molecular Dynamics (MD) simulations, are now routinely used to study the intrinsic dynamics of target proteins as well as to complement molecular docking approaches. These methods have also successfully supported the process of rational design and discovery of new drugs. Identification of functionally relevant conformations is a key step in these studies. This is generally done by cluster analysis of the ensemble of structures in the MD trajectory. Recently Artificial Neural Network (ANN) approaches, in particular methods based on Self-Organising Maps (SOMs), have been reported performing more accurately and providing more consistent results than traditional clustering algorithms in various data-mining problems. In the specific case of conformational analysis, SOMs have been successfully used to compare multiple ensembles of protein conformations demonstrating a potential in efficiently detecting the dynamic signatures central to biological function. Moreover, examples of the use of SOMs to address problems relevant to other stages of the drug-design process, including clustering of docking poses, have been reported. In this contribution we review recent applications of ANN algorithms in analysing conformational and structural ensembles and we discuss their potential in computer-based approaches for medicinal chemistry.
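A minimal self-contained SOM of the kind described, trained on synthetic "conformations" drawn from three states. NumPy only; the grid size and the learning-rate and radius schedules are illustrative choices, not those of any cited study:

```python
import numpy as np

rng = np.random.default_rng(0)
# Mock "conformations": 300 frames from three states in a 10-D feature
# space (stand-ins for, e.g., backbone dihedrals or pairwise distances).
centers = rng.standard_normal((3, 10))*3
frames = np.vstack([c + 0.5*rng.standard_normal((100, 10)) for c in centers])

rows, cols, dim = 6, 6, frames.shape[1]
grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
W = rng.standard_normal((rows*cols, dim))       # map unit weight vectors
for epoch in range(30):
    lr = 0.5*(1 - epoch/30)                     # decaying learning rate
    radius = 3.0*(1 - epoch/30) + 0.5           # decaying neighbourhood radius
    for x in frames[rng.permutation(len(frames))]:
        bmu = np.argmin(((W - x)**2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu])**2).sum(axis=1)
        h = np.exp(-d2/(2*radius**2))               # neighbourhood kernel
        W += lr*h[:, None]*(x - W)

# Map every frame to its BMU; occupied units act as conformational clusters.
bmus = np.array([np.argmin(((W - x)**2).sum(axis=1)) for x in frames])
print("occupied map units:", len(set(bmus)))
```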
Exploring the calibration of a wind forecast ensemble for energy applications
NASA Astrophysics Data System (ADS)
Heppelmann, Tobias; Ben Bouallegue, Zied; Theis, Susanne
2015-04-01
In the German research project EWeLiNE, Deutscher Wetterdienst (DWD) and the Fraunhofer Institute for Wind Energy and Energy System Technology (IWES) are collaborating with three German Transmission System Operators (TSOs) to provide the TSOs with improved probabilistic power forecasts. Probabilistic power forecasts are derived from probabilistic weather forecasts, themselves derived from ensemble prediction systems (EPS). Since the considered raw ensemble wind forecasts suffer from underdispersion and bias, calibration methods are developed to correct the model bias and the ensemble spread bias. The overall aim is to improve the ensemble forecasts such that the uncertainty of the possible weather development is captured by the ensemble spread from the first forecast hours onward, while the ensemble members after calibration remain physically consistent scenarios. We focus on probabilistic hourly wind forecasts with a horizon of 21 h delivered by the convection-permitting high-resolution ensemble system COSMO-DE-EPS, which became operational at DWD in 2012. The ensemble consists of 20 members driven by four different global models. The model area covers Germany and parts of Central Europe with a horizontal resolution of 2.8 km and a vertical resolution of 50 model levels. For verification we use wind mast measurements at around 100 m height, corresponding to the hub height of the wind turbines in wind farms within the model area. Calibration of the ensemble forecasts can be performed by different statistical methods applied to the raw ensemble output. Here, we explore local bivariate ensemble model output statistics at individual sites and quantile regression with different predictors. Applying these methods, we already show an improvement of ensemble wind forecasts from COSMO-DE-EPS for energy applications. In addition, an ensemble copula coupling approach transfers the time dependencies of the raw ensemble to the calibrated ensemble. The calibrated wind forecasts are evaluated first with univariate probabilistic scores and additionally with diagnostics of wind ramps, in order to assess the time consistency of the calibrated ensemble members.
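Ensemble copula coupling is compact enough to sketch: the calibrated marginal quantiles are reordered according to the ranks of the raw members, so the calibrated ensemble inherits the raw ensemble's rank structure and stays physically consistent. NumPy; the raw members and the calibrated quantiles below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(7)
raw = rng.weibull(2.0, size=20)*8.0        # raw ensemble: 20 wind-speed members

# Calibrated marginal distribution, e.g. from EMOS: here simply 20
# equidistant quantiles of a shifted and rescaled fit (stand-in values).
calibrated = np.quantile(raw, (np.arange(20) + 0.5)/20)*1.1 + 0.3

# Ensemble copula coupling: impose the raw ensemble's rank order on the
# calibrated quantiles, member by member.
ranks = np.argsort(np.argsort(raw))
ecc_members = np.sort(calibrated)[ranks]
print(np.round(ecc_members, 2))
```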
NASA Astrophysics Data System (ADS)
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
2017-03-01
The reaction ensemble and the constant pH method are well-known chemical equilibrium approaches to simulate protonation and deprotonation reactions in classical molecular dynamics and Monte Carlo simulations. In this article, we demonstrate the similarity between both methods under certain conditions. We perform molecular dynamics simulations of a weak polyelectrolyte in order to compare the titration curves obtained by both approaches. Our findings reveal a good agreement between the methods when the reaction ensemble is used to sweep the reaction constant. Pronounced differences between the reaction ensemble and the constant pH method can be observed for stronger acids and bases in terms of adaptive pH values. These deviations are due to the presence of explicit protons in the reaction ensemble method, which induce a screening of electrostatic interactions between the charged titrable groups of the polyelectrolyte. The outcomes of our simulation hint at a better applicability of the reaction ensemble method for systems in confined geometries and for titrable groups in polyelectrolytes with different pKa values.
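A minimal sketch of the constant pH side of this comparison: Metropolis moves on non-interacting titrable sites, which recovers the Henderson-Hasselbalch curve. NumPy; note that the reaction ensemble variant with explicit protons discussed above is not implemented here, and the pKa and sweep counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
pKa, ln10 = 4.75, np.log(10.0)
n_sites, sweeps = 100, 2000

for pH in (3.0, 4.75, 6.5):
    protonated = np.ones(n_sites, bool)
    frac = []
    for _ in range(sweeps):
        i = rng.integers(n_sites)
        # Free-energy cost (in kT) of the attempted move for an ideal,
        # non-interacting site: ln(10)*(pKa - pH) for HA -> A- + H+,
        # with the sign flipped for the reverse move.
        dG = ln10*(pKa - pH) if protonated[i] else ln10*(pH - pKa)
        if rng.random() < np.exp(-dG):          # Metropolis acceptance
            protonated[i] = ~protonated[i]
        frac.append(protonated.mean())
    print(f"pH {pH:4.2f}: simulated {np.mean(frac[500:]):.2f}, "
          f"Henderson-Hasselbalch {1/(1 + 10**(pH - pKa)):.2f}")
```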
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.
2004-01-01
Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
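The committee intuition can be made quantitative under an independence assumption: if n members are each correct with probability p, a majority vote is correct with the upper tail of a binomial distribution. A minimal sketch with SciPy (the accuracy value 0.6 is arbitrary):

```python
from scipy.stats import binom

def committee_accuracy(n, p):
    """P(majority of n independent members is correct), n odd."""
    k = n//2 + 1                       # votes needed for a correct majority
    return 1 - binom.cdf(k - 1, n, p)  # P(at least k correct votes)

for n in (1, 5, 15, 51):
    print(n, round(committee_accuracy(n, 0.6), 3))
# Independence is the crucial assumption: identical, fully agreeing members
# gain nothing, exactly as the committee analogy above stresses.
```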
Near-optimal protocols in complex nonequilibrium transformations
Gingrich, Todd R.; Rotskoff, Grant M.; Crooks, Gavin E.; ...
2016-08-29
The development of sophisticated experimental means to control nanoscale systems has motivated efforts to design driving protocols that minimize the energy dissipated to the environment. Computational models are a crucial tool in this practical challenge. In this paper, we describe a general method for sampling an ensemble of finite-time, nonequilibrium protocols biased toward a low average dissipation. In addition, we show that this scheme can be carried out very efficiently in several limiting cases. As an application, we sample the ensemble of low-dissipation protocols that invert the magnetization of a 2D Ising model and explore how the diversity of the protocols varies in response to constraints on the average dissipation. In this example, we find that there is a large set of protocols with average dissipation close to the optimal value, which we argue is a general phenomenon.
Hansen, Bjoern Oest; Meyer, Etienne H; Ferrari, Camilla; Vaid, Neha; Movahedi, Sara; Vandepoele, Klaas; Nikoloski, Zoran; Mutwil, Marek
2018-03-01
Recent advances in gene function prediction rely on ensemble approaches that integrate results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We have explored and compared two methods to integrate 10 gene co-function networks for Arabidopsis thaliana and demonstrate how the integration of these networks produces more accurate gene function predictions for a larger fraction of genes with unknown function. These predictions were used to identify genes involved in mitochondrial complex I formation, and for five of them, we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet. The methods presented here demonstrate that ensemble gene function prediction is a powerful method to boost prediction performance, whereas the EnsembleNet database provides a cutting-edge community tool to guide experimentalists. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Haberman, Jason; Brady, Timothy F; Alvarez, George A
2015-04-01
Ensemble perception, including the ability to "see the average" from a group of items, operates in numerous feature domains (size, orientation, speed, facial expression, etc.). Although the ubiquity of ensemble representations is well established, the large-scale cognitive architecture of this process remains poorly defined. We address this using an individual differences approach. In a series of experiments, observers saw groups of objects and reported either a single item from the group or the average of the entire group. High-level ensemble representations (e.g., average facial expression) showed complete independence from low-level ensemble representations (e.g., average orientation). In contrast, low-level ensemble representations (e.g., orientation and color) were correlated with each other, but not with high-level ensemble representations (e.g., facial expression and person identity). These results suggest that there is not a single domain-general ensemble mechanism, and that the relationship among various ensemble representations depends on how proximal they are in representational space. (c) 2015 APA, all rights reserved.
Generalization of information-based concepts in forecast verification
NASA Astrophysics Data System (ADS)
Tödter, J.; Ahrens, B.
2012-04-01
This work deals with information-theoretical methods in probabilistic forecast verification. Recent findings concerning the Ignorance Score are briefly reviewed, and the generalization to continuous forecasts is shown; for ensemble forecasts, the presented measures can be calculated exactly. The Brier Score (BS) and its generalizations to the multi-categorical Ranked Probability Score (RPS) and to the Continuous Ranked Probability Score (CRPS) are the prominent verification measures for probabilistic forecasts. Particularly attractive are their decompositions into measures quantifying the reliability, resolution and uncertainty of the forecasts. Information theory sets up the natural framework for forecast verification. Recently, it has been shown that the BS is a second-order approximation of the information-based Ignorance Score (IGN), which also contains easily interpretable components and can be generalized to a ranked version (RIGN). Here, the IGN, its generalizations and decompositions are systematically discussed in analogy to the variants of the BS. Additionally, a Continuous Ranked IGN (CRIGN) is introduced in analogy to the CRPS. The applicability and usefulness of the conceptually appealing CRIGN are illustrated, together with an algorithm to evaluate its components (reliability, resolution, and uncertainty) for ensemble-generated forecasts. This is also directly applicable to the more traditional CRPS.
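Both score families are easy to evaluate exactly for an ensemble. A minimal sketch of the ensemble CRPS in its standard kernel form and of the binary Ignorance score; the forecast members and the event threshold are synthetic illustrations:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Exact CRPS of an ensemble forecast: E|X-y| - 0.5 E|X-X'|."""
    m = np.asarray(members, float)
    return np.mean(np.abs(m - obs)) - 0.5*np.mean(np.abs(m[:, None] - m[None, :]))

def ignorance_binary(p, occurred, eps=1e-12):
    """IGN = -log2 of the probability assigned to the realized outcome."""
    p = np.clip(p, eps, 1 - eps)
    return -np.log2(p) if occurred else -np.log2(1 - p)

rng = np.random.default_rng(0)
ens = rng.normal(20.0, 2.0, size=50)      # hypothetical temperature members
print("CRPS:", round(crps_ensemble(ens, 21.3), 3))
p_event = np.mean(ens > 22.0)             # ensemble exceedance probability
print("IGN :", round(ignorance_binary(p_event, occurred=False), 3))
```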
A Novel Data-Driven Learning Method for Radar Target Detection in Nonstationary Environments
2016-05-01
NASA Astrophysics Data System (ADS)
Brekke, L. D.; Prairie, J.; Pruitt, T.; Rajagopalan, B.; Woodhouse, C.
2008-12-01
Water resources adaptation planning under climate change involves making assumptions about probabilistic water supply conditions, which are linked to a given climate context (e.g., instrument records, paleoclimate indicators, projected climate data, or a blend of these). Methods have been demonstrated to associate water supply assumptions with any of these climate information types. Additionally, demonstrations have been offered that represent these information types in a scenario-rich (ensemble) planning framework, either via ensembles (e.g., a survey of many climate projections) or stochastic modeling (e.g., based on instrument records or paleoclimate indicators). If the planning goal involves using a hydrologic ensemble that jointly reflects paleoclimate (e.g., lower-frequency variations) and projected climate information (e.g., monthly to annual trends), methods are required to guide how these information types might be translated into water supply assumptions. However, even if such a method exists, there is a lack of understanding of how such a hydrologic ensemble might differ from ensembles developed relative to paleoclimate or projected climate information alone. This research explores two questions: (1) how might paleoclimate and projected climate information be blended into a planning hydrologic ensemble, and (2) how does a planning hydrologic ensemble differ when associated with the individual climate information types (i.e., instrument records, paleoclimate, projected climate, or a blend of the latter two). Case study basins include the Gunnison River Basin in Colorado and the Missouri River Basin above Toston in Montana. The presentation will highlight ensemble development methods by information type, and a comparison of ensemble results.
Lu, Qing; Kim, Jaegil; Straub, John E
2013-03-14
The generalized Replica Exchange Method (gREM) is extended into the isobaric-isothermal ensemble, and applied to simulate a vapor-liquid phase transition in Lennard-Jones fluids. Merging an optimally designed generalized ensemble sampling with replica exchange, gREM is particularly well suited for the effective simulation of first-order phase transitions characterized by "backbending" in the statistical temperature. While the metastable and unstable states in the vicinity of the first-order phase transition are masked by the enthalpy gap in temperature replica exchange method simulations, they are transformed into stable states through the parameterized effective sampling weights in gREM simulations, and join vapor and liquid phases with a succession of unimodal enthalpy distributions. The enhanced sampling across metastable and unstable states is achieved without the need to identify a "good" order parameter for biased sampling. We performed gREM simulations at various pressures below and near the critical pressure to examine the change in behavior of the vapor-liquid phase transition at different pressures. We observed a crossover from the first-order phase transition at low pressure, characterized by the backbending in the statistical temperature and the "kink" in the Gibbs free energy, to a continuous second-order phase transition near the critical pressure. The controlling mechanisms of nucleation and continuous phase transition are evident and the coexistence properties and phase diagram are found in agreement with literature results.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-02-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The potential of Extra-Trees is analyzed on two real-world case studies, the Marina catchment (Singapore) and the Canning River (Western Australia), representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables they provide can be given a physically meaningful interpretation.
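A hedged sketch of this kind of exercise with scikit-learn's ExtraTreesRegressor; the predictors and their relationship to streamflow are invented for illustration and do not reproduce the study's catchment data:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
# Mock daily predictors: current and lagged rainfall, temperature, past flow.
rain, rain_lag = rng.gamma(2, 3, n), rng.gamma(2, 3, n)
temp, flow_lag = 20 + 5*rng.standard_normal(n), rng.gamma(3, 2, n)
X = np.column_stack([rain, rain_lag, temp, flow_lag])
y = 0.5*rain + 0.3*rain_lag + 0.6*flow_lag + 0.2*rng.standard_normal(n)

et = ExtraTreesRegressor(n_estimators=300, random_state=0)
print("CV R^2:", cross_val_score(et, X, y, cv=5).mean().round(3))
et.fit(X, y)
for name, imp in zip(["rain", "rain_lag", "temp", "flow_lag"],
                     et.feature_importances_):
    print(f"{name:9s} importance {imp:.2f}")   # ex-post interpretation
```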
Bhattacharyya, Moitrayee; Vishveshwara, Saraswathi
2011-07-01
In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations, sampled by molecular dynamics simulations performed on different ligand bound structures of the protein. We further portray each conformational population in terms of dynamically stable network parameters which beautifully capture the ligand induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment induced re-distribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic level characterization of each population in the conformational ensemble in terms of the re-orchestrated networks of amino acids is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots which are similar at the side-chain interaction level. Although we have applied these methods on simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards an objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at higher time scales, and to problems related to protein folding for any protein or protein-protein/RNA/DNA complex of interest with a known structure.
Dynamic principle for ensemble control tools.
Samoletov, A; Vasiev, B
2017-11-28
Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.
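For concreteness, the sketch below shows one standard stochastic thermostat of the kind covered by such schemes: a Langevin integrator with BAOAB splitting sampling the canonical distribution of a harmonic oscillator. NumPy; this is a textbook scheme for illustration, not the authors' proposed control tools:

```python
import numpy as np

rng = np.random.default_rng(0)
kT, m, gamma, dt = 1.0, 1.0, 1.0, 0.05
force = lambda q: -q                      # harmonic potential U = q^2/2

q, p = 0.0, 0.0
samples = []
c1 = np.exp(-gamma*dt)                    # OU damping over one step
c2 = np.sqrt(kT*m*(1 - c1**2))            # matching noise amplitude
for step in range(200000):
    p += 0.5*dt*force(q)                  # B: half kick
    q += 0.5*dt*p/m                       # A: half drift
    p = c1*p + c2*rng.standard_normal()   # O: Ornstein-Uhlenbeck thermostat
    q += 0.5*dt*p/m                       # A: half drift
    p += 0.5*dt*force(q)                  # B: half kick
    if step > 1000:
        samples.append(q)
# Canonical sampling check: <q^2> = kT/k = 1 for this potential.
print("<q^2> =", np.round(np.var(samples), 3), "(canonical value: 1.0)")
```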
A common fallacy in climate model evaluation
NASA Astrophysics Data System (ADS)
Annan, J. D.; Hargreaves, J. C.; Tachiiri, K.
2012-04-01
We discuss the assessment of model ensembles such as that arising from the CMIP3 coordinated multi-model experiments. An important aspect of this is not merely the closeness of the models to observations in absolute terms but also the reliability of the ensemble spread as an indication of uncertainty. In this context, it has been widely argued that the multi-model ensemble of opportunity is insufficiently broad to adequately represent uncertainties regarding future climate change. For example, the IPCC AR4 summarises the consensus with the sentence: "Those studies also suggest that the current AOGCMs may not cover the full range of uncertainty for climate sensitivity." Similar claims have been made in the literature for other properties of the climate system, including the transient climate response and efficiency of ocean heat uptake. Comparison of model outputs with observations of the climate system forms an essential component of model assessment and is crucial for building our confidence in model predictions. However, methods for undertaking this comparison are not always clearly justified and understood. Here we show that the popular approach which forms the basis for the above claims, of comparing the ensemble spread to a so-called "observationally-constrained pdf", can be highly misleading. Such a comparison will almost certainly result in disagreement, but in reality tells us little about the performance of the ensemble. We present an alternative approach based on an assessment of the predictive performance of the ensemble, and show how it may lead to very different, and rather more encouraging, conclusions. We additionally outline some necessary conditions for an ensemble (or more generally, a probabilistic prediction) to be challenged by an observation.
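One standard way to assess the predictive reliability of an ensemble, in the spirit argued for above, is the rank histogram: for a reliable ensemble the observation is equally likely to fall in any rank position, while an overconfident (too narrow) ensemble yields a U-shape. A minimal sketch with synthetic forecasts, not CMIP3 output:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cases, n_members = 5000, 10

def rank_histogram(ens, obs):
    """Counts of the observation's rank within each ensemble (flat = reliable)."""
    ranks = (ens < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

obs = rng.standard_normal(n_cases)
reliable = rng.standard_normal((n_cases, n_members))           # same spread as obs
overconfident = 0.4*rng.standard_normal((n_cases, n_members))  # too narrow

print("reliable     :", rank_histogram(reliable, obs))
print("overconfident:", rank_histogram(overconfident, obs))    # U-shaped
```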
Pourhoseingholi, Mohamad Amin; Kheirian, Sedigheh; Zali, Mohammad Reza
2017-12-01
Colorectal cancer (CRC) is one of the most common malignancies and causes of cancer mortality worldwide. Given the importance of predicting the survival of CRC patients and the growing use of data mining methods, this study aims to compare the performance of models for predicting the 5-year survival of CRC patients built with a variety of basic and ensemble data mining methods. The CRC dataset from the Shahid Beheshti University of Medical Sciences Research Center for Gastroenterology and Liver Diseases was used for the prediction and comparative study of the basic and ensemble data mining techniques. Feature selection methods were used to select predictor attributes for classification. The WEKA toolkit and MedCalc software were respectively utilized for creating and comparing the models. The obtained results showed that the predictive performance of the developed models was altogether high (all greater than 90%). Overall, the performance of the ensemble models was higher than that of the basic classifiers, and the best result was achieved by the ensemble voting model in terms of area under the ROC curve (AUC = 0.96). AUC comparison of the models showed that the ensemble voting method significantly outperformed all models except for the two methods of Random Forest (RF) and Bayesian Network (BN), considering the overlapping 95% confidence intervals. This result may indicate the high predictive power of these two methods, along with ensemble voting, for predicting the 5-year survival of CRC patients.
The role of ensemble post-processing for modeling the ensemble tail
NASA Astrophysics Data System (ADS)
Van De Vyver, Hans; Van Schaeybroeck, Bert; Vannitsem, Stéphane
2016-04-01
Over the past decades the numerical weather prediction community has witnessed a paradigm shift from deterministic to probabilistic forecasting and state estimation (Buizza and Leutbecher, 2015; Buizza et al., 2008), in an attempt to quantify the uncertainties associated with initial-condition and model errors. An important benefit of a probabilistic framework is the improved prediction of extreme events. However, one may ask to what extent such model estimates contain information on the occurrence probability of extreme events and how this information can be optimally extracted. Different approaches have been proposed and applied to real-world systems which, based on extreme value theory, allow the estimation of extreme-event probabilities conditional on forecasts and state estimates (Ferro, 2007; Friederichs, 2010). Using ensemble predictions generated with a model of low dimensionality, a thorough investigation is presented quantifying the change in predictability of extreme events associated with ensemble post-processing and other influencing factors, including the finite ensemble size, lead time, model assumptions and the use of different covariates (ensemble mean, maximum, spread...) for modeling the tail distribution. Tail modeling is performed by deriving extreme-quantile estimates using a peak-over-threshold representation (generalized Pareto distribution) or quantile regression. Common ensemble post-processing methods aim to improve mostly the ensemble mean and spread of a raw forecast (Van Schaeybroeck and Vannitsem, 2015). Conditional tail modeling, on the other hand, is a post-processing in itself, focusing on the tails only. Therefore, it is unclear how applying ensemble post-processing prior to conditional tail modeling impacts the skill of extreme-event predictions. This work investigates this question in detail. References: Buizza, Leutbecher, and Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System, Q. J. R. Meteorol. Soc. 134: 2051-2066. Buizza and Leutbecher, 2015: The forecast skill horizon, Q. J. R. Meteorol. Soc. 141: 3366-3382. Ferro, 2007: A probability model for verifying deterministic forecasts of extreme events. Weather and Forecasting 22 (5), 1089-1100. Friederichs, 2010: Statistical downscaling of extreme precipitation events using extreme value theory. Extremes 13, 109-132. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q. J. R. Meteorol. Soc. 141: 807-818.
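A minimal sketch of the peak-over-threshold step: fit a generalized Pareto tail to pooled ensemble values and read off a return level. SciPy; the data, the threshold choice, and the exceedance probability are illustrative assumptions:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)
# Pooled ensemble forecasts of, say, daily precipitation (hypothetical).
data = rng.gamma(shape=1.2, scale=8.0, size=20000)

u = np.quantile(data, 0.95)                     # peak-over-threshold level
exceed = data[data > u] - u
xi, _, sigma = genpareto.fit(exceed, floc=0)    # GPD fit, location fixed at 0

# Return level for exceedance probability p from the fitted tail:
# P(X > x) = zeta * (1 - F_GPD(x - u)) = p, with zeta = P(X > u).
p = 1e-3
zeta = np.mean(data > u)
level = u + genpareto.ppf(1 - p/zeta, xi, scale=sigma)
print(f"threshold {u:.1f}, shape {xi:+.2f}, 1-in-{int(1/p)} level {level:.1f}")
```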
DOE Office of Scientific and Technical Information (OSTI.GOV)
Portegies Zwart, Simon; Boekholt, Tjarda
2014-04-10
The conservation of energy, linear momentum, and angular momentum are important drivers of our physical understanding of the evolution of the universe. These quantities are also conserved in Newton's laws of motion under gravity. Numerical integration of the associated equations of motion is extremely challenging, in particular due to the steady growth of numerical errors (from round-off and discrete time-stepping) and the exponential divergence between two nearby solutions. As a result, numerical solutions to the general N-body problem are intrinsically questionable. Using brute-force integrations to arbitrary numerical precision, we demonstrate empirically that ensembles of different realizations of resonant three-body interactions produce statistically indistinguishable results. Although individual solutions using common integration methods are notoriously unreliable, we conjecture that an ensemble of approximate three-body solutions accurately represents an ensemble of true solutions, so long as the energy during integration is conserved to better than 1/10. We therefore provide an independent confirmation that previous work on self-gravitating systems can actually be trusted, irrespective of the intrinsically chaotic nature of the N-body problem.
Improving ECG Classification Accuracy Using an Ensemble of Neural Network Modules
Javadi, Mehrdad; Ebrahimpour, Reza; Sajedin, Atena; Faridi, Soheil; Zakernejad, Shokoufeh
2011-01-01
This paper illustrates the use of a combined neural network model based on the Stacked Generalization method for the classification of electrocardiogram (ECG) beats. In the conventional Stacked Generalization method, the combiner learns to map the base classifiers' outputs to the target data. We claim that adding the input pattern to the base classifiers' outputs helps the combiner obtain knowledge about the input space and, as a result, perform better on the same task. Experimental results support our claim that this additional knowledge of the input space improves the performance of the proposed method, which is called Modified Stacked Generalization. In particular, for the classification of 14966 ECG beats that were not previously seen during the training phase, the Modified Stacked Generalization method reduced the error rate by 12.41% in comparison with the best of ten popular classifier fusion methods, including Max, Min, Average, Product, Majority Voting, Borda Count, Decision Templates, Weighted Averaging based on Particle Swarm Optimization, and Stacked Generalization. PMID:22046232
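scikit-learn's stacking implementation exposes essentially this idea through its passthrough option, which feeds the raw input features to the combiner alongside the base classifiers' outputs. A hedged sketch on synthetic stand-in data; the base learners and dataset are illustrative, not the paper's ECG setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for ECG beat features (the real work used heartbeat waveforms).
X, y = make_classification(n_samples=1500, n_features=40, n_informative=12,
                           n_classes=3, random_state=0)

base = [("svm", SVC()), ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0))]
plain = StackingClassifier(base, final_estimator=LogisticRegression(max_iter=1000))
modified = StackingClassifier(base, final_estimator=LogisticRegression(max_iter=1000),
                              passthrough=True)  # combiner also sees the input

print("stacked generalization        :", cross_val_score(plain, X, y, cv=5).mean())
print("stacking + input (passthrough):", cross_val_score(modified, X, y, cv=5).mean())
```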
Similarity Measures for Protein Ensembles
Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper
2009-01-01
Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement. PMID:19145244
A Statistical Description of Neural Ensemble Dynamics
Long, John D.; Carmena, Jose M.
2011-01-01
The growing use of multi-channel neural recording techniques in behaving animals has produced rich datasets that hold immense potential for advancing our understanding of how the brain mediates behavior. One limitation of these techniques is they do not provide important information about the underlying anatomical connections among the recorded neurons within an ensemble. Inferring these connections is often intractable because the set of possible interactions grows exponentially with ensemble size. This is a fundamental challenge one confronts when interpreting these data. Unfortunately, the combination of expert knowledge and ensemble data is often insufficient for selecting a unique model of these interactions. Our approach shifts away from modeling the network diagram of the ensemble toward analyzing changes in the dynamics of the ensemble as they relate to behavior. Our contribution consists of adapting techniques from signal processing and Bayesian statistics to track the dynamics of ensemble data on time-scales comparable with behavior. We employ a Bayesian estimator to weigh prior information against the available ensemble data, and use an adaptive quantization technique to aggregate poorly estimated regions of the ensemble data space. Importantly, our method is capable of detecting changes in both the magnitude and structure of correlations among neurons missed by firing rate metrics. We show that this method is scalable across a wide range of time-scales and ensemble sizes. Lastly, the performance of this method on both simulated and real ensemble data is used to demonstrate its utility. PMID:22319486
Wang, Xueyi; Davidson, Nicholas J.
2011-01-01
Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we establish several results about the prediction accuracies of ensemble methods for binary classification that have been missed or misinterpreted in previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e., the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve a prediction accuracy greater than 0.5 even when the individual classifiers have prediction accuracies below 0.5. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify these results and show that it is hard to reach the upper- and lower-bound accuracies with random individual classifiers, so better algorithms need to be developed. PMID:21853162
SSAGES: Software Suite for Advanced General Ensemble Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sidky, Hythem; Colón, Yamil J.; Helfferich, Julian
Molecular simulation has emerged as an essential tool for modern-day research, but obtaining proper results and making reliable conclusions from simulations requires adequate sampling of the system under consideration. To this end, a variety of methods exist in the literature that can enhance sampling considerably, and increasingly sophisticated, effective algorithms continue to be developed at a rapid pace. Implementation of these techniques, however, can be challenging for experts and non-experts alike. There is a clear need for software that provides rapid, reliable, and easy access to a wide range of advanced sampling methods, and that facilitates implementation of new techniques as they emerge. Here we present SSAGES, a publicly available Software Suite for Advanced General Ensemble Simulations designed to interface with multiple widely used molecular dynamics simulations packages. SSAGES allows facile application of a variety of enhanced sampling techniques—including adaptive biasing force, string methods, and forward flux sampling—that extract meaningful free energy and transition path data from all-atom and coarse grained simulations. A noteworthy feature of SSAGES is a user-friendly framework that facilitates further development and implementation of new methods and collective variables. In this work, the use of SSAGES is illustrated in the context of simple representative applications involving distinct methods and different collective variables that are available in the current release of the suite.
SSAGES: Software Suite for Advanced General Ensemble Simulations.
Sidky, Hythem; Colón, Yamil J; Helfferich, Julian; Sikora, Benjamin J; Bezik, Cody; Chu, Weiwei; Giberti, Federico; Guo, Ashley Z; Jiang, Xikai; Lequieu, Joshua; Li, Jiyuan; Moller, Joshua; Quevillon, Michael J; Rahimi, Mohammad; Ramezani-Dakhel, Hadi; Rathee, Vikramjit S; Reid, Daniel R; Sevgen, Emre; Thapar, Vikram; Webb, Michael A; Whitmer, Jonathan K; de Pablo, Juan J
2018-01-28
Molecular simulation has emerged as an essential tool for modern-day research, but obtaining proper results and making reliable conclusions from simulations requires adequate sampling of the system under consideration. To this end, a variety of methods exist in the literature that can enhance sampling considerably, and increasingly sophisticated, effective algorithms continue to be developed at a rapid pace. Implementation of these techniques, however, can be challenging for experts and non-experts alike. There is a clear need for software that provides rapid, reliable, and easy access to a wide range of advanced sampling methods and that facilitates implementation of new techniques as they emerge. Here we present SSAGES, a publicly available Software Suite for Advanced General Ensemble Simulations designed to interface with multiple widely used molecular dynamics simulations packages. SSAGES allows facile application of a variety of enhanced sampling techniques-including adaptive biasing force, string methods, and forward flux sampling-that extract meaningful free energy and transition path data from all-atom and coarse-grained simulations. A noteworthy feature of SSAGES is a user-friendly framework that facilitates further development and implementation of new methods and collective variables. In this work, the use of SSAGES is illustrated in the context of simple representative applications involving distinct methods and different collective variables that are available in the current release of the suite. The code may be found at: https://github.com/MICCoM/SSAGES-public.
SSAGES: Software Suite for Advanced General Ensemble Simulations
NASA Astrophysics Data System (ADS)
Sidky, Hythem; Colón, Yamil J.; Helfferich, Julian; Sikora, Benjamin J.; Bezik, Cody; Chu, Weiwei; Giberti, Federico; Guo, Ashley Z.; Jiang, Xikai; Lequieu, Joshua; Li, Jiyuan; Moller, Joshua; Quevillon, Michael J.; Rahimi, Mohammad; Ramezani-Dakhel, Hadi; Rathee, Vikramjit S.; Reid, Daniel R.; Sevgen, Emre; Thapar, Vikram; Webb, Michael A.; Whitmer, Jonathan K.; de Pablo, Juan J.
2018-01-01
Molecular simulation has emerged as an essential tool for modern-day research, but obtaining proper results and making reliable conclusions from simulations requires adequate sampling of the system under consideration. To this end, a variety of methods exist in the literature that can enhance sampling considerably, and increasingly sophisticated, effective algorithms continue to be developed at a rapid pace. Implementation of these techniques, however, can be challenging for experts and non-experts alike. There is a clear need for software that provides rapid, reliable, and easy access to a wide range of advanced sampling methods and that facilitates implementation of new techniques as they emerge. Here we present SSAGES, a publicly available Software Suite for Advanced General Ensemble Simulations designed to interface with multiple widely used molecular dynamics simulations packages. SSAGES allows facile application of a variety of enhanced sampling techniques—including adaptive biasing force, string methods, and forward flux sampling—that extract meaningful free energy and transition path data from all-atom and coarse-grained simulations. A noteworthy feature of SSAGES is a user-friendly framework that facilitates further development and implementation of new methods and collective variables. In this work, the use of SSAGES is illustrated in the context of simple representative applications involving distinct methods and different collective variables that are available in the current release of the suite. The code may be found at: https://github.com/MICCoM/SSAGES-public.
Kumar, Sanjeev; Karmeshu
2018-04-01
A theoretical investigation is presented that characterizes the emerging sub-threshold membrane potential and inter-spike interval (ISI) distributions of an ensemble of IF neurons that group together and fire together. The squared-noise intensity σ² of the ensemble of neurons is treated as a random variable to account for the electrophysiological variations across a population of nearly identical neurons. Employing a superstatistical framework, both the ISI distribution and the sub-threshold membrane potential distribution of the neuronal ensemble are obtained in terms of the generalized K-distribution. The resulting distributions exhibit asymptotic behavior akin to the stretched exponential family. Extensive simulations of the underlying SDE with random σ² are carried out, and the results are found to be in excellent agreement with the analytical results. The analysis has been extended to cover the case corresponding to independent random fluctuations in drift in addition to random squared-noise intensity. The novelty of the proposed analytical investigation for the ensemble of IF neurons is that it yields closed-form expressions of the probability distributions in terms of the generalized K-distribution. The findings of the proposed model are validated against a record of the spiking activity of thousands of neurons. The squared-noise intensity σ² of identified neurons from the data is found to follow a gamma distribution, and the proposed generalized K-distribution is found to be in excellent agreement with the empirically obtained ISI distribution of the neuronal ensemble. Copyright © 2018 Elsevier B.V. All rights reserved.
Comparing generalized ensemble methods for sampling of systems with many degrees of freedom
Lincoff, James; Sasmal, Sukanya; Head-Gordon, Teresa
2016-11-03
Here, we compare two standard replica exchange methods using temperature and dielectric constant as the scaling variables for independent replicas against two new corresponding enhanced sampling methods based on non-equilibrium statistical cooling (temperature) or descreening (dielectric). We test the four methods on a rough 1D potential as well as for alanine dipeptide in water, for which their relatively small phase space allows for the ability to define quantitative convergence metrics. We show that both dielectric methods are inferior to the temperature enhanced sampling methods, and in turn show that temperature cool walking (TCW) systematically outperforms the standard temperature replica exchange (TREx) method. We extend our comparisons of the TCW and TREx methods to the 5 residue met-enkephalin peptide, in which we evaluate the Kullback-Leibler divergence metric to show that the rate of convergence between two independent trajectories is faster for TCW compared to TREx. Finally we apply the temperature methods to the 42 residue amyloid-β peptide in which we find non-negligible differences in the disordered ensemble using TCW compared to the standard TREx. All four methods have been made available as software through the OpenMM Omnia software consortium.
Comparing generalized ensemble methods for sampling of systems with many degrees of freedom.
Lincoff, James; Sasmal, Sukanya; Head-Gordon, Teresa
2016-11-07
We compare two standard replica exchange methods using temperature and dielectric constant as the scaling variables for independent replicas against two new corresponding enhanced sampling methods based on non-equilibrium statistical cooling (temperature) or descreening (dielectric). We test the four methods on a rough 1D potential as well as for alanine dipeptide in water, for which their relatively small phase space allows for the ability to define quantitative convergence metrics. We show that both dielectric methods are inferior to the temperature enhanced sampling methods, and in turn show that temperature cool walking (TCW) systematically outperforms the standard temperature replica exchange (TREx) method. We extend our comparisons of the TCW and TREx methods to the 5 residue met-enkephalin peptide, in which we evaluate the Kullback-Leibler divergence metric to show that the rate of convergence between two independent trajectories is faster for TCW compared to TREx. Finally we apply the temperature methods to the 42 residue amyloid-β peptide in which we find non-negligible differences in the disordered ensemble using TCW compared to the standard TREx. All four methods have been made available as software through the OpenMM Omnia software consortium (http://www.omnia.md/).
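For reference, the exchange step at the heart of standard TREx is a Metropolis test on the energies and inverse temperatures of the two replicas; a minimal sketch (with the Boltzmann constant absorbed into beta):

```python
import numpy as np

def attempt_swap(beta_i, beta_j, E_i, E_j, rng):
    """Metropolis acceptance for exchanging the configurations of two
    replicas at inverse temperatures beta_i, beta_j with potential
    energies E_i, E_j: accept with min(1, exp[(beta_i-beta_j)(E_i-E_j)])."""
    return np.log(rng.random()) < (beta_i - beta_j) * (E_i - E_j)

rng = np.random.default_rng(0)
swapped = attempt_swap(1.0 / 300.0, 1.0 / 350.0, -1050.0, -1020.0, rng)
```

As the abstract notes, TCW instead relies on non-equilibrium statistical cooling of configurations from a high-temperature walker rather than synchronous swaps.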
Conformational ensembles of RNA oligonucleotides from integrating NMR and molecular simulations.
Bottaro, Sandro; Bussi, Giovanni; Kennedy, Scott D; Turner, Douglas H; Lindorff-Larsen, Kresten
2018-05-01
RNA molecules are key players in numerous cellular processes and are characterized by a complex relationship between structure, dynamics, and function. Despite their apparent simplicity, RNA oligonucleotides are very flexible molecules, and understanding their internal dynamics is particularly challenging using experimental data alone. We show how to reconstruct the conformational ensemble of four RNA tetranucleotides by combining atomistic molecular dynamics simulations with nuclear magnetic resonance spectroscopy data. The goal is achieved by reweighting simulations using a maximum entropy/Bayesian approach. In this way, we overcome problems of current simulation methods, as well as in interpreting ensemble- and time-averaged experimental data. We determine the populations of different conformational states by considering several nuclear magnetic resonance parameters and point toward properties that are not captured by state-of-the-art molecular force fields. Although our approach is applied on a set of model systems, it is fully general and may be used to study the conformational dynamics of flexible biomolecules and to detect inaccuracies in molecular dynamics force fields.
NASA Astrophysics Data System (ADS)
Durner, Maximilian; Márton, Zoltán.; Hillenbrand, Ulrich; Ali, Haider; Kleinsteuber, Martin
2017-03-01
In this work, a new ensemble method for the task of category recognition in different environments is presented. The focus is on service robotic perception in an open environment, where the robot's task is to recognize previously unseen objects of predefined categories, based on training on a public dataset. We propose an ensemble learning approach to be able to flexibly combine complementary sources of information (different state-of-the-art descriptors computed on color and depth images), based on a Markov Random Field (MRF). By exploiting its specific characteristics, the MRF ensemble method can also be executed as a Dynamic Classifier Selection (DCS) system. In the experiments, the committee- and topology-dependent performance boost of our ensemble is shown. Despite reduced computational costs and using less information, our strategy performs on the same level as common ensemble approaches. Finally, the impact of large differences between datasets is analyzed.
Concrete ensemble Kalman filters with rigorous catastrophic filter divergence
Kelly, David; Majda, Andrew J.; Tong, Xin T.
2015-01-01
The ensemble Kalman filter and ensemble square root filters are data assimilation methods used to combine high-dimensional, nonlinear dynamical models with observed data. Ensemble methods are indispensable tools in science and engineering and have enjoyed great success in geophysical sciences, because they allow for computationally cheap low-ensemble-state approximation for extremely high-dimensional turbulent forecast models. From a theoretical perspective, the dynamical properties of these methods are poorly understood. One of the central mysteries is the numerical phenomenon known as catastrophic filter divergence, whereby ensemble-state estimates explode to machine infinity, despite the true state remaining in a bounded region. In this article we provide a breakthrough insight into the phenomenon, by introducing a simple and natural forecast model that transparently exhibits catastrophic filter divergence under all ensemble methods and a large set of initializations. For this model, catastrophic filter divergence is not an artifact of numerical instability, but rather a true dynamical property of the filter. The divergence is not only validated numerically but also proven rigorously. The model cleanly illustrates mechanisms that give rise to catastrophic divergence and confirms intuitive accounts of the phenomena given in past literature.
Concrete ensemble Kalman filters with rigorous catastrophic filter divergence.
Kelly, David; Majda, Andrew J; Tong, Xin T
2015-08-25
The ensemble Kalman filter and ensemble square root filters are data assimilation methods used to combine high-dimensional, nonlinear dynamical models with observed data. Ensemble methods are indispensable tools in science and engineering and have enjoyed great success in geophysical sciences, because they allow for computationally cheap low-ensemble-state approximation for extremely high-dimensional turbulent forecast models. From a theoretical perspective, the dynamical properties of these methods are poorly understood. One of the central mysteries is the numerical phenomenon known as catastrophic filter divergence, whereby ensemble-state estimates explode to machine infinity, despite the true state remaining in a bounded region. In this article we provide a breakthrough insight into the phenomenon, by introducing a simple and natural forecast model that transparently exhibits catastrophic filter divergence under all ensemble methods and a large set of initializations. For this model, catastrophic filter divergence is not an artifact of numerical instability, but rather a true dynamical property of the filter. The divergence is not only validated numerically but also proven rigorously. The model cleanly illustrates mechanisms that give rise to catastrophic divergence and confirms intuitive accounts of the phenomena given in past literature.
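For context, the ensemble update whose dynamical behavior is analyzed above is, in its standard stochastic (perturbed-observation) form, a few lines of linear algebra. The following is a generic textbook sketch, not the authors' specific forecast model:

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """One stochastic (perturbed-observation) EnKF analysis step.
    X: (n, m) ensemble of m state vectors; y: (p,) observation;
    H: (p, n) linear observation operator; R: (p, p) obs-error covariance."""
    n, m = X.shape
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    P = A @ A.T / (m - 1)                            # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    # Perturbed observations, one per member, drawn with covariance R.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=m).T
    return X + K @ (Y - H @ X)                       # analysis ensemble
```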
NASA Astrophysics Data System (ADS)
Bennett, J.; David, R. E.; Wang, Q.; Li, M.; Shrestha, D. L.
2016-12-01
Flood forecasting in Australia has historically relied on deterministic forecasting models run only when floods are imminent, with considerable forecaster input and interpretation. These now co-exist with a continually available 7-day streamflow forecasting service (also deterministic) aimed at operational water management applications such as environmental flow releases. The 7-day service is not optimised for flood prediction. We describe progress on developing a system for ensemble streamflow forecasting that is suitable for both flood prediction and water management applications. Precipitation uncertainty is handled through post-processing of Numerical Weather Prediction (NWP) output with a Bayesian rainfall post-processor (RPP). The RPP corrects biases, downscales NWP output, and produces reliable ensemble spread. Ensemble precipitation forecasts are used to force a semi-distributed conceptual rainfall-runoff model. Uncertainty in precipitation forecasts is insufficient to reliably describe streamflow forecast uncertainty, particularly at shorter lead-times. We characterise hydrological prediction uncertainty separately with a 4-stage error model. The error model relies on data transformation to ensure residuals are homoscedastic and symmetrically distributed. To ensure streamflow forecasts are accurate and reliable, the residuals are modelled using a mixture-Gaussian distribution with distinct parameters for the rising and falling limbs of the forecast hydrograph. In a case study of the Murray River in south-eastern Australia, we show ensemble predictions of floods generally have lower errors than deterministic forecasting methods. We also discuss some of the challenges in operationalising short-term ensemble streamflow forecasts in Australia, including meeting the needs for accurate predictions across all flow ranges and comparing forecasts generated by event and continuous hydrological models.
Confidence-based ensemble for GBM brain tumor segmentation
NASA Astrophysics Data System (ADS)
Huo, Jing; van Rikxoort, Eva M.; Okada, Kazunori; Kim, Hyun J.; Pope, Whitney; Goldin, Jonathan; Brown, Matthew
2011-03-01
It is a challenging task to automatically segment glioblastoma multiforme (GBM) brain tumors on T1w post-contrast isotropic MR images. A semi-automated system using fuzzy connectedness has recently been developed for computing the tumor volume that reduces the cost of manual annotation. In this study, we propose an ensemble method that combines multiple segmentation results into a final ensemble result. The method is evaluated on a dataset of 20 cases from a multi-center pharmaceutical drug trial and compared to the fuzzy connectedness method. Three individual methods were used in the framework: fuzzy connectedness, GrowCut, and voxel classification. The combination method is a confidence map averaging (CMA) method. The CMA method shows an improved ROC curve compared to the fuzzy connectedness method (p < 0.001). The CMA ensemble result is more robust compared to the three individual methods.
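One plausible reading of the CMA combination is to average the per-voxel confidence maps produced by the individual methods and threshold the mean; the function below is an illustrative sketch (the name and the 0.5 threshold are assumptions, not the paper's exact procedure):

```python
import numpy as np

def cma_ensemble(conf_maps, threshold=0.5):
    """Average per-voxel confidence maps from several segmentation methods
    and threshold the mean to obtain the ensemble segmentation mask."""
    mean_conf = np.mean(np.stack(conf_maps, axis=0), axis=0)
    return mean_conf >= threshold

# e.g. confidence maps in [0, 1] from fuzzy connectedness, GrowCut, and a
# voxel classifier (hypothetical arrays of identical shape):
# tumor_mask = cma_ensemble([conf_fc, conf_growcut, conf_voxel])
```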
Measuring effective temperatures in a generalized Gibbs ensemble
NASA Astrophysics Data System (ADS)
Foini, Laura; Gambassi, Andrea; Konik, Robert; Cugliandolo, Leticia F.
2017-05-01
The local physical properties of an isolated quantum statistical system in the stationary state reached long after a quench are generically described by the Gibbs ensemble, which involves only its Hamiltonian and the temperature as a parameter. If the system is instead integrable, additional quantities conserved by the dynamics intervene in the description of the stationary state. The resulting generalized Gibbs ensemble involves a number of temperature-like parameters, the determination of which is practically difficult. Here we argue that in a number of simple models these parameters can be effectively determined by using fluctuation-dissipation relationships between response and correlation functions of natural observables, quantities which are accessible in experiments.
Liu, Shuguang; Tan, Zhengxi; Chen, Mingshi; Liu, Jinxun; Wein, Anne; Li, Zhengpeng; Huang, Shengli; Oeding, Jennifer; Young, Claudia; Verma, Shashi B.; Suyker, Andrew E.; Faulkner, Stephen P.
2012-01-01
The General Ensemble Biogeochemical Modeling System (GEMS) has two distinguishing features. First, to account for uncertainties in individual models, it uses multiple site-scale biogeochemical models to perform model simulations. Second, it adopts Monte Carlo ensemble simulations of each simulation unit (one site/pixel or group of sites/pixels with similar biophysical conditions) to incorporate uncertainties and variability (as measured by variances and covariance) of input variables into model simulations. In this chapter, we illustrate the applications of GEMS at the site and regional scales with an emphasis on incorporating agricultural practices. Challenges in modeling soil carbon dynamics and greenhouse gas emissions are also discussed.
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan
2013-03-01
We introduce a stable, well-tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ~N² for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2.
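Basic usage of emcee is compact; the sketch below samples a 5-dimensional Gaussian. The target, dimensions, and step counts are arbitrary illustrative choices, and the v3-style calls shown may differ in detail from the release described above:

```python
import numpy as np
import emcee

def log_prob(theta):
    return -0.5 * np.sum(theta ** 2)    # standard Gaussian target density

ndim, nwalkers = 5, 32
p0 = np.random.randn(nwalkers, ndim)    # one starting point per walker

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 5000)
samples = sampler.get_chain(discard=1000, flat=True)  # posterior draws
tau = sampler.get_autocorr_time()       # autocorrelation-time diagnostic
```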
Syndrome source coding and its universal generalization
NASA Technical Reports Server (NTRS)
Ancheta, T. C., Jr.
1975-01-01
A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A universal generalization of syndrome-source-coding is formulated which provides robustly-effective, distortionless, coding of source ensembles.
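A toy instance of the mechanism, using the (7,4) Hamming code: the 7-bit source block is compressed to its 3-bit syndrome, and decompression returns the minimum-weight block (coset leader) with that syndrome. Practical syndrome source coding uses long codes matched to the source statistics; this sketch is exact only for source blocks of weight at most one.

```python
import numpy as np
from itertools import product

# Parity-check matrix of the (7,4) Hamming code.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def compress(s):
    """The 7-bit source block is treated as an error pattern; its
    3-bit syndrome is the compressed representation."""
    return H @ s % 2

def decompress(z):
    """Return the minimum-weight 7-bit block with syndrome z (the coset
    leader); recovery is exact whenever the source block has weight <= 1."""
    best = None
    for bits in product([0, 1], repeat=7):
        s = np.array(bits)
        if np.array_equal(H @ s % 2, z) and (best is None or s.sum() < best.sum()):
            best = s
    return best

s = np.array([0, 0, 0, 0, 1, 0, 0])     # sparse source block
z = compress(s)                          # 3 bits instead of 7
assert np.array_equal(decompress(z), s)
```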
Reciprocity in directed networks
NASA Astrophysics Data System (ADS)
Yin, Mei; Zhu, Lingjiong
2016-04-01
Reciprocity is an important characteristic of directed networks and has been widely used in the modeling of World Wide Web, email, social, and other complex networks. In this paper, we take a statistical physics point of view and study the limiting entropy and free energy densities from the microcanonical ensemble, the canonical ensemble, and the grand canonical ensemble whose sufficient statistics are given by edge and reciprocal densities. The sparse case is also studied for the grand canonical ensemble. Extensions to more general reciprocal models including reciprocal triangle and star densities will likewise be discussed.
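The reciprocal density that serves as a sufficient statistic here is straightforward to compute; for a 0/1 adjacency matrix, the reciprocity is the fraction of directed edges whose reverse edge is also present:

```python
import numpy as np

def reciprocity(A):
    """Fraction of directed edges whose reverse edge is also present,
    for a 0/1 adjacency matrix A with no self-loops."""
    return (A * A.T).sum() / A.sum()

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 0, 0]])
print(reciprocity(A))   # 2 of the 3 directed edges are reciprocated -> 0.667
```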
Filatov, Michael; Liu, Fang; Martínez, Todd J.
2017-07-21
The state-averaged (SA) spin restricted ensemble referenced Kohn-Sham (REKS) method and its state interaction (SI) extension, SI-SA-REKS, enable one to describe correctly the shape of the ground and excited potential energy surfaces of molecules undergoing bond breaking/bond formation reactions including features such as conical intersections crucial for theoretical modeling of non-adiabatic reactions. Until recently, application of the SA-REKS and SI-SA-REKS methods to modeling the dynamics of such reactions was obstructed due to the lack of the analytical energy derivatives. Here, the analytical derivatives of the individual SA-REKS and SI-SA-REKS energies are derived. The final analytic gradient expressions are formulated entirely in terms of traces of matrix products and are presented in the form convenient for implementation in the traditional quantum chemical codes employing basis set expansions of the molecular orbitals. Finally, we will describe the implementation and benchmarking of the derived formalism in a subsequent article of this series.
NASA Astrophysics Data System (ADS)
Yu, Wansik; Nakakita, Eiichi; Kim, Sunmin; Yamaguchi, Kosei
2016-08-01
The use of meteorological ensembles to produce sets of hydrological predictions has increased the capability to issue flood warnings. However, the spatial scale of the hydrological domain is still much finer than that of the meteorological model, and NWP models suffer from displacement errors. The main objective of this study is to enhance the transposition method proposed in Yu et al. (2014) and to suggest a post-processing ensemble flood forecasting method for the real-time updating and accuracy improvement of flood forecasts that considers the separation of orographic rainfall and the correction of misplaced rain distributions using additional ensemble information obtained through the transposition of rain distributions. In the first step of the proposed method, ensemble forecast rainfalls from a numerical weather prediction (NWP) model are separated into orographic and non-orographic rainfall fields using atmospheric variables and the extraction of the topographic effect. The non-orographic rainfall fields are then examined by the transposition scheme to produce additional ensemble information, and new ensemble NWP rainfall fields are calculated by recombining the transposed non-orographic rain fields with the separated orographic rainfall fields to generate place-corrected ensemble information. This additional ensemble information is then applied to a hydrologic model for post-flood forecasting at a 6-h interval. The newly proposed method has a clear advantage in improving the accuracy of the mean of the ensemble flood forecast. Our study is carried out and verified using the largest flood event, caused by typhoon 'Talas' in 2011, over two catchments, the Futatsuno (356.1 km²) and Nanairo (182.1 km²) dam catchments of the Shingu river basin (2,360 km²), located in the Kii peninsula, Japan.
Encoding of Spatial Attention by Primate Prefrontal Cortex Neuronal Ensembles
Treue, Stefan
2018-01-01
Single neurons in the primate lateral prefrontal cortex (LPFC) encode information about the allocation of visual attention and the features of visual stimuli. However, how this compares to the performance of neuronal ensembles at encoding the same information is poorly understood. Here, we recorded the responses of neuronal ensembles in the LPFC of two macaque monkeys while they performed a task that required attending to one of two moving random dot patterns positioned in different hemifields and ignoring the other pattern. We found single units selective for the location of the attended stimulus as well as for its motion direction. To determine the coding of both variables in the population of recorded units, we used a linear classifier and progressively built neuronal ensembles by iteratively adding units according to their individual performance (best single units), or by iteratively adding units based on their contribution to the ensemble performance (best ensemble). For both methods, ensembles of relatively small sizes (n < 60) yielded substantially higher decoding performance relative to individual single units. However, the decoder reached similar performance using fewer neurons with the best ensemble building method compared with the best single units method. Our results indicate that neuronal ensembles within the LPFC encode more information about the attended spatial and nonspatial features of visual stimuli than individual neurons. They further suggest that efficient coding of attention can be achieved by relatively small neuronal ensembles characterized by a certain relationship between signal and noise correlation structures.
Generalized Pauli constraints in reduced density matrix functional theory.
Theophilou, Iris; Lathiotakis, Nektarios N; Marques, Miguel A L; Helbig, Nicole
2015-04-21
Functionals of the one-body reduced density matrix (1-RDM) are routinely minimized under Coleman's ensemble N-representability conditions. Recently, the topic of pure-state N-representability conditions, also known as generalized Pauli constraints, received increased attention following the discovery of a systematic way to derive them for any number of electrons and any finite dimensionality of the Hilbert space. The target of this work is to assess the potential impact of the enforcement of the pure-state conditions on the results of reduced density-matrix functional theory calculations. In particular, we examine whether the standard minimization of typical 1-RDM functionals under the ensemble N-representability conditions violates the pure-state conditions for prototype 3-electron systems. We also enforce the pure-state conditions, in addition to the ensemble ones, for the same systems and functionals and compare the correlation energies and optimal occupation numbers with those obtained by the enforcement of the ensemble conditions alone.
Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume
2013-01-01
Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred-to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
2013-01-01
The ability to interact with different partners is one of the most important features in proteins. Proteins that bind a large number of partners (hubs) have been often associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the protein isolated state. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and to full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with a different degree of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter both on a local and on a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionary related proteins can be modulated by the fine-tuning of the interface dynamics. The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein–protein interaction prediction and design methods.
NASA Astrophysics Data System (ADS)
Lahmiri, S.; Boukadoum, M.
2015-10-01
Accurate forecasting of stock market volatility is an important issue in portfolio risk management. In this paper, an ensemble system for stock market volatility is presented. It is composed of three different models that hybridize the exponential generalized autoregressive conditional heteroscedasticity (EGARCH) process and an artificial neural network trained with the backpropagation algorithm (BPNN) to forecast stock market volatility under the normal, t-Student, and generalized error distribution (GED) assumptions separately. The goal is to design an ensemble system in which each single hybrid model is capable of capturing normality, excess skewness, or excess kurtosis in the data to achieve complementarity. The performance of each EGARCH-BPNN model and of the ensemble system is evaluated by the closeness of the volatility forecasts to realized volatility. Based on the mean absolute error and the mean of squared errors, the experimental results show that the proposed ensemble model, designed to capture normality, skewness, and kurtosis in the data, is more accurate than the individual EGARCH-BPNN models in forecasting S&P 500 intra-day volatility at one- and five-minute time horizons.
Spam comments prediction using stacking with ensemble learning
NASA Astrophysics Data System (ADS)
Mehmood, Arif; On, Byung-Won; Lee, Ingyu; Ashraf, Imran; Choi, Gyu Sang
2018-01-01
Deceptive comments about products or services are misleading for people in decision making. Current methodologies to predict deceptive comments focus on feature design combined with a single training model. Handcrafted features can capture some linguistic phenomena but are hardly able to reveal the latent semantic meaning of the comments. We propose a prediction model on general features of documents using stacking with ensemble learning. Term Frequency/Inverse Document Frequency (TF/IDF) features are input to a stacking of Random Forest and Gradient Boosted Trees, and the outputs of the base learners are combined by a decision tree to make the final training of the model. The results show that our approach achieves an accuracy of 92.19%, which outperforms the state-of-the-art method.
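The described pipeline maps naturally onto scikit-learn components; a minimal sketch with invented toy data (the paper's corpus and hyperparameters are not reproduced here; cv is set to 2 only because the toy set is tiny):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.tree import DecisionTreeClassifier

# TF/IDF features feed Random Forest and Gradient Boosted Trees base
# learners; a decision-tree meta-learner combines their outputs.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200)),
                ("gbt", GradientBoostingClassifier())],
    final_estimator=DecisionTreeClassifier(),
    cv=2)
model = make_pipeline(TfidfVectorizer(), stack)

comments = ["great product, arrived on time", "click here to win $$$",
            "works as described", "FREE prize, limited offer!!!"]
labels = [0, 1, 0, 1]                    # 1 = spam (toy data)
model.fit(comments, labels)
print(model.predict(["free money, click now"]))
```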
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-18
... Cooling Method for Protective Clothing Ensembles AGENCY: Department of the Army, DoD. ACTION: Notice... Protective Clothing Ensembles,'' filed March 30, 2010. The United States Government, as represented by the... to a two-stage evaporative cooling method for use in protective clothing ensembles. Brenda S. Bowen...
Optimizing inhomogeneous spin ensembles for quantum memory
NASA Astrophysics Data System (ADS)
Bensky, Guy; Petrosyan, David; Majer, Johannes; Schmiedmayer, Jörg; Kurizki, Gershon
2012-07-01
We propose a method to maximize the fidelity of quantum memory implemented by a spectrally inhomogeneous spin ensemble. The method is based on preselecting the optimal spectral portion of the ensemble by judiciously designed pulses. This leads to significant improvement of the transfer and storage of quantum information encoded in the microwave or optical field.
Grand canonical ensemble Monte Carlo simulation of the dCpG/proflavine crystal hydrate.
Resat, H; Mezei, M
1996-09-01
The grand canonical ensemble Monte Carlo molecular simulation method is used to investigate hydration patterns in the crystal hydrate structure of the dCpG/proflavine intercalated complex. The objective of this study is to show by example that the recently advocated grand canonical ensemble simulation is a computationally efficient method for determining the positions of the hydrating water molecules in protein and nucleic acid structures. A detailed molecular simulation convergence analysis and an analogous comparison of the theoretical results with experiments clearly show that the grand ensemble simulations can be far more advantageous than the comparable canonical ensemble simulations.
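The moves that distinguish grand canonical ensemble Monte Carlo from canonical sampling are particle insertions and deletions. The standard textbook acceptance rules, with Lambda3 the cubed thermal de Broglie wavelength and dU the potential-energy change of the trial move, can be sketched as follows (a generic illustration, not the authors' code):

```python
import numpy as np

def accept_insertion(N, V, beta, mu, dU, Lambda3, rng):
    """Insertion: accept with min(1, V/(Lambda^3 (N+1)) exp[beta(mu - dU)])."""
    acc = V / (Lambda3 * (N + 1)) * np.exp(beta * (mu - dU))
    return rng.random() < min(1.0, acc)

def accept_deletion(N, V, beta, mu, dU, Lambda3, rng):
    """Deletion: accept with min(1, Lambda^3 N / V exp[-beta(mu + dU)])."""
    acc = Lambda3 * N / V * np.exp(-beta * (mu + dU))
    return rng.random() < min(1.0, acc)
```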
Stabilizing canonical-ensemble calculations in the auxiliary-field Monte Carlo method
NASA Astrophysics Data System (ADS)
Gilbreth, C. N.; Alhassid, Y.
2015-03-01
Quantum Monte Carlo methods are powerful techniques for studying strongly interacting Fermi systems. However, implementing these methods on computers with finite-precision arithmetic requires careful attention to numerical stability. In the auxiliary-field Monte Carlo (AFMC) method, low-temperature or large-model-space calculations require numerically stabilized matrix multiplication. When adapting methods used in the grand-canonical ensemble to the canonical ensemble of fixed particle number, the numerical stabilization increases the number of required floating-point operations for computing observables by a factor of the size of the single-particle model space, and thus can greatly limit the systems that can be studied. We describe an improved method for stabilizing canonical-ensemble calculations in AFMC that exhibits better scaling, and present numerical tests that demonstrate the accuracy and improved performance of the method.
NASA Astrophysics Data System (ADS)
Xing, Wanqiu; Wang, Weiguang; Shao, Quanxi; Peng, Shizhang; Yu, Zhongbo; Yong, Bin; Taylor, John
2014-04-01
As a key indicator of the hydrological cycle and a central link in water-balance calculations, reference evapotranspiration (ET0) is of increasing importance in assessing the potential impacts of climate change on hydrology and water resources systems as climate change becomes more pronounced. In this study, we investigate the spatial and temporal changes in ET0 of the Haihe River Basin for the present and the future. ET0 over the past five decades (1961-2010) is calculated by the Penman-Monteith method with historical climatic variables at 40 sites, while the ET0 estimation for the future period 2011-2099 is based on the related climatic variables projected by Coupled General Circulation Model (CGCM) multimodel ensemble projections in Phase 3 of the Coupled Model Intercomparison Project (CMIP3) using the Bayesian Model Averaging (BMA) approach. Results can be summarized for the present and future as follows. (1) No coherent spatial patterns in ET0 changes are seen in the whole basin. Half of the stations, distributed mainly in the eastern and southeastern plain regions, present significant negative trends, while only 3 stations in the western mountainous and plateau basin show significant positive trends. Radiation is mainly responsible for the ET0 change in the southern and eastern basin, whereas relative humidity and wind speed are the leading factors in the eastern coastal and northern parts. (2) The BMA ensemble method produces lower bias than other common methods in this basin. Future spatiotemporal ET0 pattern analysis by means of the BMA method based on the ensembles of four CGCMs suggests that, although the spatial patterns under the three scenarios differ in the forthcoming two decades, generally increasing trends can be found in the 21st century, mainly attributable to significantly increasing temperature. In addition, the implication of future ET0 change for agriculture and local water resources is discussed as an extension of this work. The results provide beneficial reference and comprehensive information for understanding the impact of climate change on the future water balance and improving the regional strategy for water resource and eco-environment management in the Haihe River Basin.
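For reference, the Penman-Monteith method mentioned above is commonly applied in its FAO-56 form, stated here from general knowledge (the study's exact formulation may differ):

```latex
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

where R_n is net radiation, G the soil heat flux, T the mean daily air temperature at 2 m, u_2 the wind speed at 2 m, e_s - e_a the vapour pressure deficit, Δ the slope of the saturation vapour pressure curve, and γ the psychrometric constant.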
A simple new filter for nonlinear high-dimensional data assimilation
NASA Astrophysics Data System (ADS)
Tödter, Julian; Kirchgessner, Paul; Ahrens, Bodo
2015-04-01
The ensemble Kalman filter (EnKF) and its deterministic variants, mostly square root filters such as the ensemble transform Kalman filter (ETKF), represent a popular alternative to variational data assimilation schemes and are applied in a wide range of operational and research activities. Their forecast step employs an ensemble integration that fully respects the nonlinear nature of the analyzed system. In the analysis step, they implicitly assume the prior state and observation errors to be Gaussian. Consequently, in nonlinear systems, the analysis mean and covariance are biased, and these filters remain suboptimal. In contrast, the fully nonlinear, non-Gaussian particle filter (PF) only relies on Bayes' theorem, which guarantees an exact asymptotic behavior, but because of the so-called curse of dimensionality it is exposed to weight collapse. This work shows how to obtain a new analysis ensemble whose mean and covariance exactly match the Bayesian estimates. This is achieved by a deterministic matrix square root transformation of the forecast ensemble, and subsequently a suitable random rotation that significantly contributes to filter stability while preserving the required second-order statistics. The forecast step remains as in the ETKF. The proposed algorithm, which is fairly easy to implement and computationally efficient, is referred to as the nonlinear ensemble transform filter (NETF). The properties and performance of the proposed algorithm are investigated via a set of Lorenz experiments. They indicate that such a filter formulation can increase the analysis quality, even for relatively small ensemble sizes, compared to other ensemble filters in nonlinear, non-Gaussian scenarios. Furthermore, localization enhances the potential applicability of this PF-inspired scheme in larger-dimensional systems. Finally, the novel algorithm is coupled to a large-scale ocean general circulation model. The NETF is stable, behaves reasonably and shows a good performance with a realistic ensemble size. The results confirm that, in principle, it can be applied successfully and as simply as the ETKF in high-dimensional problems without further modifications of the algorithm, even though it is only based on the particle weights. This proves that the suggested method constitutes a useful filter for nonlinear, high-dimensional data assimilation, and is able to overcome the curse of dimensionality even in deterministic systems.
Simulating the Generalized Gibbs Ensemble (GGE): A Hilbert space Monte Carlo approach
NASA Astrophysics Data System (ADS)
Alba, Vincenzo
By combining classical Monte Carlo and Bethe ansatz techniques we devise a numerical method to construct the Truncated Generalized Gibbs Ensemble (TGGE) for the spin-1/2 isotropic Heisenberg (XXX) chain. The key idea is to sample the Hilbert space of the model with the appropriate GGE probability measure. The method can be extended to other integrable systems, such as the Lieb-Liniger model. We benchmark the approach focusing on GGE expectation values of several local observables. As finite-size effects decay exponentially with system size, moderately large chains are sufficient to extract thermodynamic quantities. The Monte Carlo results are in agreement with both the Thermodynamic Bethe Ansatz (TBA) and the Quantum Transfer Matrix approach (QTM). Remarkably, it is possible to extract in a simple way the steady-state Bethe-Gaudin-Takahashi (BGT) roots distributions, which encode complete information about the GGE expectation values in the thermodynamic limit. Finally, it is straightforward to simulate extensions of the GGE, in which, besides the local integral of motion (local charges), one includes arbitrary functions of the BGT roots. As an example, we include in the GGE the first non-trivial quasi-local integral of motion.
Chaotic jumps in the generalized first adiabatic invariant in current sheets
NASA Technical Reports Server (NTRS)
Brittnacher, M. J.; Whipple, E. C.
1991-01-01
The present study examines how the changes in the generalized first adiabatic invariant J derived from the separatrix crossing theory can be incorporated into the drift variable approach to generating distribution functions. A method is proposed for determining distribution functions for an ensemble of particles following interaction with the tail current sheet by treating the interaction as a scattering problem characterized by changes in the invariant. Generalized drift velocities are obtained for a 1D tail configuration by using the generalized first invariant. The invariant remained constant except for the discrete changes caused by chaotic scattering as the particles cross the separatrix.
A stacking ensemble learning framework for annual river ice breakup dates
NASA Astrophysics Data System (ADS)
Sun, Wei; Trevor, Bernard
2018-06-01
River ice breakup dates (BDs) are not merely a proxy indicator of climate variability and change, but a direct concern in the management of local ice-caused flooding. A framework of stacking ensemble learning for annual river ice BDs was developed, which includes two levels of components: member and combining models. The member models describe the relations between BD and its affecting indicators; the combining models link the BD predicted by each member model with the observed BD. In particular, Bayesian regularization back-propagation artificial neural networks (BRANN) and adaptive neuro fuzzy inference systems (ANFIS) were employed as both member and combining models. The candidate combining models also included the simple average method (SAM). The input variables for member models were selected by a hybrid filter and wrapper method. The performances of these models were examined using leave-one-out cross validation. As the largest unregulated river in Alberta, Canada, with ice jams frequently occurring in the vicinity of Fort McMurray, the Athabasca River at Fort McMurray was selected as the study area. The breakup dates and candidate affecting indicators in 1980-2015 were collected. The results showed that the BRANN member models generally outperformed the ANFIS member models in terms of better performance and simpler structure. The difference between the R and MI rankings of inputs in the optimal member models may imply that the linear correlation based filter method would be feasible for generating a range of candidate inputs for further screening through other wrapper or embedded IVS methods. The SAM and BRANN combining models generally outperformed all member models. The optimal SAM combining model combined two BRANN member models and improved upon them in terms of average squared errors by 14.6% and 18.1%, respectively. In this study, for the first time, stacking ensemble learning was applied to the forecasting of river ice breakup dates, which appears promising for other river ice forecasting problems.
Computational scheme for pH-dependent binding free energy calculation with explicit solvent.
Lee, Juyong; Miller, Benjamin T; Brooks, Bernard R
2016-01-01
We present a computational scheme to compute the pH-dependence of binding free energy with explicit solvent. Despite the importance of pH, the effect of pH has been generally neglected in binding free energy calculations because of a lack of accurate methods to model it. To address this limitation, we use a constant-pH methodology to obtain a true ensemble of multiple protonation states of a titratable system at a given pH and analyze the ensemble using the Bennett acceptance ratio (BAR) method. The constant-pH method is based on the combination of enveloping distribution sampling (EDS) with the Hamiltonian replica exchange method (HREM), which yields an accurate semi-grand canonical ensemble of a titratable system. By considering the free energy change of constraining multiple protonation states to a single state or releasing a single protonation state to multiple states, the pH-dependent binding free energy profile can be obtained. We perform benchmark simulations of a host-guest system: cucurbit[7]uril (CB[7]) and benzimidazole (BZ). BZ experiences a large pKa shift upon complex formation. The pH-dependent binding free energy profiles of the benchmark system are obtained with three different long-range interaction calculation schemes: a cutoff, the particle mesh Ewald (PME), and the isotropic periodic sum (IPS) method. Our scheme captures the pH-dependent behavior of binding free energy successfully. Absolute binding free energy values obtained with the PME and IPS methods are consistent, while cutoff method results are off by 2 kcal mol⁻¹. We also discuss the characteristics of the three long-range interaction calculation methods for constant-pH simulations.
Kingsley, Laura J.; Lill, Markus A.
2014-01-01
Computational prediction of ligand entry and egress paths in proteins has become an emerging topic in computational biology and has proven useful in fields such as protein engineering and drug design. Geometric tunnel prediction programs, such as Caver3.0 and MolAxis, are computationally efficient methods to identify potential ligand entry and egress routes in proteins. Although many geometric tunnel programs are designed to accommodate a single input structure, the increasingly recognized importance of protein flexibility in tunnel formation and behavior has led to the more widespread use of protein ensembles in tunnel prediction. However, there has not yet been an attempt to directly investigate the influence of ensemble size and composition on geometric tunnel prediction. In this study, we compared tunnels found in a single crystal structure to ensembles of various sizes generated using different methods on both the apo and holo forms of cytochrome P450 enzymes CYP119, CYP2C9, and CYP3A4. Several protein structure clustering methods were tested in an attempt to generate smaller ensembles that were capable of reproducing the data from larger ensembles. Ultimately, we found that by including members from both the apo and holo data sets, we could produce ensembles containing less than 15 members that were comparable to apo or holo ensembles containing over 100 members. Furthermore, we found that, in the absence of either apo or holo crystal structure data, pseudo-apo or –holo ensembles (e.g. adding ligand to apo protein throughout MD simulations) could be used to resemble the structural ensembles of the corresponding apo and holo ensembles, respectively. Our findings not only further highlight the importance of including protein flexibility in geometric tunnel prediction, but also suggest that smaller ensembles can be as capable as larger ensembles at capturing many of the protein motions important for tunnel prediction at a lower computational cost.
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.
Hart, Emma; Sim, Kevin
2016-01-01
We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
NASA Astrophysics Data System (ADS)
Higo, Junichi; Umezawa, Koji; Nakamura, Haruki
2013-05-01
We propose a novel generalized ensemble method, virtual-system coupled multicanonical molecular dynamics (V-McMD), to enhance conformational sampling of biomolecules expressed by an all-atom model in an explicit solvent. In this method, a virtual system, whose physical quantities can be set arbitrarily, is coupled with the biomolecular system, which is the target to be studied. This method was applied to a system of an Endothelin-1 derivative, KR-CSH-ET1, known to form an antisymmetric homodimer at room temperature. V-McMD was performed starting from a configuration in which two KR-CSH-ET1 molecules were mutually distant in an explicit solvent. The lowest free-energy state (the most thermally stable state) at room temperature coincides with the experimentally determined native complex structure. This state is separated from other non-native minor clusters by a free-energy barrier, although the barrier disappears at elevated temperatures. V-McMD produced a canonical ensemble faster than a conventional McMD method.
Measuring effective temperatures in a generalized Gibbs ensemble
Foini, Laura; Gambassi, Andrea; Konik, Robert; ...
2017-05-11
The local physical properties of an isolated quantum statistical system in the stationary state reached long after a quench are generically described by the Gibbs ensemble, which involves only its Hamiltonian and the temperature as a parameter. Additional quantities conserved by the dynamics intervene in the description of the stationary state, if the system is instead integrable. The resulting generalized Gibbs ensemble involves a number of temperature-like parameters, the determination of which is practically difficult. We argue that in a number of simple models these parameters can be effectively determined by using fluctuation-dissipation relationships between response and correlation functions of natural observables, quantities which are accessible in experiments.
Application of Generalized Feynman-Hellmann Theorem in Quantization of LC Circuit in Thermo Bath
NASA Astrophysics Data System (ADS)
Fan, Hong-Yi; Tang, Xu-Bing
For the quantized LC electric circuit, when taking the Joule thermal effect into account, we argue that physical observables should be evaluated as ensemble averages. We then use the generalized Feynman-Hellmann theorem for ensemble averages to calculate them, which proves convenient. The fluctuations of observables in various LC electric circuits in the presence of a thermal bath are shown to grow with temperature.
Forecasting European cold waves based on subsampling strategies of CMIP5 and Euro-CORDEX ensembles
NASA Astrophysics Data System (ADS)
Cordero-Llana, Laura; Braconnot, Pascale; Vautard, Robert; Vrac, Mathieu; Jezequel, Aglae
2016-04-01
Forecasting future extreme events under the present changing climate represents a difficult task. Currently there are a large number of ensembles of simulations for climate projections that take into account different models and scenarios. However, there is a need to reduce the size of the ensemble to make the interpretation of these simulations more manageable for impact studies or climate risk assessment. This can be achieved by developing subsampling strategies to identify a limited number of simulations that best represent the ensemble. In this study, cold waves are chosen to test different approaches for subsampling available simulations. The definition of cold waves depends on the criteria used, but they are generally defined using a minimum temperature threshold, the duration of the cold spell, as well as their geographical extent. These climate indicators are not universal, highlighting the difficulty of directly comparing different studies. As part of the CLIPC European project, we use daily surface temperature data obtained from CMIP5 outputs as well as Euro-CORDEX simulations to predict future cold wave events in Europe. From these simulations a clustering method is applied to minimise the number of ensemble members required. Furthermore, we analyse the different uncertainties that arise from the different model characteristics and definitions of climate indicators. Finally, we will test whether the same subsampling strategy can be used for different climate indicators. This will facilitate the use of the subsampling results for a wide range of impact assessment studies.
Entanglement with negative Wigner function of almost 3,000 atoms heralded by one photon.
McConnell, Robert; Zhang, Hao; Hu, Jiazhong; Ćuk, Senka; Vuletić, Vladan
2015-03-26
Quantum-mechanically correlated (entangled) states of many particles are of interest in quantum information, quantum computing and quantum metrology. Metrologically useful entangled states of large atomic ensembles have been experimentally realized, but these states display Gaussian spin distribution functions with a non-negative Wigner quasiprobability distribution function. Non-Gaussian entangled states have been produced in small ensembles of ions, and very recently in large atomic ensembles. Here we generate entanglement in a large atomic ensemble via an interaction with a very weak laser pulse; remarkably, the detection of a single photon prepares several thousand atoms in an entangled state. We reconstruct a negative-valued Wigner function--an important hallmark of non-classicality--and verify an entanglement depth (the minimum number of mutually entangled atoms) of 2,910 ± 190 out of 3,100 atoms. Attaining such a negative Wigner function and the mutual entanglement of virtually all atoms is unprecedented for an ensemble containing more than a few particles. Although the achieved purity of the state is slightly below the threshold for entanglement-induced metrological gain, further technical improvement should allow the generation of states that surpass this threshold, and of more complex Schrödinger cat states for quantum metrology and information processing. More generally, our results demonstrate the power of heralded methods for entanglement generation, and illustrate how the information contained in a single photon can drastically alter the quantum state of a large system.
NASA Astrophysics Data System (ADS)
Abaza, Mabrouk; Anctil, François; Fortin, Vincent; Perreault, Luc
2017-12-01
Meteorological and hydrological ensemble prediction systems are imperfect. Their outputs could often be improved through the use of a statistical processor, opening up the question of the necessity of using both processors (meteorological and hydrological), only one of them, or none. This experiment compares the predictive distributions from four hydrological ensemble prediction systems (H-EPS) utilising the Ensemble Kalman filter (EnKF) probabilistic sequential data assimilation scheme. They differ in the inclusion or not of the Distribution Based Scaling (DBS) method for post-processing meteorological forecasts and the ensemble Bayesian Model Averaging (ensemble BMA) method for hydrological forecast post-processing. The experiment is implemented on three large watersheds and relies on the combination of two meteorological reforecast products: the 4-member Canadian reforecasts from the Canadian Centre for Meteorological and Environmental Prediction (CCMEP) and the 10-member American reforecasts from the National Oceanic and Atmospheric Administration (NOAA), leading to 14 members at each time step. Results show that all four tested H-EPS lead to resolution and sharpness values that are quite similar, with an advantage to DBS + EnKF. The ensemble BMA is unable to compensate for any bias left in the precipitation ensemble forecasts. On the other hand, it succeeds in calibrating ensemble members that are otherwise under-dispersed. If reliability is preferred over resolution and sharpness, DBS + EnKF + ensemble BMA performs best, making use of both processors in the H-EPS system. Conversely, for enhanced resolution and sharpness, DBS is the preferred method.
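The ensemble BMA step referred to above forms a predictive density as a weighted mixture of kernels centred on the member forecasts. A minimal sketch with Gaussian kernels and uniform weights, both purely illustrative (in practice the weights and spread are fitted, typically by EM, over a training period, and hydrological applications often use skewed kernels):

```python
import numpy as np
from scipy.stats import norm

def bma_density(x, forecasts, weights, sigma):
    """Ensemble BMA predictive density: a weighted mixture of kernels
    (Gaussian here for simplicity) centred on the member forecasts."""
    return sum(w * norm.pdf(x, loc=f, scale=sigma)
               for w, f in zip(weights, forecasts))

# 14 members, as in the combined CCMEP + NOAA set above; the forecast
# values themselves are invented for illustration.
members = np.array([102., 98., 110., 95., 101., 99., 104.,
                    97., 103., 100., 96., 105., 99., 102.])
weights = np.full(len(members), 1.0 / len(members))
x = np.linspace(80.0, 130.0, 201)
density = bma_density(x, members, weights, sigma=5.0)
```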
Bayesian ensemble refinement by replica simulations and reweighting.
Hummer, Gerhard; Köfinger, Jürgen
2015-12-28
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
Bayesian ensemble refinement by replica simulations and reweighting
NASA Astrophysics Data System (ADS)
Hummer, Gerhard; Köfinger, Jürgen
2015-12-01
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
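The replica-free route sketched in the abstract, reweighting an existing ensemble toward the optimal Bayesian distribution, can be illustrated compactly. The following generic maximum-entropy/Bayesian reweighting sketch is in the spirit of the EROS-style formulation described above; the function name, logit parametrization, and optimizer choice are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def refine_weights(obs_calc, obs_exp, obs_err, theta=1.0):
    """Reweight an n-frame ensemble so that weighted averages of computed
    observables (obs_calc: n x k) match experiment, penalizing deviation
    from uniform reference weights by the relative entropy, scaled by theta."""
    n = obs_calc.shape[0]
    w0 = np.full(n, 1.0 / n)

    def neg_log_posterior(g):              # weights parametrized by logits g
        w = np.exp(g - g.max())
        w /= w.sum()
        chi2 = np.sum(((w @ obs_calc - obs_exp) / obs_err) ** 2)
        kl = np.sum(w * np.log(w / w0))    # relative entropy to reference
        return 0.5 * chi2 + theta * kl

    res = minimize(neg_log_posterior, np.zeros(n), method="L-BFGS-B")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()
```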
Yang, Shan; Al-Hashimi, Hashim M.
2016-01-01
A growing number of studies employ time-averaged experimental data to determine dynamic ensembles of biomolecules. While it is well known that different ensembles can satisfy experimental data to within error, the extent and nature of these degeneracies, and their impact on the accuracy of the ensemble determination remains poorly understood. Here, we use simulations and a recently introduced metric for assessing ensemble similarity to explore degeneracies in determining ensembles using NMR residual dipolar couplings (RDCs) with specific application to A-form helices in RNA. Various target ensembles were constructed representing different domain-domain orientational distributions that are confined to a topologically restricted (<10%) conformational space. Five independent sets of ensemble averaged RDCs were then computed for each target ensemble and a ‘sample and select’ scheme used to identify degenerate ensembles that satisfy RDCs to within experimental uncertainty. We find that ensembles with different ensemble sizes and that can differ significantly from the target ensemble (by as much as ΣΩ ~ 0.4 where ΣΩ varies between 0 and 1 for maximum and minimum ensemble similarity, respectively) can satisfy the ensemble averaged RDCs. These deviations increase with the number of unique conformers and breadth of the target distribution, and result in significant uncertainty in determining conformational entropy (as large as 5 kcal/mol at T = 298 K). Nevertheless, the RDC-degenerate ensembles are biased towards populated regions of the target ensemble, and capture other essential features of the distribution, including the shape. Our results identify ensemble size as a major source of uncertainty in determining ensembles and suggest that NMR interactions such as RDCs and spin relaxation, on their own, do not carry the necessary information needed to determine conformational entropy at a useful level of precision. The framework introduced here provides a general approach for exploring degeneracies in ensemble determination for different types of experimental data.
Constructing better classifier ensemble based on weighted accuracy and diversity measure.
Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S
2014-01-01
A weighted accuracy and diversity (WAD) method is presented: a novel measure for evaluating the quality of a classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis: a robust classifier ensemble should not only be accurate, its members should also differ from one another. In fact, accuracy and diversity are mutually constraining factors: an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may suffer in accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble on unseen data. The quality assessment of an ensemble computes the final score as the harmonic mean of accuracy and diversity, with two weight parameters used to balance them (see the sketch below). The measure is compared to two representative measures, Kappa-Error and GenDiv, and to two threshold measures that consider only accuracy or diversity, using two heuristic search algorithms, a genetic algorithm and a forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to the others in most cases.
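As referenced above, a sketch of the scoring idea; the paper's exact weighting scheme may differ, but a weighted harmonic mean of accuracy and diversity looks like this:

```python
def wad(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted harmonic mean of ensemble accuracy and diversity.

    A generic formulation with w_acc + w_div = 1; the weights let the user
    shift emphasis between the two mutually constraining criteria.
    """
    assert abs(w_acc + w_div - 1.0) < 1e-9
    return 1.0 / (w_acc / accuracy + w_div / diversity)

# An accurate but not-too-diverse ensemble scores between its two criteria.
print(wad(0.85, 0.40))  # ~0.54: penalized for low diversity
```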
An Ensemble Successive Project Algorithm for Liquor Detection Using Near Infrared Sensor.
Qu, Fangfang; Ren, Dong; Wang, Jihua; Zhang, Zhong; Lu, Na; Meng, Lei
2016-01-11
Spectral analysis based on near infrared (NIR) sensors is a powerful tool for complex information processing and high-precision recognition, and it has been widely applied to quality analysis and online inspection of agricultural products. This paper proposes a new method to address the instability of the successive projections algorithm (SPA) with small sample sizes, as well as the weak association between the selected variables and the analyte. The proposed method, an evaluated bootstrap ensemble SPA (EBSPA) based on a variable evaluation index (EI) for variable selection, is applied to the quantitative prediction of alcohol concentration in liquor from NIR sensor data. In the experiments, the proposed EBSPA is combined with three kinds of modeling methods to test its performance. In addition, EBSPA combined with partial least squares is compared with other state-of-the-art variable selection methods. The results show that the proposed method remedies the defects of SPA and achieves the best generalization performance and stability. Furthermore, the physical meaning of the variables selected from the near infrared sensor data is clear, which can effectively reduce the number of variables and improve prediction accuracy.
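A hedged sketch of the bootstrap-ensemble variable selection idea: a simple correlation-based evaluation index stands in for a full SPA run, and variables are kept according to how often they are selected across bootstrap resamples. All data and names are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 100))              # NIR spectra: 60 samples x 100 bands
y = X[:, 10] + 0.5 * X[:, 55] + 0.1 * rng.normal(size=60)  # analyte

def select_variables(Xb, yb, k=10):
    # Stand-in for one SPA run: rank bands by an evaluation index (EI),
    # here simply |correlation| with the analyte.
    ei = np.abs([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(Xb.shape[1])])
    return np.argsort(ei)[-k:]

counts = np.zeros(X.shape[1])
for _ in range(200):                        # bootstrap resampling
    idx = rng.choice(len(y), size=len(y), replace=True)
    counts[select_variables(X[idx], y[idx])] += 1

stable = np.argsort(counts)[-10:]           # variables selected most often
print("stable variables:", sorted(stable))  # should include bands 10 and 55
```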
NASA Astrophysics Data System (ADS)
Otto, F. E. L.; Haustein, K.; Uhe, P.; Massey, N.; Rimi, R.; Allen, M. R.; Cullen, H. M.
2016-12-01
Extreme weather event attribution has become an accepted part of the atmospheric sciences, with numerous methods put forward over the last decade. We have recently established a new framework which allows for event attribution in quasi-real-time. Here we present the methodology with which we can assess the fraction of attributable risk (FAR) of a severe weather event due to an external driver (Haustein et al. 2016). The method builds on a large ensemble of atmosphere-only GCM simulations forced by seasonal forecast SSTs (actual conditions) that are contrasted with ensembles forced by counterfactual SSTs (natural conditions). With an associated 30-year actual and natural climatology in place, we are able to put the current event into climatological context and determine the dynamic contribution that led to the event, as opposed to the thermodynamic contribution which would have made such an event more likely regardless of the synoptic situation. As a second independent method (also applicable in near-real-time), we apply pattern correlation to separate thermodynamic and dynamic contributions. Finally, using reanalysis data, we test whether our attributed dynamic contribution is also detectable in the observations. Despite the high monthly variability, ENSO-related teleconnection patterns can be detected fairly robustly, as we demonstrate with a recent example during El Niño. The more consistent the three methods are, the more robust our results will be. We note that the choice of time scale matters a great deal both when determining the dynamic contribution and when estimating the FAR (Uhe et al. 2016). The weather@home ensemble prediction approach is accompanied by two further methods based on observational data and the CMIP5 ensemble. If the FAR is consistent across the three methods, we have reason to trust our central attribution statement. Two recent examples are shown to demonstrate feasibility (van Oldenborgh et al. 2016a/2016b), complemented by new results from South Asia where we also investigate the effects of anthropogenic aerosols.
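The FAR itself is a standard quantity, FAR = 1 - p_nat / p_act: the fraction of the event probability attributable to the external driver. A minimal illustration with hypothetical exceedance counts from the two ensembles:

```python
import numpy as np

def far(actual_exceedances, natural_exceedances):
    """Fraction of attributable risk: FAR = 1 - p_natural / p_actual."""
    p_act = np.mean(actual_exceedances)
    p_nat = np.mean(natural_exceedances)
    return 1.0 - p_nat / p_act

# Hypothetical counts: the event threshold is exceeded in 120 of 10000
# "actual" members but only 40 of 10000 "natural" (counterfactual) members.
act = np.r_[np.ones(120), np.zeros(9880)]
nat = np.r_[np.ones(40), np.zeros(9960)]
print(f"FAR = {far(act, nat):.2f}")  # 0.67: two thirds of the risk attributable
```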
Vorontsov, Ivan I; Miyashita, Osamu
2011-04-30
Complexes of two Cyanovirin-N (CVN) mutants, m4-CVN and P51G-m4-CVN, with deoxy di-mannose analogs were employed as models to generate conformational ensembles using explicit-water Molecular Dynamics (MD) simulations in solution and in the crystal environment. The results were used to evaluate binding free energies with the molecular mechanics Poisson-Boltzmann (or Generalized Born) surface area, MM/PB(GB)SA, methods. The calculations ranked the deoxy di-mannose ligand affinities in agreement with the available qualitative experimental evidence. This confirms the importance of the hydrogen-bond network between the di-mannose 3'- and 4'-hydroxyl groups and the protein binding site B(M) as a basis of the CVN activity as an effective HIV fusion inhibitor. Comparison of binding free energies averaged over snapshots from the solution and crystal simulations showed high promise in using the crystal matrix to accelerate conformational ensemble generation, the most time-consuming step in the MM/PB(GB)SA approach. The correlation between energy values based on solution versus crystal ensembles is 0.95 for both the MM/PBSA and MM/GBSA methods. Copyright © 2010 Wiley Periodicals, Inc.
Grand canonical ensemble Monte Carlo simulation of the dCpG/proflavine crystal hydrate.
Resat, H; Mezei, M
1996-01-01
The grand canonical ensemble Monte Carlo molecular simulation method is used to investigate hydration patterns in the crystal hydrate structure of the dCpG/proflavine intercalated complex. The objective of this study is to show by example that the recently advocated grand canonical ensemble simulation is a computationally efficient method for determining the positions of the hydrating water molecules in protein and nucleic acid structures. A detailed molecular simulation convergence analysis and an analogous comparison of the theoretical results with experiments clearly show that the grand ensemble simulations can be far more advantageous than the comparable canonical ensemble simulations. PMID:8873992
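For context, the standard textbook Metropolis acceptance rules for grand canonical insertion and deletion moves are sketched below; this is a generic formulation (`lam3` is the cubed thermal de Broglie wavelength), not necessarily the exact biased scheme used in the study:

```python
import numpy as np

def accept_insertion(dU, N, V, mu, beta, lam3):
    """Metropolis acceptance for inserting a particle (dU = U_new - U_old)."""
    return np.random.rand() < min(1.0, V / (lam3 * (N + 1))
                                  * np.exp(beta * (mu - dU)))

def accept_deletion(dU, N, V, mu, beta, lam3):
    """Metropolis acceptance for deleting a particle (dU = U_new - U_old)."""
    return np.random.rand() < min(1.0, lam3 * N / V
                                  * np.exp(-beta * (mu + dU)))

# Hypothetical numbers: a favorable insertion (dU < 0) at fixed mu, V, T.
print(accept_insertion(dU=-2.0, N=100, V=1000.0, mu=-6.0, beta=1.0, lam3=1.0))
```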
NASA Astrophysics Data System (ADS)
Lu, F.; Liu, Z.; Liu, Y.; Zhang, S.; Jacob, R. L.
2017-12-01
The Regional Coupled Data Assimilation (RCDA) method is introduced as a tool to study coupled climate dynamics and teleconnections. The RCDA method is built on an ensemble-based coupled data assimilation (CDA) system in a coupled general circulation model (CGCM). It limits the data assimilation to the desired model components (e.g. atmosphere) and regions (e.g. the extratropics), and studies the ensemble-mean model response (e.g. the tropical response to "observed" extratropical atmospheric variability). When applied to the extratropical influence on tropical climate, the RCDA method has shown some unique advantages, namely the combination of a fully coupled model, real-world observations and an ensemble approach. Tropical variability (e.g. the El Niño-Southern Oscillation, ENSO) and climatology (e.g. the asymmetric Inter-Tropical Convergence Zone, ITCZ) were initially thought to be determined mostly by local forcing and ocean-atmosphere interaction in the tropics. Since the late 20th century, numerous studies have shown that extratropical forcing can affect, or even largely determine, some aspects of the tropical climate. Due to the coupled nature of the climate system, however, the challenge of determining and quantifying the causality of extratropical forcing on the tropical climate remains. Using the RCDA method, we have demonstrated significant control of extratropical atmospheric forcing on ENSO variability in a CGCM, with both model-generated and real-world observational datasets. The RCDA method has also shown a robust extratropical impact on the tropical double-ITCZ bias in a CGCM. The RCDA method provides the first systematic and quantitative assessment of extratropical influence on tropical climatology and variability that incorporates real-world observations in a CGCM.
NASA Astrophysics Data System (ADS)
Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu
2016-06-01
To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called the grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, the proposed model improves prediction accuracy and the hit rate of directional prediction. The proposed model involves three main steps: decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) to simplify the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; and integrating all predicted IMFs into the ensemble result as the final prediction, using another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods, and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is markedly superior to all considered benchmark models in prediction accuracy and hit rate of directional prediction.
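A hedged sketch of the "decompose, predict each component, recombine" scaffold: a crude moving-average split stands in for CEEMD, and fixed SVR hyperparameters stand in for the GWO search. Data and names are synthetic:

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder decomposition: CEEMD would yield several IMFs; a two-band
# split (moving average + residual) stands in for it here.
def decompose(series, window=24):
    trend = np.convolve(series, np.ones(window) / window, mode="same")
    return [trend, series - trend]

def lagged(series, n_lags=6):
    X = np.column_stack([series[i:len(series) - n_lags + i]
                         for i in range(n_lags)])
    return X, series[n_lags:]

rng = np.random.default_rng(3)
pm25 = 50 + 10 * np.sin(np.arange(500) / 20) + rng.normal(0, 3, 500)

# Predict each component separately, then combine the component forecasts.
component_preds = []
for comp in decompose(pm25):
    X, y = lagged(comp)
    model = SVR(C=10.0, gamma="scale").fit(X[:-50], y[:-50])  # GWO would tune C, gamma
    component_preds.append(model.predict(X[-50:]))

forecast = np.sum(component_preds, axis=0)   # final recombination step
print("last 5 forecasts:", np.round(forecast[-5:], 1))
```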
Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng
2017-05-18
Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
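A minimal sketch of the ensemble idea, assuming synthetic fingerprint blocks and labels: one base learner per fingerprint type, with predicted probabilities averaged. The paper's actual base learners, fingerprint types, and validation protocol differ:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
# Hypothetical stand-ins: 1003 compounds, two binary fingerprint blocks.
fp_blocks = [rng.integers(0, 2, size=(1003, 881)),   # e.g. PubChem-like bits
             rng.integers(0, 2, size=(1003, 166))]   # e.g. MACCS-like bits
y = rng.integers(0, 2, size=1003)                    # carcinogen / non-carcinogen

# One base learner per fingerprint block; the ensemble averages probabilities.
models = [RandomForestClassifier(n_estimators=200, random_state=0).fit(fp, y)
          for fp in fp_blocks]
proba = np.mean([m.predict_proba(fp)[:, 1] for m, fp in zip(models, fp_blocks)],
                axis=0)
pred = (proba >= 0.5).astype(int)
# On real data this must be judged by cross-validation, not training accuracy.
print("training accuracy of the averaged ensemble:", (pred == y).mean())
```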
Image Change Detection via Ensemble Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Benjamin W; Vatsavai, Raju
2013-01-01
The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, their applicability is often limited. Recently, the use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier with higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers form a mixture of experts, in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision-tree-based classifiers to identify regions of change in SAR images collected over a region of Staten Island, New York during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.
Simulating large-scale crop yield by using perturbed-parameter ensemble method
NASA Astrophysics Data System (ADS)
Iizumi, T.; Yokozawa, M.; Sakurai, G.; Nishimori, M.
2010-12-01
One of the pressing issues for food security under a changing climate is predicting the interannual variation of crop production induced by climate extremes and a modulated climate. To secure the food supply for a growing world population, a methodology that can accurately predict crop yield on a large scale is needed. However, in developing a process-based large-scale crop model at the scale of general circulation models (GCMs), about 100 km in latitude and longitude, researchers encounter difficulties with the spatial heterogeneity of the available information on crop production, such as cultivated cultivars and management. This study proposes an ensemble-based simulation method that uses a process-based crop model and a systematic parameter perturbation procedure, taking maize in the U.S., China, and Brazil as examples. The crop model was developed by modifying the fundamental structure of the Soil and Water Assessment Tool (SWAT) to incorporate the effect of heat stress on yield. We call the new model PRYSBI: the Process-based Regional-scale Yield Simulator with Bayesian Inference. The posterior probability density function (PDF) of 17 parameters, which represents the crop- and grid-specific features of the crop and its uncertainty under the given data, was estimated by Bayesian inversion analysis. We then take 1500 ensemble members of simulated yield values, based on parameter sets sampled from the posterior PDF, to describe yearly changes of the yield: the perturbed-parameter ensemble method. The ensemble median for 27 years (1980-2006) was compared with data aggregated from county yields. On a country scale, the ensemble median of the simulated yield corresponds well with the reported yield: the Pearson correlation coefficient exceeds 0.6 for all countries. On a grid scale, the correspondence remains high in most grids regardless of the country. However, the model showed comparatively low reproducibility in sloping areas, such as around the Rocky Mountains in South Dakota, around the Great Xing'anling Mountains in Heilongjiang, and around the Brazilian Plateau. Because local climate conditions vary widely in such complex terrain, the GCM grid-scale weather inputs are likely a major source of error. The results of this study highlight the benefits of the perturbed-parameter ensemble method for simulating crop yield on a GCM grid scale: (1) the posterior PDF of the parameters quantifies the uncertainty in the crop model's parameter values associated with local crop production; (2) the method explicitly accounts for this parameter uncertainty in the crop model simulations; (3) the method achieves a Monte Carlo approximation of the probability distribution of sub-grid-scale yield, accounting for the nonlinear response of crop yield to weather and management; and (4) the method is therefore appropriate for aggregating simulated sub-grid-scale yields to a grid-scale yield, which may explain the model's high performance in capturing the interannual variation of yield.
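A minimal sketch of the perturbed-parameter ensemble step, with a toy yield function and a hypothetical posterior standing in for PRYSBI and its Bayesian inversion:

```python
import numpy as np

rng = np.random.default_rng(5)

def crop_model(params, weather):
    """Toy stand-in for the process-based crop model (PRYSBI)."""
    base_yield, heat_sens = params
    return base_yield - heat_sens * np.maximum(weather - 30.0, 0.0).mean()

# 1500 parameter sets, here drawn from a hypothetical posterior PDF.
posterior = np.column_stack([rng.normal(8.0, 0.5, 1500),       # base yield, t/ha
                             rng.lognormal(-1.0, 0.3, 1500)])  # heat sensitivity

weather = rng.normal(29.0, 3.0, size=120)   # daily max temperature, one season
ensemble = np.array([crop_model(p, weather) for p in posterior])
print(f"median {np.median(ensemble):.2f} t/ha, "
      f"90% band [{np.percentile(ensemble, 5):.2f}, "
      f"{np.percentile(ensemble, 95):.2f}]")
```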
Liu, Jing; Zhao, Songzheng; Wang, Gang
2018-01-01
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), a serious health problem. Besides ADEs, other semantic relation types (e.g., drug indication and beneficial effect) can hold between drug and adverse event mentions, making ADE relation extraction, i.e., distinguishing the ADE relationship from other relation types, necessary. However, conducting ADE relation extraction in a social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the high dimensionality of the feature space attributed to the intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve this objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploits various lexical, semantic, and syntactic features, and integrates ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most existing ADE relation extraction methods. SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential to be applied to other social media-based problems. Copyright © 2017 Elsevier B.V. All rights reserved.
EnsembleGraph: Interactive Visual Analysis of Spatial-Temporal Behavior for Ensemble Simulation Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Qingya; Guo, Hanqi; Che, Limei
We present a novel visualization framework, EnsembleGraph, for analyzing ensemble simulation data, in order to help scientists understand behavior similarities between ensemble members over space and time. A graph-based representation is used to visualize individual spatiotemporal regions with similar behaviors, which are extracted by hierarchical clustering algorithms. A user interface with multiple linked views is provided, which enables users to explore, locate, and compare regions that have similar behaviors between ensemble members, and then investigate and analyze the selected regions in detail. The driving application of this paper is the study of regional emission influences on tropospheric ozone, based on ensemble simulations conducted with different anthropogenic emission absences using the MOZART-4 (model of ozone and related tracers, version 4) model. We demonstrate the effectiveness of our method by visualizing the MOZART-4 ensemble simulation data and evaluating the relative regional emission influences on tropospheric ozone concentrations. Positive feedback from domain experts and two case studies demonstrate the efficiency of our method.
Impact of ensemble learning in the assessment of skeletal maturity.
Cunha, Pedro; Moura, Daniel C; Guevara López, Miguel Angel; Guerra, Conceição; Pinto, Daniela; Ramos, Isabel
2014-09-01
The assessment of bone age, or skeletal maturity, is an important task in pediatrics that measures the degree of maturation of children's bones. Nowadays, there is no standard clinical procedure for assessing bone age, and the most widely used approaches are the Greulich and Pyle and the Tanner and Whitehouse methods. Computer methods have been proposed to automate the process; however, little work has explored how to combine the features of the different parts of the hand, and how to take advantage of ensemble techniques for this purpose. This paper presents a study in which the use of ensemble techniques for improving bone age assessment is evaluated. A new computer method was developed that extracts descriptors for each joint of each finger, which are then combined using different ensemble schemes to obtain a final bone age value. Three popular ensemble schemes are explored in this study: bagging, stacking and voting. The best results were achieved by bagging with rule-based regression (M5P), scoring a mean absolute error of 10.16 months. Results show that ensemble techniques improve the prediction performance of most of the evaluated regression algorithms, always achieving best or comparable-to-best results. The success of the ensemble methods therefore allows us to conclude that their use may improve computer-based bone age assessment, offering a scalable option for utilizing multiple regions of interest and combining their outputs.
Improving Climate Projections Using "Intelligent" Ensembles
NASA Technical Reports Server (NTRS)
Baker, Noel C.; Taylor, Patrick C.
2015-01-01
Recent changes in the climate system have led to growing concern, especially in communities which are highly vulnerable to resource shortages and weather extremes. There is an urgent need for better climate information to develop solutions and strategies for adapting to a changing climate. Climate models provide excellent tools for studying the current state of climate and making future projections. However, these models are subject to biases created by structural uncertainties. Performance metrics, i.e., the systematic determination of model biases, succinctly quantify aspects of climate model behavior. Efforts to standardize climate model experiments and collect simulation data, such as the Coupled Model Intercomparison Project (CMIP), provide the means to directly compare and assess model performance. Performance metrics have been used to show that some models reproduce present-day climate better than others. Simulation data from multiple models are often used to add value to projections by creating a consensus projection from the model ensemble, in which each model is given an equal weight. It has been shown that the ensemble mean generally outperforms any single model. It is also possible to produce ensemble means using unequal weights, in which models are weighted based on performance (called "intelligent" ensembles). Can performance metrics be used to improve climate projections? Previous work introduced a framework for comparing the utility of model performance metrics, showing that the best metrics are related to the variance of top-of-atmosphere outgoing longwave radiation. These metrics improve present-day climate simulations of Earth's energy budget using the "intelligent" ensemble method. The current project identifies several approaches for testing whether performance metrics can be applied to future simulations to create "intelligent" ensemble-mean climate projections. It is shown that certain performance metrics test key climate processes in the models, and that these metrics can be used to evaluate model quality in both current and future climate states. This information will be used to produce new consensus projections and provide communities with improved climate projections for urgent decision-making.
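A minimal sketch of a metric-weighted ("intelligent") ensemble mean; inverse-error weighting is just one simple choice, and all numbers are hypothetical:

```python
import numpy as np

def weighted_projection(projections, errors):
    """Metric-weighted ensemble mean: weight ~ 1/error, normalized to sum to 1."""
    w = 1.0 / np.asarray(errors)
    w /= w.sum()
    return w @ np.asarray(projections)

models = np.array([3.1, 2.4, 4.0, 2.9])   # e.g. projected warming per model (K)
rmse = np.array([0.8, 0.5, 1.6, 0.7])     # each model's present-day metric error
print("equal weights      :", models.mean())
print("intelligent weights:", weighted_projection(models, rmse))
```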
NASA Astrophysics Data System (ADS)
Zhao, Liang; Xu, Shun; Tu, Yu-Song; Zhou, Xin
2017-06-01
Abstract not available. Project supported by the National Natural Science Foundation for Outstanding Young Scholars, China (Grant No. 11422542), the National Natural Science Foundation of China (Grant Nos. 11605151 and 11675138), the Shanghai Supercomputer Center of China, and the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (second phase).
Insights into the deterministic skill of air quality ensembles ...
Simulations from chemical weather models are subject to uncertainties in the input data (e.g. emission inventory, initial and boundary conditions) as well as those intrinsic to the model (e.g. physical parameterization, chemical mechanism). Multi-model ensembles can improve the forecast skill, provided that certain mathematical conditions are fulfilled. In this work, four ensemble methods were applied to two different datasets, and their performance was compared for ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10). Apart from the unconditional ensemble average, the approach behind the other three methods relies on adding optimum weights to members or constraining the ensemble to those members that meet certain conditions in time or frequency domain. The two different datasets were created for the first and second phase of the Air Quality Model Evaluation International Initiative (AQMEII). The methods are evaluated against ground level observations collected from the EMEP (European Monitoring and Evaluation Programme) and AirBase databases. The goal of the study is to quantify to what extent we can extract predictable signals from an ensemble with superior skill over the single models and the ensemble mean. Verification statistics show that the deterministic models simulate better O3 than NO2 and PM10, linked to different levels of complexity in the represented processes. The unconditional ensemble mean achieves higher skill compared to each stati
Lysine acetylation sites prediction using an ensemble of support vector machine classifiers.
Xu, Yan; Wang, Xiao-Bo; Ding, Jun; Wu, Ling-Yun; Deng, Nai-Yang
2010-05-07
Lysine acetylation is an essentially reversible and highly regulated post-translational modification which regulates diverse protein properties. Experimental identification of acetylation sites is laborious and expensive. Hence, there is significant interest in the development of computational methods for reliable prediction of acetylation sites from amino acid sequences. In this paper we use an ensemble of support vector machine classifiers to perform this task. The experimentally determined lysine acetylation sites were extracted from the Swiss-Prot database and the scientific literature. Experimental results show that an ensemble of support vector machine classifiers outperforms a single support vector machine classifier and other computational methods such as PAIL and LysAcet on the problem of predicting lysine acetylation sites. The resulting method has been implemented in EnsemblePail, a web server for lysine acetylation site prediction available at http://www.aporc.org/EnsemblePail/. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
A target recognition method for maritime surveillance radars based on hybrid ensemble selection
NASA Astrophysics Data System (ADS)
Fan, Xueman; Hu, Shengliang; He, Jingbo
2017-11-01
In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of ensembles of classifiers representing different tradeoffs between the classification error and diversity. During the dynamic selection phase, the meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, feature space, decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built full polarimetric high resolution range profile data-set. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the selection of diversity measures is studied concurrently.
Entropy of spatial network ensembles
NASA Astrophysics Data System (ADS)
Coon, Justin P.; Dettmann, Carl P.; Georgiou, Orestis
2018-04-01
We analyze complexity in spatial network ensembles through the lens of graph entropy. Mathematically, we model a spatial network as a soft random geometric graph, i.e., a graph with two sources of randomness, namely nodes located randomly in space and links formed independently between pairs of nodes with probability given by a specified function (the "pair connection function") of their mutual distance. We consider the general case where randomness arises in node positions as well as pairwise connections (i.e., for a given pair distance, the corresponding edge state is a random variable). Classical random geometric graph and exponential graph models can be recovered in certain limits. We derive a simple bound for the entropy of a spatial network ensemble and calculate the conditional entropy of an ensemble given the node location distribution for hard and soft (probabilistic) pair connection functions. Under this formalism, we derive the connection function that yields maximum entropy under general constraints. Finally, we apply our analytical framework to study two practical examples: ad hoc wireless networks and the US flight network. Through the study of these examples, we illustrate that both exhibit properties that are indicative of nearly maximally entropic ensembles.
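A minimal sketch of the conditional entropy calculation: given the node positions, edges are independent Bernoulli variables, so the graph entropy is a sum of pairwise binary entropies. The Gaussian-like pair connection function below is one common choice, not necessarily the paper's:

```python
import numpy as np

rng = np.random.default_rng(6)

def pair_connection(r, r0=0.15):
    """Soft (Rayleigh-like) connection function: p = exp(-(r/r0)^2)."""
    return np.exp(-(r / r0) ** 2)

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Conditional graph entropy given node positions: edge states are
# independent Bernoulli variables, so entropies add over node pairs.
pts = rng.random((100, 2))                       # nodes uniform in unit square
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
iu = np.triu_indices(len(pts), k=1)              # each pair counted once
H = bernoulli_entropy(pair_connection(d[iu])).sum()
print(f"conditional ensemble entropy: {H:.1f} bits")
```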
Dynamics of heterogeneous oscillator ensembles in terms of collective variables
NASA Astrophysics Data System (ADS)
Pikovsky, Arkady; Rosenblum, Michael
2011-04-01
We consider general heterogeneous ensembles of phase oscillators, sine-coupled to arbitrary external fields. Starting with infinitely large ensembles, we extend the Watanabe-Strogatz theory, valid for identical oscillators, to cover the case of an arbitrary parameter distribution. The obtained equations yield a description of the ensemble dynamics in terms of collective variables and constants of motion. As a particular case of the general setup we consider hierarchically organized ensembles, consisting of a finite number of subpopulations, where the number of elements in a subpopulation can be either finite or infinite. Next, we link the Watanabe-Strogatz and Ott-Antonsen theories and demonstrate that the latter corresponds to a particular choice of constants of motion. The approach is applied to the standard Kuramoto-Sakaguchi model, to its extension for the case of nonlinear coupling, and to the description of two interacting subpopulations exhibiting a chimera state. With these examples we illustrate that, although the asymptotic dynamics can be found within the framework of the Ott-Antonsen theory, the transients depend on the constants of motion. The most dramatic effect is the dependence of the basins of attraction of different synchronous regimes on the initial configuration of phases.
Decadal climate predictions improved by ocean ensemble dispersion filtering
NASA Astrophysics Data System (ADS)
Kadow, C.; Illing, S.; Kröner, I.; Ulbrich, U.; Cubasch, U.
2017-06-01
Decadal predictions by Earth system models aim to capture the state and phase of the climate several years in advance. Atmosphere-ocean interaction plays an important role for such climate forecasts. While short-term weather forecasts represent an initial value problem and long-term climate projections represent a boundary condition problem, decadal climate prediction falls in between these two time scales. In recent years, more precise initialization techniques for coupled Earth system models and increased ensemble sizes have improved decadal predictions. However, climate models in general start losing the initialized signal and its predictive skill from one forecast year to the next. Here we show that the climate prediction skill of an Earth system model can be improved by shifting the ocean state toward the ensemble mean of its individual members at seasonal intervals. We found that this procedure, called the ensemble dispersion filter, yields more accurate results than standard decadal prediction. Global mean and regional temperature, precipitation, and winter cyclone predictions show increased skill up to 5 years ahead. Furthermore, the novel technique outperforms predictions with larger ensembles and higher resolution. Our results demonstrate how decadal climate predictions benefit from filtering the ocean ensemble dispersion toward the ensemble mean.
NASA Astrophysics Data System (ADS)
Verkade, J. S.; Brown, J. D.; Reggiani, P.; Weerts, A. H.
2013-09-01
The ECMWF temperature and precipitation ensemble reforecasts are evaluated for biases in the mean, spread and forecast probabilities, and how these biases propagate to streamflow ensemble forecasts. The forcing ensembles are subsequently post-processed to reduce bias and increase skill, and to investigate whether this leads to improved streamflow ensemble forecasts. Multiple post-processing techniques are used: quantile-to-quantile transform, linear regression with an assumption of bivariate normality and logistic regression. Both the raw and post-processed ensembles are run through a hydrologic model of the river Rhine to create streamflow ensembles. The results are compared using multiple verification metrics and skill scores: relative mean error, Brier skill score and its decompositions, mean continuous ranked probability skill score and its decomposition, and the ROC score. Verification of the streamflow ensembles is performed at multiple spatial scales: relatively small headwater basins, large tributaries and the Rhine outlet at Lobith. The streamflow ensembles are verified against simulated streamflow, in order to isolate the effects of biases in the forcing ensembles and any improvements therein. The results indicate that the forcing ensembles contain significant biases, and that these cascade to the streamflow ensembles. Some of the bias in the forcing ensembles is unconditional in nature; this was resolved by a simple quantile-to-quantile transform. Improvements in conditional bias and skill of the forcing ensembles vary with forecast lead time, amount, and spatial scale, but are generally moderate. The translation to streamflow forecast skill is further muted, and several explanations are considered, including limitations in the modelling of the space-time covariability of the forcing ensembles and the presence of storages.
A comparison of ensemble post-processing approaches that preserve correlation structures
NASA Astrophysics Data System (ADS)
Schefzik, Roman; Van Schaeybroeck, Bert; Vannitsem, Stéphane
2016-04-01
Despite the fact that ensemble forecasts address the major sources of uncertainty, they exhibit biases and dispersion errors and are therefore known to improve through calibration or statistical post-processing. For instance, the ensemble model output statistics (EMOS) method, also known as the non-homogeneous regression approach (Gneiting et al., 2005), is known to strongly improve forecast skill. EMOS is based on fitting and adjusting a parametric probability density function (PDF). However, EMOS and other common post-processing approaches apply to a single weather quantity at a single location for a single look-ahead time. They are therefore unable to take into account spatial, inter-variable and temporal dependence structures. Recently, much research effort has been invested in designing post-processing methods that resolve this drawback, and also in verification methods that enable the detection of dependence structures. New verification methods are applied to two classes of post-processing methods, both generating physically coherent ensembles. The first class uses ensemble copula coupling (ECC), which starts from EMOS but adjusts the rank structure (Schefzik et al., 2013). The second class is a member-by-member post-processing (MBM) approach that maps each raw ensemble member to a corrected one (Van Schaeybroeck and Vannitsem, 2015). We compare variants of the EMOS-ECC and MBM classes and highlight a specific theoretical connection between them. All post-processing variants are applied in the context of the ensemble system of the European Centre for Medium-Range Weather Forecasts (ECMWF) and compared using multivariate verification tools, including the energy score, the variogram score (Scheuerer and Hamill, 2015) and the band depth rank histogram (Thorarinsdottir et al., 2015). Gneiting, Raftery, Westveld, and Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098-1118. Scheuerer and Hamill, 2015: Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev., 143, 1321-1334. Schefzik, Thorarinsdottir, and Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science, 28, 616-640. Thorarinsdottir, Scheuerer, and Heinz, 2015: Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. arXiv:1310.0236. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q.J.R. Meteorol. Soc., 141, 807-818.
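A minimal sketch of the ECC reordering step (the standard quantile variant): calibrated samples for each margin are rearranged so that they inherit the raw ensemble's rank structure, restoring spatial and inter-variable dependence. Inputs here are synthetic:

```python
import numpy as np

def ecc(raw_ensemble, calibrated_samples):
    """Ensemble copula coupling: reorder calibrated samples (per margin)
    so that they follow the rank structure of the raw ensemble."""
    raw = np.asarray(raw_ensemble)                  # (members, variables)
    cal = np.sort(np.asarray(calibrated_samples), axis=0)
    ranks = raw.argsort(axis=0).argsort(axis=0)     # rank of each member
    return np.take_along_axis(cal, ranks, axis=0)

rng = np.random.default_rng(7)
raw = rng.normal(size=(10, 3))                      # 10 members, 3 linked margins
cal = rng.normal(loc=1.0, scale=0.5, size=(10, 3))  # e.g. EMOS-sampled margins
print(ecc(raw, cal))                                # calibrated, dependence kept
```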
NASA Astrophysics Data System (ADS)
Che, Yanqiu; Yang, Tingting; Li, Ruixue; Li, Huiyan; Han, Chunxiao; Wang, Jiang; Wei, Xile
2015-09-01
In this paper, we propose a dynamic delayed feedback control approach for the desynchronization of chaotic-bursting synchronous activities in an ensemble of globally coupled neuronal oscillators. We demonstrate that the difference signal between an ensemble's mean field and its time-delayed state, filtered and fed back to the ensemble, can suppress self-synchronization in the ensemble. The individual units are decoupled and stabilized at the desired desynchronized states while the stimulation signal reduces to the noise level. The effectiveness of the method is illustrated with examples of two different populations of globally coupled chaotic-bursting neurons. The proposed method has potential for mild, effective and demand-controlled therapy of neurological diseases characterized by pathological synchronization.
Generalized Gibbs ensemble in integrable lattice models
NASA Astrophysics Data System (ADS)
Vidmar, Lev; Rigol, Marcos
2016-06-01
The generalized Gibbs ensemble (GGE) was introduced ten years ago to describe observables in isolated integrable quantum systems after equilibration. Since then, the GGE has been demonstrated to be a powerful tool to predict the outcome of the relaxation dynamics of few-body observables in a variety of integrable models, a process we call generalized thermalization. This review discusses several fundamental aspects of the GGE and generalized thermalization in integrable systems. In particular, we focus on questions such as: which observables equilibrate to the GGE predictions and who should play the role of the bath; what conserved quantities can be used to construct the GGE; what are the differences between generalized thermalization in noninteracting systems and in interacting systems mappable to noninteracting ones; why is it that the GGE works when traditional ensembles of statistical mechanics fail. Despite a lot of interest in these questions in recent years, no definite answers have been given. We review results for the XX model and for the transverse field Ising model. For the latter model, we also report original results and show that the GGE describes spin-spin correlations over the entire system. This makes apparent that there is no need to trace out a part of the system in real space for equilibration to occur and for the GGE to apply. In the past, a spectral decomposition of the weights of various statistical ensembles revealed that generalized eigenstate thermalization occurs in the XX model (hard-core bosons). Namely, eigenstates of the Hamiltonian with similar distributions of conserved quantities have similar expectation values of few-spin observables. Here we show that generalized eigenstate thermalization also occurs in the transverse field Ising model.
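For reference, the GGE density matrix takes the standard form below, with one Lagrange multiplier per conserved charge, each fixed by the expectation values in the initial state:

```latex
% GGE density matrix built from the conserved charges \hat{I}_k:
\hat{\rho}_{\mathrm{GGE}}
  = \frac{1}{Z_{\mathrm{GGE}}}
    \exp\Bigl(-\sum_{k} \lambda_{k} \hat{I}_{k}\Bigr),
\qquad
Z_{\mathrm{GGE}}
  = \operatorname{Tr} \exp\Bigl(-\sum_{k} \lambda_{k} \hat{I}_{k}\Bigr).
% Each Lagrange multiplier \lambda_k is fixed by matching the initial state:
\langle \psi_{0} | \hat{I}_{k} | \psi_{0} \rangle
  = \operatorname{Tr}\bigl[\hat{\rho}_{\mathrm{GGE}}\,\hat{I}_{k}\bigr].
```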
Advanced ensemble modelling of flexible macromolecules using X-ray solution scattering.
Tria, Giancarlo; Mertens, Haydyn D T; Kachala, Michael; Svergun, Dmitri I
2015-03-01
Dynamic ensembles of macromolecules mediate essential processes in biology. Understanding the mechanisms driving the function and molecular interactions of 'unstructured' and flexible molecules requires alternative approaches to those traditionally employed in structural biology. Small-angle X-ray scattering (SAXS) is an established method for structural characterization of biological macromolecules in solution, and is directly applicable to the study of flexible systems such as intrinsically disordered proteins and multi-domain proteins with unstructured regions. The Ensemble Optimization Method (EOM) [Bernadó et al. (2007). J. Am. Chem. Soc. 129, 5656-5664] was the first approach introducing the concept of ensemble fitting of the SAXS data from flexible systems. In this approach, a large pool of macromolecules covering the available conformational space is generated and a sub-ensemble of conformers coexisting in solution is selected guided by the fit to the experimental SAXS data. This paper presents a series of new developments and advancements to the method, including significantly enhanced functionality and also quantitative metrics for the characterization of the results. Building on the original concept of ensemble optimization, the algorithms for pool generation have been redesigned to allow for the construction of partially or completely symmetric oligomeric models, and the selection procedure was improved to refine the size of the ensemble. Quantitative measures of the flexibility of the system studied, based on the characteristic integral parameters of the selected ensemble, are introduced. These improvements are implemented in the new EOM version 2.0, and the capabilities as well as inherent limitations of the ensemble approach in SAXS, and of EOM 2.0 in particular, are discussed.
pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins.
Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A Keith; Felli, Isabella C; Forman-Kay, Julie D; Kriwacki, Richard W; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I; Uversky, Vladimir N; Vendruscolo, Michele; Wishart, David; Wright, Peter E; Tompa, Peter
2014-01-01
The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and to lead to a better understanding of how function arises from disordered states.
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, some methods for ensemble learning of protein fold recognition based on decision trees (DT) are compared and contrasted against each other over three datasets taken from the literature. Following previously reported studies, the features of the datasets are divided into groups. Then, for each of these groups, three ensemble classifiers, namely random forest, rotation forest and AdaBoost.M1, are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method is the best in comparison to previously applied methods in terms of classification accuracy.
Shallow cumuli ensemble statistics for development of a stochastic parameterization
NASA Astrophysics Data System (ADS)
Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs
2014-05-01
According to the conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large-scale forcing produces a unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection: as is known from empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to a given large-scale forcing. Moreover, this distribution gets broader with increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the spread and shape of the sub-grid state distribution on the model's horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation (LES) model and a cloud tracking algorithm, followed by conditional sampling of clouds at the cloud-base level, to retrieve information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of the shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on these empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory. The stochastic model simulates a compound random process, with the number of convective elements drawn from a Poisson distribution and cloud properties sub-sampled from a generalized ensemble distribution (see the sketch below). We study the role of the different cloud subtypes in a shallow convective ensemble and how the diverse cloud properties and cloud lifetimes affect the system macro-state. To what extent does the cloud-base mass flux distribution deviate from the simple Boltzmann distribution, and how does this affect the results of the stochastic model? Is the memory provided by the finite lifetime of individual clouds important for the ensemble statistics? We also test for the minimal information, given as input to the stochastic model, that is able to reproduce the ensemble mean statistics and the variability in a convective ensemble. An important property of the resulting distribution of the sub-grid convective states is its scale-adaptivity: the smaller the grid size, the broader the compound distribution of the sub-grid states.
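As mentioned above, a minimal sketch of the compound random process: the cloud count is Poisson, and each cloud-base mass flux is drawn from an exponential (the simple Boltzmann limit; the generalized distribution found in the study would replace the exponential draw). All numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

def grid_box_mass_flux(mean_n_clouds, mean_flux, n_draws=10000):
    """Compound Poisson model of a shallow-cumulus ensemble: the number of
    clouds is Poisson, each cloud-base mass flux exponentially distributed."""
    n = rng.poisson(mean_n_clouds, size=n_draws)
    return np.array([rng.exponential(mean_flux, size=k).sum() for k in n])

M = grid_box_mass_flux(mean_n_clouds=50, mean_flux=2.0e6)  # kg s^-1 per cloud
print(f"mean {M.mean():.3g}, std {M.std():.3g}")
# A smaller grid box (fewer clouds) broadens the relative spread:
M_small = grid_box_mass_flux(mean_n_clouds=5, mean_flux=2.0e6)
print(f"relative spread: large box {M.std()/M.mean():.2f}, "
      f"small box {M_small.std()/M_small.mean():.2f}")
```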
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
A better understanding of the structural class of a given protein reveals important information about its overall folding type and its domains. It can also be used directly to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it remains an open problem in bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes, selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely AdaBoost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
NASA Astrophysics Data System (ADS)
Brekke, L. D.; Clark, M. P.; Gutmann, E. D.; Wood, A.; Mizukami, N.; Mendoza, P. A.; Rasmussen, R.; Ikeda, K.; Pruitt, T.; Arnold, J. R.; Rajagopalan, B.
2015-12-01
Adaptation planning assessments often rely on single methods for climate projection downscaling and hydrologic analysis, do not reveal the uncertainties arising from those method choices, and thus likely produce overly confident decision-support information. Recent work by the authors has highlighted this issue by identifying strengths and weaknesses of widely applied methods for downscaling climate projections and assessing hydrologic impacts. This work has shown that many of the methodological choices made can alter the magnitude, and even the sign, of the climate change signal. Such results motivate consideration of both sources of method uncertainty within an impacts assessment. Consequently, the authors have pursued the development of improved downscaling techniques spanning a range of method classes (quasi-dynamical and circulation-based statistical methods) and developed approaches to better account for hydrologic analysis uncertainty (multi-model; regional parameter estimation under forcing uncertainty). This presentation summarizes progress in the development of these methods, as well as the implications of pursuing them. First, having access to these methods creates an opportunity to better reveal impacts uncertainty through multi-method ensembles, expanding on present-practice ensembles which are often based only on emissions scenarios and GCM choices. Second, such expansion of uncertainty treatment, combined with an ever-expanding wealth of global climate projection information, creates the challenge of how to use such a large ensemble for local adaptation planning. To address this challenge, the authors are evaluating methods for ensemble selection (considering the principles of fidelity, diversity and sensitivity) that are compatible with present-practice approaches for abstracting change scenarios from any "ensemble of opportunity". Early examples from this development will also be presented.
The effects of shared information on semantic calculations in the gene ontology.
Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai
2017-01-01
The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
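For reference, the three term-similarity measures named above, written in terms of the information content IC(c) = -log p(c) and the most informative common ancestor (MICA); the Jiang-Conrath line uses one common distance-to-similarity transform, which may differ from the paper's choice:

```python
import math

def resnik(ic_mica):
    """Resnik similarity: the shared information itself, IC of the MICA."""
    return ic_mica

def lin(ic_a, ic_b, ic_mica):
    """Lin similarity: shared information normalized by the terms' own IC."""
    return 2.0 * ic_mica / (ic_a + ic_b)

def jiang_conrath(ic_a, ic_b, ic_mica):
    """Similarity form of the Jiang-Conrath distance IC(a)+IC(b)-2*IC(MICA)."""
    return 1.0 / (1.0 + ic_a + ic_b - 2.0 * ic_mica)

ic = lambda p: -math.log(p)   # IC from a term's annotation probability
print(lin(ic(0.01), ic(0.02), ic(0.20)))  # two specific terms, broad ancestor
```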
Infinitely dilute partial molar properties of proteins from computer simulation.
Ploetz, Elizabeth A; Smith, Paul E
2014-11-13
A detailed understanding of temperature and pressure effects on an infinitely dilute protein's conformational equilibrium requires knowledge of the corresponding infinitely dilute partial molar properties. Established molecular dynamics methodologies generally have not provided a way to calculate these properties without either a loss of thermodynamic rigor, the introduction of nonunique parameters, or a loss of information about which solute conformations specifically contributed to the output values. Here we implement a simple method that is thermodynamically rigorous and possesses none of the above disadvantages, and we report on the method's feasibility and computational demands. We calculate infinitely dilute partial molar properties for two proteins and attempt to distinguish the thermodynamic differences between a native and a denatured conformation of a designed miniprotein. We conclude that simple ensemble average properties can be calculated with very reasonable amounts of computational power. In contrast, properties corresponding to fluctuating quantities are computationally demanding to calculate precisely, although they can be obtained more easily by following the temperature and/or pressure dependence of the corresponding ensemble averages.
Algorithms that Defy the Gravity of Learning Curve
2017-04-28
[Report excerpt; only search-snippet fragments of the abstract survive. The fragments mention three nearest neighbour-based anomaly detectors, including an ensemble of nearest neighbours and a recent nearest neighbour-based ensemble method called iNNE; an experimental methodology on data streams in which changes in sample size do not alter the geometrical data characteristics discussed; and a comparison with conventional ensemble methods.]
Fire spread estimation on forest wildfire using ensemble kalman filter
NASA Astrophysics Data System (ADS)
Syarifah, Wardatus; Apriliani, Erna
2018-04-01
Wildfires are among the most frequent disasters in the world; forest wildfires, for example, cause forest populations to decrease. Forest wildfires, whether naturally occurring or prescribed, are potential risks for ecosystems and human settlements. These risks can be managed by monitoring the weather, prescribing fires to limit available fuel, and creating firebreaks. With computer simulations we can predict and explore how fires may spread. A model of fire spread in forest wildfires was established to determine the fire properties; the model is based on a reaction-diffusion equation. There are many methods to estimate the spread of fire. The Ensemble Kalman Filter (EnKF) is a modification of the Kalman Filter algorithm that can be used to estimate linear and non-linear system models. This research applies the EnKF method to estimate the spread of fire in a forest wildfire. Before applying the EnKF method, the fire spread model is discretized using the finite difference method. Finally, the analysis is illustrated by numerical simulation. The simulation results show that the Ensemble Kalman Filter estimate approaches the system model as the ensemble size grows and as the covariances of the system model and the measurement become smaller.
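For concreteness, here is a minimal sketch of one stochastic EnKF analysis step of the kind applied to a discretized state such as the fire-spread field (generic; the paper's specific model and observation setup are not reproduced):

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """X: (n, N) forecast ensemble of n state variables and N members;
    y: (m,) observations; H: (m, n) observation operator;
    R: (m, m) observation error covariance."""
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    Pf = A @ A.T / (N - 1)                           # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (Y - H @ X)                       # perturbed-observation update
```

Larger N improves the sampled covariance Pf, consistent with the reported behaviour that the estimate tracks the system better as the ensemble size grows.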
A Bayesian Ensemble Approach for Epidemiological Projections
Lindström, Tom; Tildesley, Michael; Webb, Colleen
2015-01-01
Mathematical models are powerful tools for epidemiology and can be used to compare control actions. However, different models and model parameterizations may provide different predictions of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single model ensembles based on different parameterizations of the Warwick model run for the 2001 United Kingdom foot and mouth disease outbreak and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, by using a hierarchical extension of the method we show that prior sensitivity can be circumvented. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the effect of this can have different consequences depending on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks. PMID:25927892
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.
Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou
2018-03-27
P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data from a small collected training set. A new method was developed in which corresponding training epochs from two stimulus sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with the extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with the non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with the extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
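The extension step itself is simple; a sketch under the assumption that epochs are stored per sequence with matching labels (array shapes are illustrative, not taken from the paper):

```python
import numpy as np

def extend_training_set(seq_a, seq_b):
    """seq_a, seq_b: (n_epochs, n_channels, n_samples) epochs from two
    sequences, where epoch i carries the same target/non-target label in
    both. Superposing and averaging corresponding epochs raises the ERP
    signal-to-noise ratio and yields extra labelled examples."""
    averaged = 0.5 * (seq_a + seq_b)
    return np.concatenate([seq_a, seq_b, averaged], axis=0)
```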
Ensemble data assimilation in the Red Sea: sensitivity to ensemble selection and atmospheric forcing
NASA Astrophysics Data System (ADS)
Toye, Habib; Zhan, Peng; Gopalakrishnan, Ganesh; Kartadikaria, Aditya R.; Huang, Huang; Knio, Omar; Hoteit, Ibrahim
2017-07-01
We present our efforts to build an ensemble data assimilation and forecasting system for the Red Sea. The system consists of the high-resolution Massachusetts Institute of Technology general circulation model (MITgcm) to simulate ocean circulation and of the Data Assimilation Research Testbed (DART) for ensemble data assimilation. DART has been configured to integrate all members of an ensemble adjustment Kalman filter (EAKF) in parallel, based on which we adapted the ensemble operations in DART to use an invariant ensemble, i.e., an ensemble Optimal Interpolation (EnOI) algorithm. This approach requires only a single forward model integration in the forecast step and therefore saves substantial computational cost. To deal with the strong seasonal variability of the Red Sea, the EnOI ensemble is then seasonally selected from a climatology of long-term model outputs. Observations of remote sensing sea surface height (SSH) and sea surface temperature (SST) are assimilated every 3 days. Real-time atmospheric fields from the National Center for Environmental Prediction (NCEP) and the European Center for Medium-Range Weather Forecasts (ECMWF) are used as forcing in different assimilation experiments. We investigate the behaviors of the EAKF and (seasonal-) EnOI and compare their performances for assimilating and forecasting the circulation of the Red Sea. We further assess the sensitivity of the assimilation system to various filtering parameters (ensemble size, inflation) and atmospheric forcing.
Observing system simulation experiments with multiple methods
NASA Astrophysics Data System (ADS)
Ishibashi, Toshiyuki
2014-11-01
An Observing System Simulation Experiment (OSSE) is a method to evaluate the impact of hypothetical observing systems on analysis and forecast accuracy in numerical weather prediction (NWP) systems. Since an OSSE requires simulations of hypothetical observations, the uncertainty of OSSE results is generally larger than that of observing system experiments (OSEs). To reduce this uncertainty, OSSEs for existing observing systems are often carried out to calibrate the OSSE system. The purpose of this study is to achieve reliable OSSE results based on OSSEs with multiple methods. There are three types of OSSE methods. The first is the sensitivity observing system experiment (SOSE)-based OSSE (SOSE-OSSE). The second is the ensemble of data assimilation cycles (ENDA)-based OSSE (ENDA-OSSE). The third is the nature-run (NR)-based OSSE (NR-OSSE). These three OSSE methods have very different properties. The NR-OSSE evaluates hypothetical observations in a virtual (hypothetical) world, the NR. The ENDA-OSSE is a very simple method but suffers from sampling error due to the small ensemble size. The SOSE-OSSE requires a very highly accurate analysis field as a pseudo-truth of the real atmosphere. We construct these three types of OSSE methods in the Japan Meteorological Agency (JMA) global 4D-Var experimental system. At the conference, we will present initial results from these OSSE systems and their comparisons.
Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.
Kilic, Niyazi; Hosgormez, Erkan
2016-03-01
Ensemble learning methods are among the most powerful tools for pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed from different physical parameters and fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting and the random subspace method (RSM) were used. Instance-based learning (IBk) and random forest (RF) classifiers were applied to the six feature set models. The patients were classified into three groups, osteoporosis, osteopenia and control (healthy), using the ensemble classifiers. Total classification accuracy and F-measure were used to evaluate the diagnostic performance of the proposed ensemble classification system. The classification accuracy reached 98.85% for the combination of model 6 (five BMD + five T-score values) with the RSM-RF classifier. The findings of this paper suggest that patients could be warned before a bone fracture occurs by examining a few physical parameters that can easily be measured without invasive procedures.
Ali, Safdar; Majid, Abdul; Khan, Asifullah
2014-04-01
Development of an accurate and reliable intelligent decision-making method for the construction of a cancer diagnosis system is one of the fast growing research areas of health sciences. Such a decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein folding, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in the case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
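As an illustration of the kind of feature spaces named above (a sketch, not the authors' code), amino acid composition and split amino acid composition reduce to residue frequencies; Hd/Hb-weighted variants would additionally scale residues by published hydrophobicity and hydrophilicity values:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Amino acid composition: relative frequency of the 20 residues."""
    counts, n = Counter(seq), len(seq)
    return [counts.get(aa, 0) / n for aa in AMINO_ACIDS]

def split_aa_composition(seq, parts=2):
    """Split amino acid composition: per-segment compositions, retaining
    coarse sequence-order information lost by plain composition."""
    step = len(seq) // parts
    feats = []
    for i in range(parts):
        seg = seq[i * step:] if i == parts - 1 else seq[i * step:(i + 1) * step]
        feats.extend(aa_composition(seg))
    return feats
```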
NASA Astrophysics Data System (ADS)
Wu, Xiongwu; Brooks, Bernard R.
2011-11-01
The self-guided Langevin dynamics (SGLD) is a method to accelerate conformational searching. This method is unique in that it selectively enhances and suppresses molecular motions based on their frequency to accelerate conformational searching without modifying energy surfaces or raising temperatures. It has been applied to studies of many long time scale events, such as protein folding. Recent progress in the understanding of the conformational distribution in SGLD simulations makes SGLD also an accurate method for quantitative studies. The SGLD partition function provides a way to convert the SGLD conformational distribution to the canonical ensemble distribution and to calculate ensemble average properties through reweighting. Based on the SGLD partition function, this work presents a force-momentum-based self-guided Langevin dynamics (SGLDfp) simulation method to directly sample the canonical ensemble. This method includes interaction forces in its guiding force to compensate for the perturbation caused by the momentum-based guiding force, so that it can approximately sample the canonical ensemble. Using several example systems, we demonstrate that SGLDfp simulations can approximately maintain the canonical ensemble distribution and significantly accelerate conformational searching. With optimal parameters, SGLDfp and SGLD simulations can cross energy barriers of more than 15 kT and 20 kT, respectively, at rates comparable to those at which LD simulations cross energy barriers of 10 kT. The SGLDfp method is size extensive and works well for large systems. For studies where preserving accessible conformational space is critical, such as free energy calculations and protein folding studies, SGLDfp is an efficient approach to search and sample the conformational space.
2012-01-01
Background: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T
2012-12-08
Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
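The two aggregation rules described in these records are straightforward to state in code; a sketch with illustrative cutoffs (the study's actual thresholds are not given here):

```python
import numpy as np

def average_probability(probs):
    """probs: (n_classifiers, n_patients) predicted probabilities of acute
    rejection; the ensemble probability is the mean across classifiers."""
    return probs.mean(axis=0)

def vote_threshold(probs, p_cut=0.5, min_votes=3):
    """Declare acute rejection when at least `min_votes` classifiers
    individually exceed the probability cutoff `p_cut`."""
    return (probs >= p_cut).sum(axis=0) >= min_votes
```

Averaging preserves a continuous score (useful for AUC), while vote counting trades specificity for sensitivity, consistent with the behaviour reported above.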
Avoiding the ensemble decorrelation problem using member-by-member post-processing
NASA Astrophysics Data System (ADS)
Van Schaeybroeck, Bert; Vannitsem, Stéphane
2014-05-01
Forecast calibration or post-processing has become a standard tool in atmospheric and climatological science due to the presence of systematic initial condition and model errors. For ensemble forecasts the most competitive methods derive from the assumption of a fixed ensemble distribution. However, when independently applying such 'statistical' methods at different locations, lead times or for multiple variables, the correlation structure for individual ensemble members is destroyed. Instead of re-establishing the correlation structure as in Schefzik et al. (2013), we propose a calibration method that avoids this problem by correcting each ensemble member individually. Moreover, we analyse the fundamental mechanisms by which the probabilistic ensemble skill can be enhanced. In terms of the continuous ranked probability score, our member-by-member approach amounts to a skill gain that extends to lead times far beyond the error doubling time and that is as good as that of the most competitive statistical approach, non-homogeneous Gaussian regression (Gneiting et al. 2005). Besides the conservation of the correlation structure, additional benefits arise, including the fact that higher-order ensemble moments like kurtosis and skewness are inherited from the uncorrected forecasts. Our detailed analysis is performed in the context of the Kuramoto-Sivashinsky equation and different simple models, but the results extend successfully to the ensemble forecast of the European Centre for Medium-Range Weather Forecasts (Van Schaeybroeck and Vannitsem, 2013, 2014). References: [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. To appear in Statistical Science 28. [3] Van Schaeybroeck, B., and S. Vannitsem, 2013: Reliable probabilities through statistical post-processing of ensemble forecasts. Proceedings of the European Conference on Complex Systems 2012, Springer proceedings on complexity, XVI, p. 347-352. [4] Van Schaeybroeck, B., and S. Vannitsem, 2014: Ensemble post-processing using member-by-member approaches: theoretical aspects, under review.
Interpolation of property-values between electron numbers is inconsistent with ensemble averaging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miranda-Quintana, Ramón Alain; Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario L8S 4M1; Ayers, Paul W.
2016-06-28
In this work we explore the physical foundations of models that study the variation of the ground state energy with respect to the number of electrons (E vs. N models), in terms of general grand-canonical (GC) ensemble formulations. In particular, we focus on E vs. N models that interpolate the energy between states with integer numbers of electrons. We show that if the interpolation of the energy corresponds to a GC ensemble, it is not differentiable. Conversely, if the interpolation is smooth, then it cannot be formulated as any GC ensemble. This proves that interpolation of electronic properties between integer electron numbers is inconsistent with any form of ensemble averaging. This emphasizes the role of derivative discontinuities and the critical role of a subsystem's surroundings in determining its properties.
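The underlying textbook result (a sketch of the standard zero-temperature GC construction, not the paper's full argument) makes the incompatibility plain: for fractional electron number N = N_0 + ω,

```latex
E(N) = (1-\omega)\,E(N_0) + \omega\,E(N_0+1), \qquad 0 \le \omega \le 1,
```

so the GC-ensemble energy is piecewise linear in N, and its derivative jumps at the integers from -I = E(N_0) - E(N_0-1) to -A = E(N_0+1) - E(N_0). Any interpolation that is smooth at integer N therefore cannot arise from such an ensemble.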
Parameterizations for ensemble Kalman inversion
NASA Astrophysics Data System (ADS)
Chada, Neil K.; Iglesias, Marco A.; Roininen, Lassi; Stuart, Andrew M.
2018-05-01
The use of ensemble methods to solve inverse problems is attractive because it is a derivative-free methodology which is also well-adapted to parallelization. In its basic iterative form the method produces an ensemble of solutions which lie in the linear span of the initial ensemble. Choice of the parameterization of the unknown field is thus a key component of the success of the method. We demonstrate how both geometric ideas and hierarchical ideas can be used to design effective parameterizations for a number of applied inverse problems arising in electrical impedance tomography, groundwater flow and source inversion. In particular we show how geometric ideas, including the level set method, can be used to reconstruct piecewise continuous fields, and we show how hierarchical methods can be used to learn key parameters in continuous fields, such as length-scales, resulting in improved reconstructions. Geometric and hierarchical ideas are combined in the level set method to find piecewise constant reconstructions with interfaces of unknown topology.
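The basic iterative form referred to above can be sketched in a few lines (a generic implementation, not the authors' code); the update is a linear combination of the current members, which is why solutions stay in the span of the initial ensemble and why the parameterization matters:

```python
import numpy as np

def eki_step(U, G, y, Gamma):
    """One ensemble Kalman inversion iteration.
    U: (d, J) ensemble of J parameter vectors; G: forward map R^d -> R^m;
    y: (m,) observed data; Gamma: (m, m) observation noise covariance."""
    W = np.column_stack([G(u) for u in U.T])      # mapped ensemble, (m, J)
    Uc = U - U.mean(axis=1, keepdims=True)
    Wc = W - W.mean(axis=1, keepdims=True)
    J = U.shape[1]
    Cup = Uc @ Wc.T / (J - 1)                     # parameter-output covariance
    Cpp = Wc @ Wc.T / (J - 1)                     # output covariance
    return U + Cup @ np.linalg.solve(Cpp + Gamma, y[:, None] - W)
```

Level-set and hierarchical parameterizations of the kind studied in the paper change what the columns of U represent, not this update itself.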
Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.
Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan
2017-12-01
It is common for a trained classification model to be applied to operating data that deviates from the training data because of noise. This paper tests an ensemble method, Diversified Multiple Tree (DMT), on its capability to classify instances from a new laboratory using a classifier built on instances from another laboratory. DMT is tested on three real-world biomedical data sets from different laboratories, in comparison with four benchmark ensemble methods: AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted to study the limitations of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than the other benchmark ensemble classifiers at classifying new instances from a laboratory other than the one whose instances were used to build the classifier. This paper demonstrates that the ensemble classifier DMT is more robust in classifying noisy data than other widely used ensemble methods. DMT does, however, require a data set that can support multiple simple trees.
NASA Astrophysics Data System (ADS)
Sokol, Zbyněk; Mejsnar, Jan; Pop, Lukáš; Bližňák, Vojtěch
2017-09-01
A new method for the probabilistic nowcasting of instantaneous rain rates (ENS), based on the ensemble technique and extrapolation along Lagrangian trajectories of the current radar reflectivity, is presented. Assuming inaccurate forecasts of the trajectories, an ensemble of precipitation forecasts is calculated and used to estimate the probability that rain rates will exceed a given threshold at a given grid point. Although the extrapolation neglects the growth and decay of precipitation, their impact on the probability forecast is taken into account by calibrating the forecasts using the reliability component of the Brier score (BS). ENS forecasts the probability that rain rates will exceed thresholds of 0.1, 1.0 and 3.0 mm/h in squares of 3 km by 3 km. The lead times were up to 60 min, and the forecast accuracy was measured by the BS. The ENS forecasts were compared with two other methods: a combined method (COM) and a neighbourhood method (NEI). NEI treats the extrapolated values in the square neighbourhood of 5 by 5 grid points around the point of interest as ensemble members, and the COM ensemble comprises the united ensemble members of ENS and NEI. The results showed that the calibration technique significantly reduces the bias of the probability forecasts by including additional uncertainties that correspond to the processes neglected during the extrapolation. In addition, the calibration can also be used to find the maximum lead times for which the forecasting method is useful. We found that ENS is useful for lead times up to 60 min for thresholds of 0.1 and 1 mm/h and approximately 30 to 40 min for a threshold of 3 mm/h. We also found that a reasonable ensemble size is 100 members, which provided better scores than ensembles with 10, 25 and 50 members. In terms of the BS, the best results were obtained by ENS and COM, which are comparable. However, ENS is better calibrated and thus preferable.
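The raw (uncalibrated) probability forecast behind ENS, NEI and COM is just an exceedance frequency over ensemble members, and the BS is its natural score; a minimal sketch (generic, with the calibration step itself omitted):

```python
import numpy as np

def exceedance_probability(members, threshold):
    """members: (n_members, ny, nx) extrapolated rain rates; returns the
    fraction of members exceeding the threshold at each grid square."""
    return (members > threshold).mean(axis=0)

def brier_score(p, occurred):
    """p: forecast probabilities; occurred: 0/1 observed exceedances."""
    return np.mean((p - occurred) ** 2)
```

Calibration as described above would then adjust these raw probabilities to minimize the reliability component of the Brier score estimated on training cases.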
How Accurate Are Transition States from Simulations of Enzymatic Reactions?
2015-01-01
The rate expression of traditional transition state theory (TST) assumes no recrossing of the transition state (TS) and thermal quasi-equilibrium between the ground state and the TS. Currently, it is not well understood to what extent these assumptions influence the nature of the activated complex obtained in traditional TST-based simulations of processes in the condensed phase in general and in enzymes in particular. Here we scrutinize these assumptions by characterizing the TSs for hydride transfer catalyzed by the enzyme Escherichia coli dihydrofolate reductase obtained using various simulation approaches. Specifically, we compare the TSs obtained with common TST-based methods and a dynamics-based method. Using a recently developed accurate hybrid quantum mechanics/molecular mechanics potential, we find that the TST-based and dynamics-based methods give considerably different TS ensembles. This discrepancy, which could be due to equilibrium solvation effects and the nature of the reaction coordinate employed and its motion, raises major questions about how to interpret the TSs determined by common simulation methods. We conclude that further investigation is needed to characterize the impact of various TST assumptions on the TS phase-space ensemble and on the reaction kinetics. PMID:24860275
Generating highly accurate prediction hypotheses through collaborative ensemble learning
NASA Astrophysics Data System (ADS)
Arsov, Nino; Pavlovski, Martin; Basnarkov, Lasko; Kocarev, Ljupco
2017-03-01
Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance reduction, while the latter ameliorates overfitting, the outcome of a multi-model that combines both strives toward a comprehensive net-balancing of the bias-variance trade-off. To further improve this, we alter the bagged-boosting scheme by introducing collaboration between the multi-model's constituent learners at various levels. This novel stability-guided classification scheme is delivered in two flavours: during or after the boosting process. Applied among a crowd of Gentle Boost ensembles, the ability of the two suggested algorithms to generalize is inspected by comparing them against Subbagging and Gentle Boost on various real-world datasets. In both cases, our models obtained a 40% decrease in generalization error. But their true ability to capture details in data was revealed through their application to protein detection in texture analysis of gel electrophoresis images. They achieved improved performance of approximately 0.9773 AUROC, compared to the 0.9574 AUROC obtained by an SVM based on recursive feature elimination.
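The bagged-boosting core (before the collaboration stage) can be approximated with stock components; a sketch using scikit-learn, with AdaBoost standing in for Gentle Boost, which scikit-learn does not provide:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging (variance reduction) wrapped around boosting (bias reduction).
model = BaggingClassifier(AdaBoostClassifier(n_estimators=50),
                          n_estimators=10, max_samples=0.5, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())
```

The collaboration mechanisms introduced in the paper go beyond this baseline by letting constituent learners exchange information during or after boosting.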
Morabito, Marco; Pavlinic, Daniela Z; Crisci, Alfonso; Capecchi, Valerio; Orlandini, Simone; Mekjavic, Igor B
2011-07-01
Military and civil defense personnel are often involved in complex activities in a variety of outdoor environments. The choice of appropriate clothing ensembles represents an important strategy to establish the success of a military mission. The main aim of this study was to compare the known clothing insulation of the garment ensembles worn by soldiers during two winter outdoor field trials (hike and guard duty) with the estimated optimal clothing thermal insulations recommended to maintain thermoneutrality, assessed by using two different biometeorological procedures. The overall aim was to assess the applicability of such biometeorological procedures to weather forecast systems, thereby developing a comprehensive biometeorological tool for military operational forecast purposes. Military trials were carried out during winter 2006 in Pokljuka (Slovenia) by Slovene Armed Forces personnel. Gastrointestinal temperature, heart rate and environmental parameters were measured with portable data acquisition systems. The thermal characteristics of the clothing ensembles worn by the soldiers, namely thermal resistance, were determined with a sweating thermal manikin. Results showed that the clothing ensemble worn by the military was appropriate during guard duty but generally inappropriate during the hike. A general under-estimation of the biometeorological forecast model in predicting the optimal clothing insulation value was observed and an additional post-processing calibration might further improve forecast accuracy. This study represents the first step in the development of a comprehensive personalized biometeorological forecast system aimed at improving recommendations regarding the optimal thermal insulation of military garment ensembles for winter activities.
New technique for ensemble dressing combining Multimodel SuperEnsemble and precipitation PDF
NASA Astrophysics Data System (ADS)
Cane, D.; Milelli, M.
2009-09-01
The Multimodel SuperEnsemble technique (Krishnamurti et al., Science 285, 1548-1550, 1999) is a postprocessing method for the estimation of weather forecast parameters that reduces direct model output errors. It differs from other ensemble analysis techniques by using an adequate weighting of the input forecast models to obtain a combined estimation of meteorological parameters. Weights are calculated by least-square minimization of the difference between the model and the observed field during a so-called training period. Although it can be applied successfully to continuous parameters like temperature, humidity, wind speed and mean sea level pressure (Cane and Milelli, Meteorologische Zeitschrift, 15, 2, 2006), the Multimodel SuperEnsemble also gives good results when applied to precipitation, a parameter that is quite difficult to handle with standard post-processing methods. Here we present our methodology for Multimodel precipitation forecasts, applied to a wide spectrum of results over the very dense non-GTS weather station network of Piemonte. We will focus particularly on an accurate statistical method for bias correction and on ensemble dressing in agreement with the observed precipitation forecast-conditioned PDF. Acknowledgement: this work is supported by the Italian Civil Defence Department.
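At its core, the SuperEnsemble weight computation is a least-squares regression of observations on model anomalies over the training period; a schematic version (operational systems add bias terms and per-gridpoint fits):

```python
import numpy as np

def superensemble_weights(F, o):
    """F: (T, M) anomalies of M model forecasts over a training period of
    length T; o: (T,) observed anomalies. Returns one weight per model."""
    w, *_ = np.linalg.lstsq(F, o, rcond=None)
    return w

# Forecast = observed climatology + F_new @ w, i.e. a weighted sum of
# model anomalies added back to the observed mean.
```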
Magnetic separation of general solid particles realised by a permanent magnet
Hisayoshi, K.; Uyeda, C.; Terada, K.
2016-01-01
Most existing solids are categorised as diamagnetic or weak paramagnetic materials. The possibility of magnetic motion has not been intensively considered for these materials. Here, we demonstrate for the first time that ensembles of heterogeneous particles (diamagnetic bismuth, diamond and graphite particles, as well as two paramagnetic olivines) can be dynamically separated into five fractions by the low field produced by neodymium (NdFeB) magnets during short-duration microgravity (μg). This result is in contrast to the generally accepted notion that ordinary solid materials are magnetically inert. The materials of the separated particles are identified by their magnetic susceptibility (χ), which is determined from the translating velocity. The potential of this approach as an analytical technique is comparable to that of chromatography separation because the extraction of new solid phases from a heterogeneous grain ensemble will lead to important discoveries about inorganic materials. The method is applicable for the separation of the precious samples such as lunar soils and/or the Hayabusa particles recovered from the asteroids, because even micron-order grains can be thoroughly separated without sample-loss. PMID:27929081
Magnetic separation of general solid particles realised by a permanent magnet
NASA Astrophysics Data System (ADS)
Hisayoshi, K.; Uyeda, C.; Terada, K.
2016-12-01
Most existing solids are categorised as diamagnetic or weak paramagnetic materials. The possibility of magnetic motion has not been intensively considered for these materials. Here, we demonstrate for the first time that ensembles of heterogeneous particles (diamagnetic bismuth, diamond and graphite particles, as well as two paramagnetic olivines) can be dynamically separated into five fractions by the low field produced by neodymium (NdFeB) magnets during short-duration microgravity (μg). This result is in contrast to the generally accepted notion that ordinary solid materials are magnetically inert. The materials of the separated particles are identified by their magnetic susceptibility (χ), which is determined from the translating velocity. The potential of this approach as an analytical technique is comparable to that of chromatography separation because the extraction of new solid phases from a heterogeneous grain ensemble will lead to important discoveries about inorganic materials. The method is applicable for the separation of the precious samples such as lunar soils and/or the Hayabusa particles recovered from the asteroids, because even micron-order grains can be thoroughly separated without sample-loss.
Magnetic separation of general solid particles realised by a permanent magnet.
Hisayoshi, K; Uyeda, C; Terada, K
2016-12-08
Most existing solids are categorised as diamagnetic or weak paramagnetic materials. The possibility of magnetic motion has not been intensively considered for these materials. Here, we demonstrate for the first time that ensembles of heterogeneous particles (diamagnetic bismuth, diamond and graphite particles, as well as two paramagnetic olivines) can be dynamically separated into five fractions by the low field produced by neodymium (NdFeB) magnets during short-duration microgravity (μg). This result is in contrast to the generally accepted notion that ordinary solid materials are magnetically inert. The materials of the separated particles are identified by their magnetic susceptibility (χ), which is determined from the translating velocity. The potential of this approach as an analytical technique is comparable to that of chromatography separation because the extraction of new solid phases from a heterogeneous grain ensemble will lead to important discoveries about inorganic materials. The method is applicable for the separation of the precious samples such as lunar soils and/or the Hayabusa particles recovered from the asteroids, because even micron-order grains can be thoroughly separated without sample-loss.
Yin, Yizhou; Kundu, Kunal; Pal, Lipika R; Moult, John
2017-09-01
CAGI (Critical Assessment of Genome Interpretation) conducts community experiments to determine the state of the art in relating genotype to phenotype. Here, we report results obtained using newly developed ensemble methods to address two CAGI4 challenges: enzyme activity for population missense variants found in NAGLU (human N-acetyl-glucosaminidase) and random missense mutations in human UBE2I (human SUMO E2 ligase), assayed in a high-throughput competitive yeast complementation procedure. The ensemble methods are effective, ranking second for the SUMO ligase and third for NAGLU, according to the CAGI independent assessors. However, in common with other methods used in CAGI, there are large discrepancies between predicted and experimental activities for a subset of variants. Analysis of the structural context provides some insight into these. Post-challenge analysis shows that the ensemble methods are also effective at assigning pathogenicity for the NAGLU variants. In the clinic, providing an estimate of the reliability of pathogenicity assignments is key. We have also used the NAGLU dataset to show that ensemble methods have considerable potential for this task, and are already reliable enough for use with a subset of mutations.
Relaxation in a two-body Fermi-Pasta-Ulam system in the canonical ensemble
NASA Astrophysics Data System (ADS)
Sen, Surajit; Barrett, Tyler
The study of the dynamics of the Fermi-Pasta-Ulam (FPU) chain remains a challenging problem. Inspired by the recent work of Onorato et al. on thermalization in the FPU system, we report a study of relaxation processes in a two-body FPU system in the canonical ensemble. The studies have been carried out using the Recurrence Relations Method introduced by Zwanzig, Mori, Lee and others. We have obtained exact analytical expressions for the first thirteen levels of the continued fraction representation of the Laplace transformed velocity autocorrelation function of the system. Using simple and reasonable extrapolation schemes and known limits we are able to estimate the relaxation behavior of the oscillators in the two-body FPU system and recover the expected behavior in the harmonic limit. Generalizations of the calculations to larger systems will be discussed.
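The object being computed has the generic Mori/Lee continued-fraction form (a sketch of the standard structure; the paper's contribution is the exact recurrants for this system):

```latex
\tilde{a}(z) \;=\; \cfrac{1}{\,z + \cfrac{\Delta_1}{\,z + \cfrac{\Delta_2}{\,z + \cdots}}}\,,
```

where \tilde{a}(z) is the Laplace-transformed velocity autocorrelation function and the \Delta_n are static recurrants; the thirteen exact levels reported correspond to \Delta_1 through \Delta_{13}, with extrapolation supplying the deeper levels.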
Metal Oxide Gas Sensor Drift Compensation Using a Two-Dimensional Classifier Ensemble
Liu, Hang; Chu, Renzhi; Tang, Zhenan
2015-01-01
Sensor drift is the most challenging problem in gas sensing at present. We propose a novel two-dimensional classifier ensemble strategy to solve the gas discrimination problem, regardless of the gas concentration, with high accuracy over extended periods of time. This strategy is appropriate for multi-class classifiers that consist of combinations of pairwise classifiers, such as support vector machines. We compare the performance of the strategy with those of competing methods in an experiment based on a public dataset that was compiled over a period of three years. The experimental results demonstrate that the two-dimensional ensemble outperforms the other methods considered. Furthermore, we propose a pre-aging process inspired by that applied to the sensors to improve the stability of the classifier ensemble. The experimental results demonstrate that the weight of each multi-class classifier model in the ensemble remains fairly static before and after the addition of new classifier models to the ensemble, when a pre-aging procedure is applied. PMID:25942640
Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics
NASA Astrophysics Data System (ADS)
Lazarus, S. M.; Holman, B. P.; Splitt, M. E.
2017-12-01
A computationally efficient method is developed that performs gridded post-processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations is generated to provide physically consistent high-resolution winds over a coastal domain characterized by an intricate land/water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withholding and 28 east-central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicates the post-processed forecasts are calibrated. Two downscaling case studies are presented: a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
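EMOS fits a predictive distribution whose parameters are affine in the ensemble mean and variance, typically by minimum-CRPS estimation; a univariate sketch (the wind-vector application requires a bivariate extension not shown here):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(ens_mean, ens_var, obs):
    """Fit N(a + b*mean, c + d*var) by minimizing the mean CRPS."""
    def loss(p):
        a, b, c, d = p
        sigma = np.sqrt(np.clip(c + d * ens_var, 1e-6, None))
        return crps_normal(a + b * ens_mean, sigma, obs).mean()
    return minimize(loss, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead").x
```

Spreading the fitted (mean, variance) parameters across the grid with flow-dependent statistics from the downscaled WRF winds is the paper's extension of this pointwise calibration.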
Fitting a function to time-dependent ensemble averaged data.
Fogelmark, Karl; Lomholt, Michael A; Irbäck, Anders; Ambjörnsson, Tobias
2018-05-03
Time-dependent ensemble averages, i.e., trajectory-based averages of some observable, are of importance in many fields of science. A crucial objective when interpreting such data is to fit these averages (for instance, squared displacements) with a function and extract parameters (such as diffusion constants). A commonly overlooked challenge in such function fitting procedures is that fluctuations around mean values, by construction, exhibit temporal correlations. We show that the only available general-purpose function fitting methods, the correlated chi-square method and the weighted least squares method (which neglects correlation), fail at either robust parameter estimation or accurate error estimation. We remedy this by deriving a new closed-form error estimation formula for weighted least squares fitting. The new formula uses the full covariance matrix, i.e., rigorously includes temporal correlations, but is free of the robustness issues inherent to the correlated chi-square method. We demonstrate its accuracy in four examples of importance in many fields: Brownian motion, damped harmonic oscillation, fractional Brownian motion and continuous time random walks. We also successfully apply our method, weighted least squares including correlation in error estimation (WLS-ICE), to particle tracking data. The WLS-ICE method is applicable to arbitrary fit functions, and we provide a publicly available WLS-ICE software.
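For a linear model the structure of such an estimator is easy to sketch: keep the weighted-least-squares point estimate, but propagate the full covariance matrix into the parameter errors (a schematic of the idea, not the published WLS-ICE formula, which handles arbitrary fit functions):

```python
import numpy as np

def wls_with_correlated_errors(X, y, C):
    """X: (T, p) design matrix; y: (T,) averaged data; C: (T, T) full
    covariance of y. Weights use only diag(C), as in ordinary WLS, but the
    parameter covariance is a sandwich form containing all of C."""
    W = np.diag(1.0 / np.diag(C))
    A = np.linalg.inv(X.T @ W @ X)
    theta = A @ X.T @ W @ y              # robust point estimate
    cov = A @ X.T @ W @ C @ W @ X @ A    # error estimate with correlations
    return theta, cov
```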
Smith, Colin A; Kortemme, Tanja
2011-01-01
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
Rauscher, Sarah; Neale, Chris; Pomès, Régis
2009-10-13
Generalized-ensemble algorithms in temperature space have become popular tools to enhance conformational sampling in biomolecular simulations. A random walk in temperature leads to a corresponding random walk in potential energy, which can be used to cross over energetic barriers and overcome the problem of quasi-nonergodicity. In this paper, we introduce two novel methods: simulated tempering distributed replica sampling (STDR) and virtual replica exchange (VREX). These methods are designed to address the practical issues inherent in the replica exchange (RE), simulated tempering (ST), and serial replica exchange (SREM) algorithms. RE requires a large, dedicated, and homogeneous cluster of CPUs to function efficiently when applied to complex systems. ST and SREM both have the drawback of requiring extensive initial simulations, possibly adaptive, for the calculation of weight factors or potential energy distribution functions. STDR and VREX alleviate the need for lengthy initial simulations, and for synchronization and extensive communication between replicas. Both methods are therefore suitable for distributed or heterogeneous computing platforms. We perform an objective comparison of all five algorithms in terms of both implementation issues and sampling efficiency. We use disordered peptides in explicit water as test systems, for a total simulation time of over 42 μs. Efficiency is defined in terms of both structural convergence and temperature diffusion, and we show that these definitions of efficiency are in fact correlated. Importantly, we find that ST-based methods exhibit faster temperature diffusion and correspondingly faster convergence of structural properties compared to RE-based methods. Within the RE-based methods, VREX is superior to both SREM and RE. On the basis of our observations, we conclude that ST is ideal for simple systems, while STDR is well-suited for complex systems.
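All of the temperature-space methods compared here share the same Metropolis building block; a one-function sketch (generic, not tied to any of the five algorithms' bookkeeping):

```python
import math, random

def accept_swap(beta_i, beta_j, E_i, E_j):
    """Accept a replica swap (or an ST temperature move against the
    appropriate weight) with probability min(1, exp[(b_i-b_j)(E_i-E_j)])."""
    return random.random() < min(1.0, math.exp((beta_i - beta_j) * (E_i - E_j)))
```

STDR and VREX differ from RE, ST and SREM mainly in how replicas or weights are managed around this step, which is what removes the need for synchronization and lengthy initial simulations.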
Remarks on thermalization in 2D CFT
NASA Astrophysics Data System (ADS)
de Boer, Jan; Engelhardt, Dalit
2016-12-01
We revisit certain aspects of thermalization in 2D conformal field theory (CFT). In particular, we consider similarities and differences between the time dependence of correlation functions in various states in rational and non-rational CFTs. We also consider the distinction between global and local thermalization and explain how states obtained by acting with a diffeomorphism on the ground state can appear locally thermal, and we review why the time-dependent expectation value of the energy-momentum tensor is generally a poor diagnostic of global thermalization. Since all 2D CFTs have an infinite set of commuting conserved charges, generic initial states might be expected to give rise to a generalized Gibbs ensemble rather than a pure thermal ensemble at late times. We construct the holographic dual of the generalized Gibbs ensemble and show that, to leading order, it is still described by a Banados-Teitelboim-Zanelli black hole. The extra conserved charges, while rendering c < 1 theories essentially integrable, therefore seem to have little effect on large-c conformal field theories.
NASA Astrophysics Data System (ADS)
Chen, Dongyue; Lin, Jianhui; Li, Yanping
2018-06-01
Complementary ensemble empirical mode decomposition (CEEMD) was developed to address the mode-mixing problem in the empirical mode decomposition (EMD) method. Compared to ensemble empirical mode decomposition (EEMD), the CEEMD method reduces the residue noise in the signal reconstruction. Both CEEMD and EEMD need a sufficiently large ensemble number to reduce the residue noise, which entails a high computational cost. Moreover, the selection of intrinsic mode functions (IMFs) for further analysis usually depends on experience. A modified CEEMD method and an IMF evaluation index are proposed with the aim of reducing the computational cost and selecting IMFs automatically. A simulated signal and in-service high-speed train gearbox vibration signals are employed to validate the proposed method in this paper. The results demonstrate that the modified CEEMD can decompose the signal efficiently at a lower computational cost, and the IMF evaluation index can select the meaningful IMFs automatically.
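The complementary-noise idea is compact; a schematic version, assuming some EMD routine `emd` returning an (n_imfs, len(x)) array (e.g., from the PyEMD package) and glossing over the IMF-count alignment a real implementation must handle:

```python
import numpy as np

def ceemd_like(x, emd, n_pairs=50, noise_std=0.2, seed=0):
    """Decompose x plus paired +/- white-noise realizations and average
    the resulting IMFs; the complementary pairs cancel residue noise."""
    rng = np.random.default_rng(seed)
    sigma = noise_std * x.std()
    imf_sets = []
    for _ in range(n_pairs):
        w = rng.normal(0.0, sigma, size=x.shape)
        imf_sets.append(emd(x + w))
        imf_sets.append(emd(x - w))
    return np.mean(imf_sets, axis=0)
```

The modification proposed in the paper targets exactly the cost visible here (2 x n_pairs full EMD decompositions), together with automatic selection among the returned IMFs.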
Shukla, Shraddhanand; Roberts, Jason B.; Hoell, Andrew; Funk, Chris; Robertson, Franklin R.; Kirtmann, Benjamin
2016-01-01
The skill of North American multimodel ensemble (NMME) seasonal forecasts in East Africa (EA), which encompasses one of the most food and water insecure areas of the world, is evaluated using deterministic, categorical, and probabilistic evaluation methods. The skill is estimated for all three primary growing seasons: March–May (MAM), July–September (JAS), and October–December (OND). It is found that the precipitation forecast skill in this region is generally limited and statistically significant over only a small part of the domain. In the case of MAM (JAS) [OND] season it exceeds the skill of climatological forecasts in parts of equatorial EA (Northern Ethiopia) [equatorial EA] for up to 2 (5) [5] months lead. Temperature forecast skill is generally much higher than precipitation forecast skill (in terms of deterministic and probabilistic skill scores) and statistically significant over a majority of the region. Over the region as a whole, temperature forecasts also exhibit greater reliability than the precipitation forecasts. The NMME ensemble forecasts are found to be more skillful and reliable than the forecast from any individual model. The results also demonstrate that for some seasons (e.g. JAS), the predictability of precipitation signals varies and is higher during certain climate events (e.g. ENSO). Finally, potential room for improvement in forecast skill is identified in some models by comparing homogeneous predictability in individual NMME models with their respective forecast skill.
CMIP5 ensemble-based spatial rainfall projection over homogeneous zones of India
NASA Astrophysics Data System (ADS)
Akhter, Javed; Das, Lalu; Deb, Argha
2017-09-01
The performance of state-of-the-art CMIP5 models in reproducing the spatial rainfall patterns over seven homogeneous rainfall zones of India, viz. North Mountainous India (NMI), Northwest India (NWI), North Central India (NCI), Northeast India (NEI), West Peninsular India (WPI), East Peninsular India (EPI) and South Peninsular India (SPI), has been assessed using conventional performance metrics, namely spatial correlation (R), index of agreement (d-index), Nash-Sutcliffe efficiency (NSE), the ratio of RMSE to the standard deviation of the observations (RSR) and mean bias (MB). The results based on these indices revealed that the majority of the models are unable to reproduce the finer-scale spatial patterns over most of the zones. Thereafter, four bias correction methods, i.e., Scaling, Standardized Reconstruction, Empirical Quantile Mapping and Gamma Quantile Mapping, were applied to the GCM simulations to enhance the skill of the GCM projections. The scaling method showed better skill than the other three methods in capturing mean spatial patterns. A multi-model ensemble (MME) comprising the 25 better-performing bias-corrected (scaled) GCMs has been considered for developing future rainfall patterns over the seven zones. The models' spread about the ensemble mean (uncertainty) has been found to be larger in the RCP 8.5 ensemble than in the RCP 4.5 ensemble. In general, future rainfall projections from RCP 4.5 and RCP 8.5 revealed increasing rainfall over the seven zones during the 2020s, 2050s, and 2080s. The maximum increase has been found over the southwestern part of NWI (12-30%), the northwestern part of WPI (3-30%), the southeastern part of NEI (5-18%) and the northern and eastern parts of SPI (6-24%). However, the contiguous region comprising the southeastern part of NCI and the northeastern part of EPI may experience slightly decreasing rainfall (about 3%) during the 2020s, whereas the western part of NMI may also receive around a 3% reduction in rainfall during both the 2050s and 2080s.
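Of the four bias-correction methods compared, the two quantile-mapping variants share one mechanism: map each model value to its quantile in the historical model distribution, then read off the observed value at that quantile. A sketch of the empirical version (scaling, by contrast, simply rescales by the ratio of observed to modelled climatological means):

```python
import numpy as np

def empirical_quantile_map(model_hist, obs_hist, model_future):
    """Replace each future model value by the observed value at the same
    empirical quantile of the historical distributions."""
    q = np.interp(model_future, np.sort(model_hist),
                  np.linspace(0.0, 1.0, len(model_hist)))  # value -> quantile
    return np.quantile(obs_hist, q)                        # quantile -> value
```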
Monthly ENSO Forecast Skill and Lagged Ensemble Size
DelSole, T.; Tippett, M.K.; Pegion, K.
2018-01-01
The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real-time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real-time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8-10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles are found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities. PMID:29937973
Monthly ENSO Forecast Skill and Lagged Ensemble Size
NASA Astrophysics Data System (ADS)
Trenary, L.; DelSole, T.; Tippett, M. K.; Pegion, K.
2018-04-01
The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real-time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real-time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8-10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles are found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities.
Project FIRES. Volume 1: Program Overview and Summary, Phase 1B
NASA Technical Reports Server (NTRS)
Abeles, F. J.
1980-01-01
Overall performance requirements and evaluation methods for firefighters protective equipment were established and published as the Protective Ensemble Performance Standards (PEPS). Current firefighters protective equipment was tested and evaluated against the PEPS requirements, and the preliminary design of a prototype protective ensemble was performed. In phase 1B, the design of the prototype ensemble was finalized. Prototype ensembles were fabricated and then subjected to a series of qualification tests which were based upon the PEPS requirements. Engineering drawings and purchase specifications were prepared for the new protective ensemble.
Classifying medical relations in clinical text via convolutional neural networks.
He, Bin; Guan, Yi; Dai, Rui
2018-05-16
Deep learning research on relation classification has achieved solid performance in the general domain. This study proposes a convolutional neural network (CNN) architecture with a multi-pooling operation for medical relation classification on clinical records and explores a loss function with a category-level constraint matrix. Experiments using the 2010 i2b2/VA relation corpus demonstrate that these models, which do not depend on any external features, outperform previous single-model methods and that our best model is competitive with the existing ensemble-based method.
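A minimal sketch of the multi-pooling idea: max-pool the convolutional feature map separately over the segments before, between, and after the two medical entities, then classify the concatenated segment vectors. The sketch assumes PyTorch and, for brevity, entity positions shared across the batch; dimensions and hyperparameters are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class MultiPoolCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, n_filters=100, kernel=3, n_classes=8):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel, padding=kernel // 2)
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, tokens, e1_pos, e2_pos):
        # tokens: (batch, seq_len); assumes 0 < e1_pos < e2_pos < seq_len
        h = torch.relu(self.conv(self.emb(tokens).transpose(1, 2)))  # (B, F, T)
        segments = [(0, e1_pos), (e1_pos, e2_pos), (e2_pos, h.size(2))]
        pooled = [h[:, :, lo:hi].max(dim=2).values for lo, hi in segments]
        return self.fc(torch.cat(pooled, dim=1))  # (B, n_classes)

model = MultiPoolCNN(vocab_size=5000)
logits = model(torch.randint(0, 5000, (4, 40)), e1_pos=10, e2_pos=25)
print(logits.shape)  # torch.Size([4, 8])
```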
New technologies for examining the role of neuronal ensembles in drug addiction and fear.
Cruz, Fabio C; Koya, Eisuke; Guez-Barber, Danielle H; Bossert, Jennifer M; Lupica, Carl R; Shaham, Yavin; Hope, Bruce T
2013-11-01
Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. In addition, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches--Daun02 inactivation, FACS sorting of activated neurons and Fos-GFP transgenic rats--that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools--Fos-tTA transgenic mice and inactivation of CREB-overexpressing neurons--that have been used to study the role of neuronal ensembles in conditioned fear.
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.
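A minimal numerical sketch of the equivalent-ensemble idea, assuming a long 1-D signal and illustrative record lengths: segment the time history into equal, finite records, then check that ensemble averages over the records do not drift with the intra-record time index, and compare them with the time average of a single record as a heuristic ergodicity check.

```python
import numpy as np

def equivalent_ensemble(x, n_records, record_len):
    """Segment one long time history into an 'equivalent ensemble'
    of equal, finite sample records (one record per row)."""
    return x[: n_records * record_len].reshape(n_records, record_len)

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)            # surrogate "turbulent" signal
ens = equivalent_ensemble(x, n_records=100, record_len=2000)

ens_mean = ens.mean(axis=0)                 # ensemble average at each time index
print("drift of ensemble mean:", ens_mean.std())   # ~0 if weakly stationary
# heuristic ergodicity estimate: time average of one record vs ensemble average
print("time avg (record 0):", ens[0].mean(), " ensemble avg:", ens.mean())
```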
Subsurface characterization with localized ensemble Kalman filter employing adaptive thresholding
NASA Astrophysics Data System (ADS)
Delijani, Ebrahim Biniaz; Pishvaie, Mahmoud Reza; Boozarjomehry, Ramin Bozorgmehry
2014-07-01
The ensemble Kalman filter (EnKF), as a Monte Carlo sequential data assimilation method, has emerged as a promising tool for subsurface media characterization during the past decade. Due to the high computational cost of large ensemble sizes, EnKF is limited to small ensembles in practice. This results in the appearance of spurious correlations in the covariance structure, leading to incorrect updates or probable divergence of the updated realizations. In this paper, a universal/adaptive thresholding method is presented to remove and/or mitigate the spurious correlation problem in the forecast covariance matrix. This method is then extended to regularize the Kalman gain directly. Four different thresholding functions have been considered to threshold the forecast covariance and gain matrices: hard, soft, lasso and Smoothly Clipped Absolute Deviation (SCAD) functions. Three benchmarks are used to evaluate the performance of these methods: a small 1D linear model and two 2D water flooding (in petroleum reservoirs) cases whose levels of heterogeneity/nonlinearity differ. Beside the adaptive thresholding, the standard distance-dependent localization and bootstrap Kalman gain are also implemented for comparison purposes. We assessed each setup with different ensemble sizes to investigate the sensitivity of each method to ensemble size. The results indicate that thresholding the forecast covariance yields more reliable performance than thresholding the Kalman gain. Among the thresholding functions, SCAD is the most robust for both covariance and gain estimation. Our analyses emphasize that not all assimilation cycles require thresholding and that it should be performed wisely during the early assimilation cycles. The proposed adaptive thresholding scheme outperforms the other methods for subsurface characterization of the underlying benchmarks.
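A sketch of hard, soft and SCAD thresholding applied entrywise to the off-diagonal of an ensemble sample covariance; the SCAD rule follows the standard Fan-Li form with shape parameter a > 2. This is illustrative, not the paper's adaptive threshold-selection procedure:

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard(x, lam):
    return np.where(np.abs(x) > lam, x, 0.0)

def scad(x, lam, a=3.7):
    """Smoothly Clipped Absolute Deviation threshold (Fan & Li form)."""
    return np.where(np.abs(x) <= 2 * lam, soft(x, lam),
           np.where(np.abs(x) <= a * lam,
                    ((a - 1) * x - np.sign(x) * a * lam) / (a - 2), x))

def threshold_covariance(C, lam, rule=soft):
    """Threshold off-diagonal entries only, keeping variances intact."""
    T = rule(C, lam)
    np.fill_diagonal(T, np.diag(C))
    return T

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 200))         # 30 members, 200 state variables
C = np.cov(X, rowvar=False)                # noisy small-ensemble covariance
print("nonzero fraction after soft thresholding:",
      np.count_nonzero(threshold_covariance(C, lam=0.1)) / C.size)
```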
Fine-Tuning Your Ensemble's Jazz Style.
ERIC Educational Resources Information Center
Garcia, Antonio J.
1991-01-01
Proposes instructional strategies for directors of jazz groups, including guidelines for developing the skills necessary for good performance. Includes effective methods for positive changes in ensemble style. Addresses jazz group problems such as beat, tempo, staying in tune, wind power, and solo/ensemble lines. Discusses percussionists, bassists,…
Breaking of Ensemble Equivalence in Networks
NASA Astrophysics Data System (ADS)
Squartini, Tiziano; de Mol, Joey; den Hollander, Frank; Garlaschelli, Diego
2015-12-01
It is generally believed that, in the thermodynamic limit, the microcanonical description as a function of energy coincides with the canonical description as a function of temperature. However, various examples of systems for which the microcanonical and canonical ensembles are not equivalent have been identified. A complete theory of this intriguing phenomenon is still missing. Here we show that ensemble nonequivalence can manifest itself also in random graphs with topological constraints. We find that, while graphs with a given number of links are ensemble equivalent, graphs with a given degree sequence are not. This result holds irrespective of whether the energy is nonadditive (as in unipartite graphs) or additive (as in bipartite graphs). In contrast with previous expectations, our results show that (1) physically, nonequivalence can be induced by an extensive number of local constraints, and not necessarily by long-range interactions or nonadditivity, (2) mathematically, nonequivalence is determined by a different large-deviation behavior of microcanonical and canonical probabilities for a single microstate, and not necessarily for almost all microstates. The latter criterion, which is entirely local, is not restricted to networks and holds in general.
Quantum Gibbs ensemble Monte Carlo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fantoni, Riccardo, E-mail: rfantoni@ts.infn.it; Moroni, Saverio, E-mail: moroni@democritos.it
We present a path integral Monte Carlo method which is the full quantum analogue of the Gibbs ensemble Monte Carlo method of Panagiotopoulos to study the gas-liquid coexistence line of a classical fluid. Unlike previous extensions of Gibbs ensemble Monte Carlo to include quantum effects, our scheme is viable even for systems with strong quantum delocalization in the degenerate regime of temperature. This is demonstrated by an illustrative application to the gas-superfluid transition of ⁴He in two dimensions.
NASA Astrophysics Data System (ADS)
Romanova, Vanya; Hense, Andreas; Wahl, Sabrina; Brune, Sebastian; Baehr, Johanna
2016-04-01
The decadal variability and predictability of surface net freshwater fluxes are compared in a set of retrospective predictions, all using the same model setup and differing only in the implemented ocean initialisation method and ensemble generation method. The basic aim is to deduce the differences between the initialization/ensemble generation methods in view of the uncertainty of the verifying observational data sets. The analysis gives an approximation of the uncertainties of the net freshwater fluxes, which up to now appear to be one of the most uncertain products in observational data and model outputs. All ensemble generation methods are implemented into the MPI-ESM earth system model in the framework of the ongoing MiKlip project (www.fona-miklip.de). Hindcast experiments are initialised annually between 2000 and 2004, and from each start year 10 ensemble members are initialized for 5 years each. Four different ensemble generation methods are compared: (i) a method based on the Anomaly Transform method (Romanova and Hense, 2015), in which the initial oceanic perturbations represent orthogonal and balanced anomaly structures in space and time and between the variables, taken from a control run; (ii) one-day-lagged ocean states from the MPI-ESM-LR baseline system; (iii) one-day-lagged ocean and atmospheric states with preceding full-field nudging to re-analysis in both the atmospheric and the oceanic component of the system (the baseline MPI-ESM-LR system); (iv) an Ensemble Kalman Filter (EnKF) implemented into the oceanic part of MPI-ESM (Brune et al. 2015), assimilating monthly subsurface oceanic temperature and salinity (EN3) using the Parallel Data Assimilation Framework (PDAF). The hindcasts are evaluated probabilistically using freshwater flux estimates from four different reanalysis data sets: MERRA, NCEP-R1, the GFDL ocean reanalysis and GECCO2. The assessments show no clear differences in the evaluation scores on regional scales. However, on the global scale the physically motivated methods (i) and (iv) provide probabilistic hindcasts with a consistently higher reliability than the lagged initialization methods (ii)/(iii), despite the large uncertainties in the verifying observations and in the simulations.
Nonequilibrium Statistical Operator Method and Generalized Kinetic Equations
NASA Astrophysics Data System (ADS)
Kuzemsky, A. L.
2018-01-01
We consider some principal problems of nonequilibrium statistical thermodynamics in the framework of the Zubarev nonequilibrium statistical operator approach. We present a brief comparative analysis of some approaches to describing irreversible processes based on the concept of nonequilibrium Gibbs ensembles and their applicability to describing nonequilibrium processes. We discuss the derivation of generalized kinetic equations for a system in a heat bath. We obtain and analyze a damped Schrödinger-type equation for a dynamical system in a heat bath. We study the dynamical behavior of a particle in a medium taking the dissipation effects into account. We consider the scattering problem for neutrons in a nonequilibrium medium and derive a generalized Van Hove formula. We show that the nonequilibrium statistical operator method is an effective, convenient tool for describing irreversible processes in condensed matter.
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristic curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance ADE extraction effectiveness.
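The three combination methods mentioned (majority voting, weighted averaging, stacked generalization) map directly onto scikit-learn's VotingClassifier and StackingClassifier; a minimal sketch on synthetic data, with base learners and weights chosen arbitrarily rather than taken from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("tree", DecisionTreeClassifier(max_depth=5))]

vote_hard = VotingClassifier(base, voting="hard")                      # majority voting
vote_soft = VotingClassifier(base, voting="soft", weights=[2, 2, 1])   # weighted averaging
stack = StackingClassifier(base, final_estimator=LogisticRegression()) # stacked generalization

for name, clf in [("majority", vote_hard), ("weighted", vote_soft), ("stacking", stack)]:
    print(name, clf.fit(X[:400], y[:400]).score(X[400:], y[400:]))
```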
NASA Astrophysics Data System (ADS)
Matte, Simon; Boucher, Marie-Amélie; Boucher, Vincent; Fortier Filion, Thomas-Charles
2017-06-01
A large effort has been made over the past 10 years to promote the operational use of probabilistic or ensemble streamflow forecasts. Numerous studies have shown that ensemble forecasts are of higher quality than deterministic ones. Many studies also conclude that decisions based on ensemble rather than deterministic forecasts are better in the context of flood mitigation. Hence, it is believed that ensemble forecasts possess a greater economic and social value for both decision makers and the general population. However, the vast majority of, if not all, existing hydro-economic studies rely on a cost-loss ratio framework that assumes a risk-neutral decision maker. To overcome this important flaw, this study borrows from economics and evaluates the economic value of early warning flood systems using the well-known Constant Absolute Risk Aversion (CARA) utility function, which explicitly accounts for the level of risk aversion of the decision maker. This new framework allows for the full exploitation of the information related to a forecast's uncertainty, making it especially suited for the economic assessment of ensemble or probabilistic forecasts. Rather than comparing deterministic and ensemble forecasts, this study focuses on comparing different types of ensemble forecasts. There are multiple ways of assessing and representing forecast uncertainty. Consequently, there exist many different means of building an ensemble forecasting system for future streamflow. One such possibility is to dress deterministic forecasts using the statistics of past forecast errors. Such dressing methods are popular among operational agencies because of their simplicity and intuitiveness. Another approach is the use of ensemble meteorological forecasts for precipitation and temperature, which are then provided as inputs to one or many hydrological model(s). In this study, three concurrent ensemble streamflow forecasting systems are compared: simple statistically dressed deterministic forecasts, forecasts based on meteorological ensembles, and a variant of the latter that also includes an estimation of state variable uncertainty. This comparison takes place for the Montmorency River, a small flood-prone watershed in southern central Quebec, Canada. The assessment of forecasts is performed for lead times of 1 to 5 days, both in terms of forecast quality (relative to the corresponding record of observations) and in terms of economic value, using the new proposed framework based on the CARA utility function. It is found that the economic value of a forecast for a risk-averse decision maker is closely linked to the forecast's reliability in predicting the upper tail of the streamflow distribution. Hence, post-processing forecasts to avoid over-forecasting could help improve both the quality and the value of forecasts.
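A minimal sketch of the CARA-based valuation: for an exponential utility with risk-aversion coefficient alpha, the certainty equivalent of an uncertain outcome c is CE = -(1/alpha) ln E[exp(-alpha c)], so a forecasting system can be scored by the certainty equivalent of the net benefits its decisions produce. The outcome distributions below are purely illustrative, not results from the study:

```python
import numpy as np

def cara_utility(c, alpha):
    """Constant Absolute Risk Aversion (exponential) utility."""
    return (1.0 - np.exp(-alpha * c)) / alpha

def certainty_equivalent(outcomes, alpha):
    """Sure amount a CARA decision maker values equally to the lottery."""
    return -np.log(np.mean(np.exp(-alpha * outcomes))) / alpha

rng = np.random.default_rng(3)
# hypothetical net benefits (avoided flood damages minus mitigation costs)
ensemble_based = rng.normal(100.0, 20.0, 10_000)   # lower mean, tighter spread
dressed_determ = rng.normal(105.0, 45.0, 10_000)   # higher mean, wider spread
for alpha in (0.001, 0.01, 0.05):                  # increasing risk aversion
    print(alpha,
          round(certainty_equivalent(ensemble_based, alpha), 1),
          round(certainty_equivalent(dressed_determ, alpha), 1))
```

As risk aversion grows, the tighter ensemble-based distribution overtakes the wider one despite its lower mean, which is exactly the behaviour a risk-neutral cost-loss framework cannot capture.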
Overlapped Partitioning for Ensemble Classifiers of P300-Based Brain-Computer Interfaces
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.
Overlapped partitioning for ensemble classifiers of P300-based brain-computer interfaces.
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.
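A minimal sketch of the overlapped-partitioning idea, assuming scikit-learn: split the training data into blocks that share a fraction of samples with their neighbours, train one LDA per block, and average the decision scores. Partition sizes and the overlap fraction are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def overlapped_partitions(n, n_parts, overlap=0.5):
    """Equal-size index blocks where consecutive blocks share
    a fraction `overlap` of their samples."""
    size = int(n / (n_parts - (n_parts - 1) * overlap))
    step = int(size * (1 - overlap))
    return [np.arange(i * step, i * step + size) for i in range(n_parts)]

X, y = make_classification(n_samples=1200, n_features=30, random_state=0)
Xtr, ytr = X[:900], y[:900]          # 900 training samples, as in the study
Xte, yte = X[900:], y[900:]

parts = overlapped_partitions(len(Xtr), n_parts=5)
scores = np.zeros(len(Xte))
for idx in parts:
    lda = LinearDiscriminantAnalysis().fit(Xtr[idx], ytr[idx])
    scores += lda.decision_function(Xte)      # accumulate binary LDA scores
pred = (scores / len(parts) > 0).astype(int)  # average score, threshold at 0
print("ensemble accuracy:", (pred == yte).mean())
```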
HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy.
Hu, Huan; Zhang, Li; Ai, Haixin; Zhang, Hui; Fan, Yetian; Zhao, Qi; Liu, Hongsheng
2018-03-27
LncRNA plays an important role in many biological processes and in disease progression by binding to related proteins. However, the experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although there are a few models designed to predict ncRNA-protein interactions, they all have some common drawbacks that limit their predictive performance. In this study, we present a model called HLPI-Ensemble designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF) and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble, respectively. The results of 10-fold cross-validation show that HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96 and 0.96, respectively, on the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with previous models on an external validation dataset. The results show that the false positives (FPs) of the HLPI-Ensemble models are much lower than those of the previous models, and the other evaluation indicators of the HLPI-Ensemble models are also higher than those of the previous models. This further shows that the HLPI-Ensemble models are superior in predicting human lncRNA-protein interactions compared with previous models. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/ .
Bayes Error Rate Estimation Using Classifier Ensembles
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep
2003-01-01
The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
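The first of the three methods, averaging the posterior estimates of several classifiers and plugging the result into the Bayes-error formula, reduces to a few lines; a sketch under the assumption that each member outputs reasonably calibrated class probabilities (the information-theoretic correction and the disagreement-based method are not shown):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
Xtr, Xte, ytr = X[:1500], X[1500:], y[:1500]

members = [LogisticRegression(max_iter=1000), RandomForestClassifier(), GaussianNB()]
# average the a posteriori class-probability estimates over the ensemble
p_avg = np.mean([m.fit(Xtr, ytr).predict_proba(Xte) for m in members], axis=0)
# plug-in Bayes error: expected probability that even the best class is wrong
bayes_err_est = np.mean(1.0 - p_avg.max(axis=1))
print("estimated Bayes error:", bayes_err_est)
```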
On the Importance of Periodic Orbits: Detection and Applications
NASA Astrophysics Data System (ADS)
Doyon, Bernard
The set of Unstable Periodic Orbits (UPOs) of a chaotic system is intimately related to its dynamical properties. From the (in principle infinite) set of UPOs hidden in phase space, one can obtain important dynamical quantities such as the Lyapunov exponents, the invariant measure, the topological entropy and the fractal dimension. In quantum chaos (i.e., the study of quantum systems whose classical limit is chaotic), these same UPOs provide the bridge between the classical and quantum behaviour of non-integrable systems. Locating these fundamental cycles is a complex problem. This thesis first addresses the detection of UPOs in chaotic systems. A comparative study of two recent algorithms is presented. We develop these two methods further in order to apply them to various systems, including dissipative and conservative continuous flows. An analysis of the convergence rate of the algorithms is also carried out to identify the strengths and limits of these numerical schemes. The detection methods we use rely on a particular transformation of the initial dynamics. This trick inspired an alternative method for targeting and stabilizing an arbitrary periodic orbit in a chaotic system. Targeting is generally combined with control methods to rapidly stabilize a given cycle, and in general one must know the position and stability of the cycle in question. The new targeting method we present does not require a priori knowledge of the position and stability of the periodic orbits. It could serve as a complementary tool to current targeting and control methods.
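As a deliberately simple concrete instance of cycle detection (the thesis studies far more general stabilizing-transformation algorithms for longer cycles and continuous flows), a period-1 orbit of the Hénon map can be located by Newton iteration on g(p) = f(p) - p:

```python
import numpy as np

def henon(p, a=1.4, b=0.3):
    x, y = p
    return np.array([1.0 - a * x**2 + y, b * x])

def jac(p, a=1.4, b=0.3):
    x, _ = p
    return np.array([[-2.0 * a * x, 1.0],
                     [b, 0.0]])

def find_fixed_point(p0, tol=1e-12, max_iter=50):
    """Newton iteration on g(p) = f(p) - p to locate a period-1 orbit."""
    p = np.asarray(p0, float)
    for _ in range(max_iter):
        g = henon(p) - p
        if np.linalg.norm(g) < tol:
            break
        p = p - np.linalg.solve(jac(p) - np.eye(2), g)
    return p

fp = find_fixed_point([0.5, 0.2])
print(fp, henon(fp) - fp)    # residual ~ 0 at the unstable fixed point
```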
Negative correlation learning for customer churn prediction: a comparison study.
Rodan, Ali; Fayyoumi, Ayham; Faris, Hossam; Alsakran, Jamal; Al-Kadi, Omar
2015-01-01
Recently, telecommunication companies have been paying more attention to the problem of identifying customer churn behavior. In business, it is well known to service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that are able to predict customer churn can effectively help in customer retention campaigns and in maximizing profit. In this paper we utilize an ensemble of multilayer perceptrons (MLPs) trained with negative correlation learning (NCL) for predicting customer churn in a telecommunication company. Experimental results confirm that the NCL-based MLP ensemble can achieve better generalization performance (a higher churn detection rate) compared with an ensemble of MLPs without NCL (a flat ensemble) and other common data mining techniques used for churn analysis.
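The core of negative correlation learning is a penalty that pushes each member's output away from the ensemble mean: with p_i = -(f_i - fbar)^2, the error signal backpropagated to member i becomes (f_i - y) - lambda*(f_i - fbar). A sketch of that gradient computation, independent of any particular MLP implementation and using the standard simplification that treats the ensemble mean as constant:

```python
import numpy as np

def ncl_error_signal(f, y, lam):
    """Per-member gradient of the NCL loss
    e_i = 0.5*(f_i - y)**2 + lam * p_i,  with p_i = -(f_i - fbar)**2,
    treating fbar as constant as in standard NCL."""
    fbar = f.mean()
    return (f - y) - lam * (f - fbar)

f = np.array([0.7, 0.4, 0.9])   # outputs of 3 ensemble members for one sample
print(ncl_error_signal(f, y=1.0, lam=0.5))
```

With lam = 0 this reduces to independent training of each member; increasing lam trades individual accuracy for ensemble diversity.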
Generalized multiple kernel learning with data-dependent priors.
Mao, Qi; Tsang, Ivor W; Gao, Shenghua; Wang, Li
2015-06-01
Multiple kernel learning (MKL) and classifier ensemble are two mainstream methods for solving learning problems in which some sets of features/views are more informative than others, or the features/views within a given set are inconsistent. In this paper, we first present a novel probabilistic interpretation of MKL such that maximum entropy discrimination with a noninformative prior over multiple views is equivalent to the formulation of MKL. Instead of using the noninformative prior, we introduce a novel data-dependent prior based on an ensemble of kernel predictors, which enhances the prediction performance of MKL by leveraging the merits of the classifier ensemble. With the proposed probabilistic framework of MKL, we propose a hierarchical Bayesian model to learn the proposed data-dependent prior and classification model simultaneously. The resultant problem is convex and other information (e.g., instances with either missing views or missing labels) can be seamlessly incorporated into the data-dependent priors. Furthermore, a variety of existing MKL models can be recovered under the proposed MKL framework and can be readily extended to incorporate these priors. Extensive experiments demonstrate the benefits of our proposed framework in supervised and semisupervised settings, as well as in tasks with partial correspondence among multiple views.
Population interactions between parietal and primary motor cortices during reach
Rao, Naveen G.; Bondy, Adrian; Truccolo, Wilson; Donoghue, John P.
2014-01-01
Neural interactions between parietal area 2/5 and primary motor cortex (M1) were examined to determine the timing and behavioral correlates of cortico-cortical interactions. Neural activity in areas 2/5 and M1 was simultaneously recorded with 96-channel microelectrode arrays in three rhesus monkeys performing a center-out reach task. We introduce a new method to reveal parietal-motor interactions at a population level using partial spike-field coherence (PSFC) between ensembles of neurons in one area and a local field potential (LFP) in another. PSFC reflects the extent of phase locking between spike times and LFP, after removing the coherence between LFPs in the two areas. Spectral analysis of M1 LFP revealed three bands: low, medium, and high, differing in power between movement preparation and performance. We focus on PSFC in the 1–10 Hz band, in which coherence was strongest. PSFC was also present in the 10–40 Hz band during movement preparation in many channels but generally nonsignificant in the 60–200 Hz band. Ensemble PSFC revealed stronger interactions than single cell-LFP pairings. PSFC of area 2/5 ensembles with M1 LFP typically rose around movement onset and peaked ∼500 ms afterward. PSFC was typically stronger for subsets of area 2/5 neurons and M1 LFPs with similar directional bias than for those with opposite bias, indicating that area 2/5 contributes movement direction information. Together with linear prediction of M1 LFP by area 2/5 spiking, the ensemble-LFP pairing approach reveals interactions missed by single neuron-LFP pairing, demonstrating that cortico-cortical communication can be more readily observed at the ensemble level.
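The partialization step referenced here follows the usual partial-coherence construction (the paper's exact estimator may differ): with R denoting complex coherency, x the area 2/5 spike ensemble, y the M1 LFP, and z the local LFP to be removed,

```latex
R_{xy\mid z}(f) \;=\;
\frac{R_{xy}(f) - R_{xz}(f)\,R_{zy}(f)}
     {\sqrt{\bigl(1-|R_{xz}(f)|^{2}\bigr)\bigl(1-|R_{zy}(f)|^{2}\bigr)}},
\qquad
\mathrm{PSFC}(f) \;=\; \bigl|R_{xy\mid z}(f)\bigr|^{2},
```

so PSFC measures the spike-LFP phase locking that remains after the LFP-LFP coherence between the two areas is regressed out.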
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve the diagnosis of cardiac arrhythmia. Random Forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets with an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset, with multiple classes of small sample sizes, and it is therefore a suitable test for our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes of less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, in order to assess the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is quite high diagnostic performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrate the efficiency of the random sampling strategy in training the RF ensemble classification algorithm.
Relations between dissipated work and Rényi divergences in the generalized Gibbs ensemble
NASA Astrophysics Data System (ADS)
Wei, Bo-Bo
2018-04-01
In this work, we show that the dissipation in a many-body system under an arbitrary nonequilibrium process is related to the Rényi divergences between two states along the forward and reversed dynamics under a very general family of initial conditions. This relation generalizes the links between dissipated work and Rényi divergences to quantum systems with conserved quantities whose equilibrium state is described by the generalized Gibbs ensemble. The relation is applicable for quantum systems with conserved quantities and can be applied to protocols driving the system between integrable and chaotic regimes. We demonstrate our ideas by considering the one-dimensional transverse quantum Ising model and the Jaynes-Cummings model which are driven out of equilibrium.
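For reference, the two objects the result connects are, in standard notation, the Rényi divergence of order α and the generalized Gibbs ensemble built from the conserved charges Q_k:

```latex
S_{\alpha}(\rho\|\sigma) \;=\; \frac{1}{\alpha-1}\,
\ln \operatorname{Tr}\!\left(\rho^{\alpha}\,\sigma^{1-\alpha}\right),
\qquad
\rho_{\mathrm{GGE}} \;=\;
\frac{e^{-\sum_{k}\lambda_{k}Q_{k}}}{\operatorname{Tr}\, e^{-\sum_{k}\lambda_{k}Q_{k}}},
```

where the Lagrange multipliers λ_k are fixed by the expectation values of the conserved quantities.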
Variety and volatility in financial markets
NASA Astrophysics Data System (ADS)
Lillo, Fabrizio; Mantegna, Rosario N.
2000-11-01
We study the price dynamics of stocks traded in a financial market by considering the statistical properties of both a single time series and an ensemble of stocks traded simultaneously. We use the n stocks traded on the New York Stock Exchange to form a statistical ensemble of daily stock returns. For each trading day of our database, we study the ensemble return distribution. We find that a typical ensemble return distribution exists in most of the trading days with the exception of crash and rally days and of the days following these extreme events. We analyze each ensemble return distribution by extracting its first two central moments. We observe that these moments fluctuate in time and are stochastic processes, themselves. We characterize the statistical properties of ensemble return distribution central moments by investigating their probability density functions and temporal correlation properties. In general, time-averaged and portfolio-averaged price returns have different statistical properties. We infer from these differences information about the relative strength of correlation between stocks and between different trading days. Last, we compare our empirical results with those predicted by the single-index model and we conclude that this simple model cannot explain the statistical properties of the second moment of the ensemble return distribution.
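In this framework the first two central moments of the ensemble return distribution are simply the cross-sectional mean and standard deviation (the "variety") computed day by day; a sketch with a synthetic day-by-stock return matrix standing in for the NYSE data:

```python
import numpy as np

rng = np.random.default_rng(4)
returns = rng.standard_normal((2500, 500)) * 0.02   # (days, stocks), synthetic

mu_t = returns.mean(axis=1)        # ensemble mean return on day t
variety_t = returns.std(axis=1)    # ensemble standard deviation ("variety") on day t

# the moments are themselves stochastic processes; inspect their dynamics
print("mean/std of daily variety:", variety_t.mean(), variety_t.std())
print("lag-1 autocorrelation of variety:",
      np.corrcoef(variety_t[:-1], variety_t[1:])[0, 1])
```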
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.
Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel
2017-06-01
Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
New technologies for examining neuronal ensembles in drug addiction and fear
Cruz, Fabio C.; Koya, Eisuke; Guez-Barber, Danielle H.; Bossert, Jennifer M.; Lupica, Carl R.; Shaham, Yavin; Hope, Bruce T.
2015-01-01
Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. Additionally, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches—Daun02 inactivation, FACS sorting of activated neurons and c-fos-GFP transgenic rats — that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools — c-fos-tTA mice and inactivation of CREB-overexpressing neurons — that have been used to study the role of neuronal ensembles in conditioned fear.
Prediction of Weather Impacted Airport Capacity using Ensemble Learning
NASA Technical Reports Server (NTRS)
Wang, Yao Xun
2011-01-01
Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecasts at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rate (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecasts, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management to meet weather-impacted airport capacity in order to reduce costs and increase safety.
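A minimal sketch of the comparison (bagged decision trees versus a single SVM classifier) using scikit-learn, with synthetic stand-ins for the ASPM and weather features; hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# synthetic stand-in for weather/airport-operations features and AAR classes
X, y = make_classification(n_samples=2000, n_features=15, n_classes=3,
                           n_informative=8, random_state=0)

bdt = BaggingClassifier(n_estimators=100, random_state=0)  # bagged decision trees
svm = SVC(kernel="rbf")                                    # single SVM baseline

print("BDT:", cross_val_score(bdt, X, y, cv=5).mean())
print("SVM:", cross_val_score(svm, X, y, cv=5).mean())
```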
A deep learning-based multi-model ensemble method for cancer prediction.
Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong
2018-01-01
Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction.
Parameter Uncertainty on AGCM-simulated Tropical Cyclones
NASA Astrophysics Data System (ADS)
He, F.
2015-12-01
This work studies parameter uncertainty in tropical cyclone (TC) simulations in Atmospheric General Circulation Models (AGCMs) using the Reed-Jablonowski TC test case, implemented in the Community Atmosphere Model (CAM). It examines the impact of 24 parameters across the physical parameterization schemes that represent the convection, turbulence, precipitation and cloud processes in AGCMs. The one-at-a-time (OAT) sensitivity analysis method first quantifies their relative importance for TC simulations and identifies the key parameters for six different TC characteristics: intensity, precipitation, longwave cloud radiative forcing (LWCF), shortwave cloud radiative forcing (SWCF), cloud liquid water path (LWP) and ice water path (IWP). Then, 8 physical parameters are chosen and perturbed using the Latin Hypercube Sampling (LHS) method. The comparison between the OAT ensemble run and the LHS ensemble run shows that the simulated TC intensity is mainly affected by the parcel fractional mass entrainment rate in the Zhang-McFarlane (ZM) deep convection scheme. The nonlinear interactive effect among different physical parameters is negligible for simulated TC intensity. In contrast, this nonlinear interactive effect plays a significant role in the other simulated tropical cyclone characteristics (precipitation, LWCF, SWCF, LWP and IWP) and greatly enlarges their simulated uncertainties. The statistical emulator Extended Multivariate Adaptive Regression Splines (EMARS) is applied to characterize the response functions for the nonlinear effect. Lastly, we find that the intensity uncertainty caused by physical parameters is comparable to the uncertainty caused by model structure (e.g. grid) and initial conditions (e.g. sea surface temperature, atmospheric moisture). These findings suggest the importance of using the perturbed physics ensemble (PPE) method to revisit tropical cyclone prediction under climate change scenarios.
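The LHS perturbation step is a one-liner with SciPy's quasi-Monte Carlo module; a sketch for 8 hypothetical physics parameters with purely illustrative bounds (not the CAM parameters or ranges used in the study):

```python
import numpy as np
from scipy.stats import qmc

# hypothetical bounds for 8 perturbed physics parameters (names/ranges illustrative)
lower = np.array([0.5, 1e-4, 0.1, 0.05, 100.0, 0.001, 0.2, 1.0])
upper = np.array([2.0, 1e-2, 0.9, 0.50, 500.0, 0.100, 0.8, 10.0])

sampler = qmc.LatinHypercube(d=8, seed=0)
unit = sampler.random(n=64)                 # 64 members in the unit hypercube
members = qmc.scale(unit, lower, upper)     # one row = one ensemble member
print(members.shape)                        # (64, 8)
```

Unlike OAT, which varies one parameter at a time about a control value, each LHS member perturbs all parameters simultaneously, which is what exposes the nonlinear interactive effects discussed above.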
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
A new approach to human microRNA target prediction using ensemble pruning and rotation forest.
Mousavi, Reza; Eftekhari, Mahdi; Haghighi, Mehdi Ghezelbash
2015-12-01
MicroRNAs (miRNAs) are small non-coding RNAs that have important functions in gene regulation. Since finding miRNA targets experimentally is costly and time-consuming, the use of machine learning methods is a growing research area for miRNA target prediction. In this paper, a new approach is proposed that combines two popular ensemble strategies, i.e. Ensemble Pruning and Rotation Forest (EP-RTF), to predict human miRNA targets. For EP, the approach utilizes a Genetic Algorithm (GA). In other words, a subset of classifiers from the heterogeneous ensemble is first selected by GA. Next, the selected classifiers are trained based on the RTF method and then combined using weighted majority voting. In addition to seeking a better subset of classifiers, the parameter of RTF is also optimized by GA. The findings of the present study confirm that the newly developed EP-RTF outperforms (in terms of classification accuracy, sensitivity, and specificity) previously applied methods on four datasets in the field of human miRNA targets. Diversity-error diagrams reveal that the proposed ensemble approach constructs individual classifiers that are more accurate and usually more diverse than those of the other ensemble approaches. Given these experimental results, we highly recommend EP-RTF for improving the performance of miRNA target prediction.
NASA Astrophysics Data System (ADS)
Merker, Claire; Ament, Felix; Clemens, Marco
2017-04-01
The quantification of measurement uncertainty for rain radar data remains challenging. Radar reflectivity measurements are affected, amongst other things, by calibration errors, noise, blocking and clutter, and attenuation. Their combined impact on measurement accuracy is difficult to quantify due to incomplete process understanding and complex interdependencies. An improved quality assessment of rain radar measurements is of interest for applications both in meteorology and hydrology, for example for precipitation ensemble generation, rainfall runoff simulations, or in data assimilation for numerical weather prediction. Especially a detailed description of the spatial and temporal structure of errors is beneficial in order to make best use of the areal precipitation information provided by radars. Radar precipitation ensembles are one promising approach to represent spatially variable radar measurement errors. We present a method combining ensemble radar precipitation nowcasting with data assimilation to estimate radar measurement uncertainty at each pixel. This combination of ensemble forecast and observation yields a consistent spatial and temporal evolution of the radar error field. We use an advection-based nowcasting method to generate an ensemble reflectivity forecast from initial data of a rain radar network. Subsequently, reflectivity data from single radars is assimilated into the forecast using the Local Ensemble Transform Kalman Filter. The spread of the resulting analysis ensemble provides a flow-dependent, spatially and temporally correlated reflectivity error estimate at each pixel. We will present first case studies that illustrate the method using data from a high-resolution X-band radar network.
NASA Astrophysics Data System (ADS)
Clark, Elizabeth; Wood, Andy; Nijssen, Bart; Mendoza, Pablo; Newman, Andy; Nowak, Kenneth; Arnold, Jeffrey
2017-04-01
In an automated forecast system, hydrologic data assimilation (DA) performs the valuable function of correcting raw simulated watershed model states to better represent external observations, including measurements of streamflow, snow, soil moisture, and the like. Yet the incorporation of automated DA into operational forecasting systems has been a long-standing challenge due to the complexities of the hydrologic system, which include numerous lags between state and output variations. To help demonstrate that such methods can succeed in operational automated implementations, we present results from the real-time application of an ensemble particle filter (PF) for short-range (7 day lead) ensemble flow forecasts in western US river basins. We use the System for Hydromet Applications, Research and Prediction (SHARP), developed by the National Center for Atmospheric Research (NCAR) in collaboration with the University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation. SHARP is a fully automated platform for short-term to seasonal hydrologic forecasting applications, incorporating uncertainty in initial hydrologic conditions (IHCs) and in hydrometeorological predictions through ensemble methods. In this implementation, IHC uncertainty is estimated by propagating an ensemble of 100 temperature and precipitation time series through conceptual and physically-oriented models. The resulting ensemble of derived IHCs exhibits a broad range of possible soil moisture and snow water equivalent (SWE) states. The PF selects and/or weights and resamples the IHCs that are most consistent with external streamflow observations, and uses the particles to initialize a streamflow forecast ensemble driven by ensemble precipitation and temperature forecasts downscaled from the Global Ensemble Forecast System (GEFS). We apply this method in real time for several basins in the western US that are important for water resources management, and perform a hindcast experiment to evaluate the utility of PF-based data assimilation for streamflow forecast skill. This presentation describes findings, including a comparison of sequential and non-sequential particle weighting methods.
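The PF step at the heart of such a system is compact: weight each particle (candidate IHC) by the likelihood of the observed streamflow given its simulated flow, then resample. A minimal sketch assuming Gaussian observation error, with all variable names and the toy flow operator hypothetical rather than taken from SHARP:

```python
import numpy as np

def particle_filter_step(states, sim_flows, obs_flow, obs_sigma, rng):
    """Weight particles by streamflow likelihood and resample (SIR)."""
    w = np.exp(-0.5 * ((sim_flows - obs_flow) / obs_sigma) ** 2)
    w /= w.sum()
    idx = rng.choice(len(states), size=len(states), p=w)  # multinomial resampling
    return states[idx], w

rng = np.random.default_rng(5)
states = rng.normal(100.0, 30.0, size=(100, 3))  # 100 particles, 3 states (e.g. SWE, soil moisture layers)
sim_flows = states @ np.array([0.5, 0.3, 0.2])   # toy linear flow operator
new_states, w = particle_filter_step(states, sim_flows, obs_flow=95.0,
                                     obs_sigma=10.0, rng=rng)
print("effective sample size:", 1.0 / np.sum(w ** 2))
```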
NASA Astrophysics Data System (ADS)
Erfanian, A.; Fomenko, L.; Wang, G.
2016-12-01
The multi-model ensemble (MME) average is considered the most reliable approach for simulating both present-day and future climates, and it has been a primary reference for drawing conclusions in major coordinated studies, i.e. the IPCC Assessment Reports and CORDEX. The biases of individual models cancel each other out in the MME average, enabling the ensemble mean to outperform individual members in simulating the mean climate. This enhancement, however, comes with tremendous computational cost, which is especially inhibiting for regional climate modeling, as model uncertainties can originate from both the RCMs and the driving GCMs. Here we propose the Ensemble-based Reconstructed Forcings (ERF) approach to regional climate modeling, which achieves a similar level of bias reduction at a fraction of the cost of the conventional MME approach. The new method constructs a single set of initial and boundary conditions (IBCs) by averaging the IBCs of multiple GCMs, and drives the RCM with this ensemble average of IBCs in a single run. Using a regional climate model (RegCM4.3.4-CLM4.5), we tested the method over West Africa for multiple combinations of (up to six) GCMs. Our results indicate that the performance of the ERF method is comparable to that of the MME average in simulating the mean climate. The bias reduction seen in ERF simulations is achieved by using more realistic IBCs in solving the system of equations underlying the RCM physics and dynamics. This endows the new method with a theoretical advantage in addition to reduced computational cost. The ERF output is an unaltered solution of the RCM, as opposed to a climate state that might not be physically plausible due to the averaging of multiple solutions under the conventional MME approach. The ERF approach should be considered for use in major international efforts such as CORDEX. Key words: Multi-model ensemble, ensemble analysis, ERF, regional climate modeling
NASA Astrophysics Data System (ADS)
Yuan, J.; Kopp, R. E.
2017-12-01
Quantitative risk analysis of regional climate change is crucial for risk management and impact assessment of climate change. Two major challenges in assessing the risks of climate change are that the CMIP5 model runs, which drive the EURO-CORDEX downscaling runs, do not cover the full range of uncertainty of future projections, and that climate models may underestimate the probability of tail risks (i.e. extreme events). To overcome these difficulties, this study offers a viable avenue in which a set of probabilistic climate ensembles is generated using the Surrogate/Model Mixed Ensemble (SMME) method. The probabilistic ensembles for temperature and precipitation are used to assess the range of uncertainty covered by five bias-corrected simulations from the high-resolution (0.11°) EURO-CORDEX database, which were selected by the PESETA (Projection of Economic impacts of climate change in Sectors of the European Union based on bottom-up Analysis) III project. Results show that the distribution of the SMME ensemble is notably wider than both the distribution of the raw GCM ensemble and the spread of the five EURO-CORDEX runs under RCP8.5. Tail risks are well represented by the SMME ensemble. Both the SMME ensemble and the EURO-CORDEX projections are aggregated to administrative level and integrated into the impact functions of PESETA III to assess climate risks in Europe. To further evaluate the uncertainties introduced by the downscaling process, we compare the five runs from EURO-CORDEX with runs from the corresponding GCMs. Time series of regional means, spatial patterns, and climate indices are examined for the future climate (2080-2099) deviating from the present climate (1981-2010). The downscaling processes do not appear to be trend-preserving; e.g. the increase in regional mean temperature from EURO-CORDEX is slower than that from the corresponding GCM. The spatial pattern comparison reveals that the differences between each pair of GCM and EURO-CORDEX runs are small in winter. In summer, the temperatures of EURO-CORDEX are generally lower than those of the GCMs, while the drying trends in precipitation of EURO-CORDEX are smaller than those of the GCMs. Climate indices are significantly affected by the bias-correction and downscaling process. Our study provides valuable information for selecting climate indices in different regions over Europe.
ERIC Educational Resources Information Center
Lowe, Geoffrey M.
2018-01-01
Competition is reported in the general education literature as having a largely detrimental impact upon student engagement and long-term motivation, yet competition has long been an accepted part of the music education ensemble landscape. Adjudicated ensemble competitions and competition-festivals are commonplace in most Australian states, as…
Zhai, Binxu; Chen, Jianguo
2018-04-18
A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Special feature extraction procedures, including simplification, polynomial, transformation and combination, are conducted before modeling to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degrees of feature importance. Single models including LASSO, AdaBoost, XGBoost and a multi-layer perceptron optimized by the genetic algorithm (GA-MLP) are established in the level-0 space and are then integrated by support vector regression (SVR) in the level-1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured in the city of Zhangjiakou are the most important pollution factors for forecasting PM2.5 concentrations. Local extreme wind speeds and maximal wind speeds are the meteorological factors with the strongest effect on the cross-regional transport of contaminants. Pollutants from the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data, with a coefficient of determination (R²) of 0.90 and a root mean squared error (RMSE) of 23.69 μg/m³. For single-pollutant grade recognition, the proposed model performs better when applied to days characterized by good air quality than when applied to days registering high levels of pollution. The overall classification accuracy is 73.93%, with most misclassifications made among adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
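The level-0/level-1 structure described above maps directly onto stacked generalization as implemented in scikit-learn. A minimal sketch, with stand-ins for the paper's base models (the GA-optimized MLP and XGBoost are replaced here by default-configured analogues; the feature engineering is omitted):

    from sklearn.ensemble import AdaBoostRegressor, StackingRegressor
    from sklearn.linear_model import Lasso
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    # Level 0: single models; Level 1: SVR meta-learner (stacked generalization).
    stack = StackingRegressor(
        estimators=[
            ("lasso", Lasso(alpha=0.1)),
            ("ada", AdaBoostRegressor(n_estimators=100)),
            ("mlp", MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000)),
        ],
        final_estimator=SVR(kernel="rbf"),
        cv=5,  # out-of-fold level-0 predictions train the level-1 SVR
    )
    # stack.fit(X_train, y_train); y_pred = stack.predict(X_test)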
NASA Astrophysics Data System (ADS)
Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah
2014-05-01
Flood is one of the most devastating natural disasters and occurs frequently in Terengganu, Malaysia. Recently, ensemble-based techniques have become extremely popular in flood modeling. In this paper, the weights-of-evidence (WoE) model was first used to assess the impact of the classes of each conditioning factor on flooding through bivariate statistical analysis (BSA). These factors were then reclassified using the acquired weights and entered into a support vector machine (SVM) model to evaluate the correlation between flood occurrence and each conditioning factor. Through this integration, the weak point of WoE is addressed and the performance of the SVM is enhanced. The spatial database included the flood inventory, slope, stream power index (SPI), topographic wetness index (TWI), altitude, curvature, distance from the river, geology, rainfall, land use/cover (LULC), and soil type. Four SVM kernel types (linear (LN), polynomial (PL), radial basis function (RBF), and sigmoid (SIG)) were used to investigate the performance of each kernel type. The efficiency of the new ensemble WoE-SVM method was tested using the area under the curve (AUC), which measured the prediction and success rates. The validation results proved the strength and efficiency of the ensemble method over the individual methods. The best results were obtained with the RBF kernel. The success rate and prediction rate for the ensemble WoE and RBF-SVM method were 96.48% and 95.67%, respectively. The proposed ensemble flood susceptibility mapping method could assist researchers and local governments in flood mitigation strategies.
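A simplified sketch of the WoE-SVM integration follows; the weight formula here is the basic positive weight W+ = ln[P(class|flood)/P(class|non-flood)], a simplification of the full WoE contrast used in such studies, and all variable names are illustrative:

    import numpy as np
    from sklearn.svm import SVC

    def woe_weights(factor_cls, flood):
        """Weight of evidence per class of one conditioning factor."""
        w = {}
        for c in np.unique(factor_cls):
            p_pos = ((factor_cls == c) & (flood == 1)).sum() / max((flood == 1).sum(), 1)
            p_neg = ((factor_cls == c) & (flood == 0)).sum() / max((flood == 0).sum(), 1)
            w[c] = np.log((p_pos + 1e-9) / (p_neg + 1e-9))  # smoothed ratio
        return w

    # Reclassify every factor by its WoE weight, then train the RBF-SVM:
    # for j in range(X.shape[1]):
    #     X_woe[:, j] = np.vectorize(woe_weights(X[:, j], y).get)(X[:, j])
    # svm = SVC(kernel="rbf", probability=True).fit(X_woe, y)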
A new Method for the Estimation of Initial Condition Uncertainty Structures in Mesoscale Models
NASA Astrophysics Data System (ADS)
Keller, J. D.; Bach, L.; Hense, A.
2012-12-01
The estimation of fast growing error modes of a system is a key interest of ensemble data assimilation when assessing uncertainty in initial conditions. Over the last two decades three methods (and variations of these methods) have evolved for global numerical weather prediction models: the ensemble Kalman filter, singular vectors and breeding of growing modes (now ensemble transform). While the former incorporates a priori model error information and observation error estimates to determine ensemble initial conditions, the latter two techniques directly address the error structures associated with Lyapunov vectors. However, in global models these structures are mainly associated with transient global wave patterns. When assessing initial condition uncertainty in mesoscale limited area models, several problems regarding the aforementioned techniques arise: (a) additional sources of uncertainty on the smaller scales contribute to the error and (b) error structures from the global scale may quickly move through the model domain (depending on the size of the domain). To address the latter problem, perturbation structures from global models are often included in the mesoscale predictions as perturbed boundary conditions. However, the initial perturbations (when used) are often generated with a variant of an ensemble Kalman filter which does not necessarily focus on the large scale error patterns. In the framework of the European regional reanalysis project of the Hans-Ertel-Center for Weather Research we use a mesoscale model with an implemented nudging data assimilation scheme which does not support ensemble data assimilation at all. In preparation for an ensemble-based regional reanalysis and for the estimation of three-dimensional atmospheric covariance structures, we implemented a new method for the assessment of fast growing error modes in mesoscale limited area models. The so-called self-breeding method is a development of the breeding of growing modes technique. Initial perturbations are integrated forward for a short time period and then rescaled and added to the initial state again. Iterating this rapid breeding cycle provides estimates of the initial uncertainty structure (or local Lyapunov vectors) given a specific norm. To prevent all ensemble perturbations from converging towards the leading local Lyapunov vector, we apply an ensemble transform variant to orthogonalize the perturbations in the sub-space spanned by the ensemble. By choosing different kinds of norms to measure perturbation growth, this technique allows for estimating uncertainty patterns targeted at specific sources of error (e.g. convection, turbulence). With case study experiments we show applications of the self-breeding method for different sources of uncertainty and different horizontal scales.
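The breeding cycle at the heart of self-breeding can be sketched compactly. The following is a schematic only, with placeholder callables for the model propagator and the chosen norm; the ensemble-transform orthogonalization step described above is omitted:

    import numpy as np

    def self_breeding(x0, dx0, integrate, norm, size, n_cycles):
        """Rapid breeding cycle: grow a perturbation over a short
        integration, rescale it to a fixed amplitude under the chosen
        norm, add it back to the initial state, and iterate. The result
        approximates a fast-growing error mode (local Lyapunov vector)."""
        dx = dx0
        for _ in range(n_cycles):
            grown = integrate(x0 + dx) - integrate(x0)   # perturbation growth
            dx = grown * (size / norm(grown))            # rescale to initial size
        return dx

Choosing norm (e.g. a moist-energy norm versus a kinetic-energy norm) is what targets the estimated uncertainty patterns at specific error sources.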
Drug-target interaction prediction using ensemble learning and dimensionality reduction.
Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong
2017-10-01
Experimental determination of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, is increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve classification performance, it is also worthwhile to design an ensemble learning framework to enhance performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction that leverages both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (area under the ROC curve) as the evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross-validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
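A minimal sketch of the EnsemDT variant, under the assumption that feature subspacing is random column sampling and with PCA standing in for the dimensionality reduction step (the paper evaluates three reduction methods); scores of the homogeneous base learners are averaged:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier

    def ensem_dt(X, y, X_new, n_learners=30, frac=0.7, n_comp=50, seed=0):
        """Subspace features, reduce dimensionality, train a tree per
        subspace, and average the interaction scores (binary labels y)."""
        rng = np.random.default_rng(seed)
        scores = np.zeros(len(X_new))
        for _ in range(n_learners):
            cols = rng.choice(X.shape[1], int(frac * X.shape[1]), replace=False)
            pca = PCA(n_components=min(n_comp, len(cols))).fit(X[:, cols])
            clf = DecisionTreeClassifier().fit(pca.transform(X[:, cols]), y)
            scores += clf.predict_proba(pca.transform(X_new[:, cols]))[:, 1]
        return scores / n_learners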
Internal Spin Control, Squeezing and Decoherence in Ensembles of Alkali Atomic Spins
NASA Astrophysics Data System (ADS)
Norris, Leigh Morgan
Large atomic ensembles interacting with light are one of the most promising platforms for quantum information processing. In the past decade, novel applications for these systems have emerged in quantum communication, quantum computing, and metrology. Essential to all of these applications is the controllability of the atomic ensemble, which is facilitated by a strong coupling between the atoms and light. Non-classical spin squeezed states are a crucial step in attaining greater ensemble control. The degree of entanglement present in these states, furthermore, serves as a benchmark for the strength of the atom-light interaction. Outside the broader context of quantum information processing with atomic ensembles, spin squeezed states have applications in metrology, where their quantum correlations can be harnessed to improve the precision of magnetometers and atomic clocks. This dissertation focuses upon the production of spin squeezed states in large ensembles of cold trapped alkali atoms interacting with optical fields. While most treatments of spin squeezing consider only the case in which the ensemble is composed of two level systems or qubits, we utilize the entire ground manifold of an alkali atom with hyperfine spin f greater than or equal to 1/2, a qudit. Spin squeezing requires non-classical correlations between the constituent atomic spins, which are generated through the atoms' collective coupling to the light. Either through measurement or multiple interactions with the atoms, the light mediates an entangling interaction that produces quantum correlations. Because the spin squeezing treated in this dissertation ultimately originates from the coupling between the light and atoms, conventional approaches of improving this squeezing have focused on increasing the optical density of the ensemble. The greater number of internal degrees of freedom and the controllability of the spin-f ground hyperfine manifold enable novel methods of enhancing squeezing. In particular, we find that state preparation using control of the internal hyperfine spin increases the entangling power of squeezing protocols when f>1/2. Post-processing of the ensemble using additional internal spin control converts this entanglement into metrologically useful spin squeezing. By employing a variation of the Holstein-Primakoff approximation, in which the collective spin observables of the atomic ensemble are treated as quadratures of a bosonic mode, we model entanglement generation, spin squeezing and the effects of internal spin control. The Holstein-Primakoff formalism also enables us to take into account the decoherence of the ensemble due to optical pumping. While most works ignore or treat optical pumping phenomenologically, we employ a master equation derived from first principles. Our analysis shows that state preparation and the hyperfine spin size have a substantial impact upon both the generation of spin squeezing and the decoherence of the ensemble. Through a numerical search, we determine state preparations that enhance squeezing protocols while remaining robust to optical pumping. Finally, most work on spin squeezing in atomic ensembles has treated the light as a plane wave that couples identically to all atoms. In the final part of this dissertation, we go beyond the customary plane wave approximation on the light and employ focused paraxial beams, which are more efficiently mode matched to the radiation pattern of the atomic ensemble. 
The mathematical formalism and the internal spin control techniques that we applied in the plane wave case are generalized to accommodate the non-homogeneous paraxial probe. We find the optimal geometries of the atomic ensemble and the probe for mode matching and generation of spin squeezing.
On the v-representability of ensemble densities of electron systems
NASA Astrophysics Data System (ADS)
Gonis, A.; Däne, M.
2018-05-01
Analogously to the zero-temperature case, where the density of the ground state of an interacting many-particle system uniquely determines (within an arbitrary additive constant) the external potential acting on the system, the thermal average of the density over an ensemble defined by the Boltzmann distribution at the minimum of the thermodynamic potential, or the free energy, uniquely determines (and not just modulo a constant) the external potential acting on a system described by this thermodynamic potential or free energy. The paper describes a formal procedure that generates the domain of a constrained search over general ensembles (at zero or elevated temperatures) that lead to a given density, including as a special case a density thermally averaged at a given temperature, and that in the case of a v-representable density determines the external potential leading to the ensemble density. As an immediate consequence of the general formalism, the concept of v-representability is extended beyond the hitherto discussed case of ground state densities to encompass excited states as well. Specific application to thermally averaged densities solves the v-representability problem in connection with the Mermin functional in a manner analogous to that in which this problem was recently settled with respect to the Hohenberg and Kohn functional. The main formalism is illustrated with numerical results for ensembles of one-dimensional, non-interacting systems of particles under a harmonic potential.
On the v-representability of ensemble densities of electron systems
Gonis, A.; Dane, M.
2017-12-30
Analogously to the zero-temperature case, where the density of the ground state of an interacting many-particle system uniquely determines (within an arbitrary additive constant) the external potential acting on the system, the thermal average of the density over an ensemble defined by the Boltzmann distribution at the minimum of the thermodynamic potential, or the free energy, uniquely determines (and not just modulo a constant) the external potential acting on a system described by this thermodynamic potential or free energy. The study describes a formal procedure that generates the domain of a constrained search over general ensembles (at zero or elevated temperatures) that lead to a given density, including as a special case a density thermally averaged at a given temperature, and that in the case of a v-representable density determines the external potential leading to the ensemble density. As an immediate consequence of the general formalism, the concept of v-representability is extended beyond the hitherto discussed case of ground state densities to encompass excited states as well. Specific application to thermally averaged densities solves the v-representability problem in connection with the Mermin functional in a manner analogous to that in which this problem was recently settled with respect to the Hohenberg and Kohn functional. Finally, the main formalism is illustrated with numerical results for ensembles of one-dimensional, non-interacting systems of particles under a harmonic potential.
Intercomparison and validation of the mixed layer depth fields of global ocean syntheses
NASA Astrophysics Data System (ADS)
Toyoda, Takahiro; Fujii, Yosuke; Kuragano, Tsurane; Kamachi, Masafumi; Ishikawa, Yoichi; Masuda, Shuhei; Sato, Kanako; Awaji, Toshiyuki; Hernandez, Fabrice; Ferry, Nicolas; Guinehut, Stéphanie; Martin, Matthew J.; Peterson, K. Andrew; Good, Simon A.; Valdivieso, Maria; Haines, Keith; Storto, Andrea; Masina, Simona; Köhl, Armin; Zuo, Hao; Balmaseda, Magdalena; Yin, Yonghong; Shi, Li; Alves, Oscar; Smith, Gregory; Chang, You-Soon; Vernieres, Guillaume; Wang, Xiaochun; Forget, Gael; Heimbach, Patrick; Wang, Ou; Fukumori, Ichiro; Lee, Tong
2017-08-01
Intercomparison and evaluation of the global ocean surface mixed layer depth (MLD) fields estimated from a suite of major ocean syntheses are conducted. Compared with the reference MLDs calculated from individual profiles, MLDs calculated from monthly mean and gridded profiles show negative biases of 10-20 m in early spring, related to the re-stratification process of relatively deep mixed layers. The vertical resolution of profiles also influences the MLD estimation. MLDs are underestimated by approximately 5-7 (14-16) m at a vertical resolution of 25 (50) m when the criterion of potential density exceeding the 10-m value by 0.03 kg m⁻³ is used for the MLD estimation. Using the larger criterion (0.125 kg m⁻³) generally reduces the underestimation. In addition, positive biases greater than 100 m are found in wintertime subpolar regions when temperature-based MLD criteria are used. Biases of the reanalyses are due to both model errors and errors related to differences between the assimilation methods. The results show that these errors are partially cancelled out through ensemble averaging. Moreover, the bias in the ensemble mean field of the reanalyses is smaller than in the observation-only analyses. This is largely attributed to the comparatively higher resolutions of the reanalyses. The robust reproduction of both the seasonal cycle and interannual variability by the ensemble mean of the reanalyses indicates a great potential of the ensemble mean MLD field for investigating and monitoring upper ocean processes.
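The density-threshold criterion quoted above is straightforward to apply to a single profile. A minimal sketch, assuming a profile ordered from the surface downward (the handling of gridded versus individual profiles is exactly what drives the biases discussed):

    import numpy as np

    def mixed_layer_depth(depth, sigma, ref_depth=10.0, crit=0.03):
        """MLD = shallowest depth at which potential density exceeds its
        10-m value by `crit` (0.03 or 0.125 kg m^-3 in the comparison)."""
        sigma_ref = np.interp(ref_depth, depth, sigma)   # density at 10 m
        deeper = depth[(depth > ref_depth) & (sigma > sigma_ref + crit)]
        return deeper[0] if deeper.size else depth[-1]   # else: mixed to bottom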
Microcanonical ensemble simulation method applied to discrete potential fluids
NASA Astrophysics Data System (ADS)
Sastre, Francisco; Benavides, Ana Laura; Torres-Arenas, José; Gil-Villegas, Alejandro
2015-09-01
In this work we extend the applicability of the microcanonical ensemble simulation method, originally proposed to study the Ising model [A. Hüller and M. Pleimling, Int. J. Mod. Phys. C 13, 947 (2002), 10.1142/S0129183102003693], to the case of simple fluids. An algorithm is developed that measures the transition probabilities between macroscopic states; its advantage with respect to conventional Monte Carlo NVT (MC-NVT) simulations is that a continuous range of temperatures is covered in a single run. For a given density, this new algorithm provides the inverse temperature, which can be parametrized as a function of the internal energy, and the isochoric heat capacity is then evaluated through a numerical derivative. As an illustrative example we consider a fluid composed of particles interacting via a square-well (SW) pair potential of variable range. Equilibrium internal energies and isochoric heat capacities are obtained with very high accuracy compared with data obtained from MC-NVT simulations. These results are important in the context of applying the Hüller-Pleimling method to discrete-potential systems, which are based on generalizations of the SW and square-shoulder fluids.
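The last step, obtaining the isochoric heat capacity from the parametrized inverse temperature, amounts to a numerical derivative. A small illustration with an invented, smooth beta(E) (the real parametrization comes from the measured transition probabilities):

    import numpy as np

    E = np.linspace(-3.0, -1.0, 200)      # internal energy grid (example values)
    beta = 2.0 + 0.5 * E                  # hypothetical fitted beta(E) from sampling
    T = 1.0 / beta                        # statistical temperature
    C_V = 1.0 / np.gradient(T, E)         # C_V = dE/dT = (dT/dE)^(-1) at fixed density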
Elsawy, Amr S; Eldawlatly, Seif; Taher, Mohamed; Aly, Gamal M
2014-01-01
The current trend to use Brain-Computer Interfaces (BCIs) with mobile devices mandates the development of efficient EEG data processing methods. In this paper, we demonstrate the performance of a Principal Component Analysis (PCA) ensemble classifier for P300-based spellers. We recorded EEG data from multiple subjects using the Emotiv neuroheadset in the context of a classical oddball P300 speller paradigm. We compare the performance of the proposed ensemble classifier to the performance of traditional feature extraction and classifier methods. Our results demonstrate the capability of the PCA ensemble classifier to classify P300 data recorded using the Emotiv neuroheadset with an average accuracy of 86.29% on cross-validation data. In addition, offline testing of the recorded data reveals an average classification accuracy of 73.3% that is significantly higher than that achieved using traditional methods. Finally, we demonstrate the effect of the parameters of the P300 speller paradigm on the performance of the method.
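A sketch of the overall pipeline, with the caveat that the base classifier (LDA) and the bagging aggregation here are stand-ins for the paper's specific ensemble construction; epoched EEG features are assumed to be flattened into the rows of X:

    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import BaggingClassifier
    from sklearn.pipeline import make_pipeline

    # PCA compresses the epoch features; an ensemble of simple classifiers
    # then votes on target vs. non-target flashes in the P300 paradigm.
    clf = make_pipeline(
        PCA(n_components=30),
        BaggingClassifier(LinearDiscriminantAnalysis(), n_estimators=25),
    )
    # clf.fit(X_train, y_train)   # X: (n_epochs, n_features), y: 0/1 target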
Ensemble perception of color in autistic adults.
Maule, John; Stanworth, Kirstie; Pellicano, Elizabeth; Franklin, Anna
2017-05-01
Dominant accounts of visual processing in autism posit that autistic individuals have an enhanced access to details of scenes [e.g., weak central coherence] which is reflected in a general bias toward local processing. Furthermore, the attenuated priors account of autism predicts that the updating and use of summary representations is reduced in autism. Ensemble perception describes the extraction of global summary statistics of a visual feature from a heterogeneous set (e.g., of faces, sizes, colors), often in the absence of local item representation. The present study investigated ensemble perception in autistic adults using a rapidly presented (500 msec) ensemble of four, eight, or sixteen elements representing four different colors. We predicted that autistic individuals would be less accurate when averaging the ensembles, but more accurate in recognizing individual ensemble colors. The results were consistent with the predictions. Averaging was impaired in autism, but only when ensembles contained four elements. Ensembles of eight or sixteen elements were averaged equally accurately across groups. The autistic group also showed a corresponding advantage in rejecting colors that were not originally seen in the ensemble. The results demonstrate the local processing bias in autism, but also suggest that the global perceptual averaging mechanism may be compromised under some conditions. The theoretical implications of the findings and future avenues for research on summary statistics in autism are discussed. Autism Res 2017, 10: 839-851. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Ensemble perception of color in autistic adults
Stanworth, Kirstie; Pellicano, Elizabeth; Franklin, Anna
2016-01-01
Dominant accounts of visual processing in autism posit that autistic individuals have an enhanced access to details of scenes [e.g., weak central coherence] which is reflected in a general bias toward local processing. Furthermore, the attenuated priors account of autism predicts that the updating and use of summary representations is reduced in autism. Ensemble perception describes the extraction of global summary statistics of a visual feature from a heterogeneous set (e.g., of faces, sizes, colors), often in the absence of local item representation. The present study investigated ensemble perception in autistic adults using a rapidly presented (500 msec) ensemble of four, eight, or sixteen elements representing four different colors. We predicted that autistic individuals would be less accurate when averaging the ensembles, but more accurate in recognizing individual ensemble colors. The results were consistent with the predictions. Averaging was impaired in autism, but only when ensembles contained four elements. Ensembles of eight or sixteen elements were averaged equally accurately across groups. The autistic group also showed a corresponding advantage in rejecting colors that were not originally seen in the ensemble. The results demonstrate the local processing bias in autism, but also suggest that the global perceptual averaging mechanism may be compromised under some conditions. The theoretical implications of the findings and future avenues for research on summary statistics in autism are discussed. Autism Res 2017, 10: 839–851. © 2016 The Authors Autism Research published by Wiley Periodicals, Inc. on behalf of International Society for Autism Research PMID:27874263
Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling.
Pourghasemi, Hamid Reza; Yousefi, Saleh; Kornejady, Aiding; Cerdà, Artemi
2017-12-31
Gully erosion is identified as an important sediment source in a range of environments and plays a decisive role in the redistribution of eroded soils on a slope. Hence, addressing the spatial occurrence pattern of this phenomenon is very important. Different ensemble models and their single counterparts, mostly data mining methods, have been used for gully erosion susceptibility mapping; however, their calibration and validation procedures need to be thoroughly addressed. The current study presents a series of individual and ensemble data mining methods, including artificial neural network (ANN), support vector machine (SVM), maximum entropy (ME), ANN-SVM, ANN-ME, and SVM-ME, to map gully erosion susceptibility in the Aghemam watershed, Iran. To this aim, a gully inventory map along with sixteen gully conditioning factors was used. Randomly partitioned 70:30% training/test sets were used to assess the goodness-of-fit and prediction power of the models. Robustness, i.e. the stability of a model's performance in response to changes in the dataset, was assessed through three training/test replicates. Preliminary statistical tests showed that ANN has the highest concordance and spatial differentiation, with a chi-square value of 36,656 at the 95% confidence level, while ME had the lowest concordance (1772). The ME model produced an impractical result in which 45% of the study area was classed as highly susceptible to gullying; in contrast, ANN-SVM gave a practical result focusing on only 34% of the study area. Across all three replicates, the ANN-SVM ensemble showed the highest goodness-of-fit and predictive power, with respective average values of 0.897 (area under the success rate curve) and 0.879 (area under the prediction rate curve), and correspondingly the highest robustness. This attests to the important role of ensemble modeling in building accurate and generalized models, and emphasizes the need to examine different model integrations. The results of this study can provide an outline for further biophysical designs on the gullies scattered in the study area. Copyright © 2017 Elsevier B.V. All rights reserved.
Fractal morphology, imaging and mass spectrometry of single aerosol particles in flight.
Loh, N D; Hampton, C Y; Martin, A V; Starodub, D; Sierra, R G; Barty, A; Aquila, A; Schulz, J; Lomb, L; Steinbrener, J; Shoeman, R L; Kassemeyer, S; Bostedt, C; Bozek, J; Epp, S W; Erk, B; Hartmann, R; Rolles, D; Rudenko, A; Rudek, B; Foucar, L; Kimmel, N; Weidenspointner, G; Hauser, G; Holl, P; Pedersoli, E; Liang, M; Hunter, M S; Hunter, M M; Gumprecht, L; Coppola, N; Wunderer, C; Graafsma, H; Maia, F R N C; Ekeberg, T; Hantke, M; Fleckenstein, H; Hirsemann, H; Nass, K; White, T A; Tobias, H J; Farquar, G R; Benner, W H; Hau-Riege, S P; Reich, C; Hartmann, A; Soltau, H; Marchesini, S; Bajt, S; Barthelmess, M; Bucksbaum, P; Hodgson, K O; Strüder, L; Ullrich, J; Frank, M; Schlichting, I; Chapman, H N; Bogan, M J
2012-06-27
The morphology of micrometre-size particulate matter is of critical importance in fields ranging from toxicology to climate science, yet these properties are surprisingly difficult to measure in the particles' native environment. Electron microscopy requires collection of particles on a substrate; visible light scattering provides insufficient resolution; and X-ray synchrotron studies have been limited to ensembles of particles. Here we demonstrate an in situ method for imaging individual sub-micrometre particles to nanometre resolution in their native environment, using intense, coherent X-ray pulses from the Linac Coherent Light Source free-electron laser. We introduced individual aerosol particles into the pulsed X-ray beam, which is sufficiently intense that diffraction from individual particles can be measured for morphological analysis. At the same time, ion fragments ejected from the beam were analysed using mass spectrometry, to determine the composition of single aerosol particles. Our results show the extent of internal dilation symmetry of individual soot particles subject to non-equilibrium aggregation, and the surprisingly large variability in their fractal dimensions. More broadly, our methods can be extended to resolve both static and dynamic morphology of general ensembles of disordered particles. Such general morphology has implications in topics such as solvent accessibilities in proteins, vibrational energy transfer by the hydrodynamic interaction of amino acids, and large-scale production of nanoscale structures by flame synthesis.
Bayesian refinement of protein structures and ensembles against SAXS data using molecular dynamics
Shevchuk, Roman; Hub, Jochen S.
2017-01-01
Small-angle X-ray scattering is an increasingly popular technique used to detect protein structures and ensembles in solution. However, the refinement of structures and ensembles against SAXS data is often ambiguous due to the low information content of SAXS data, unknown systematic errors, and unknown scattering contributions from the solvent. We offer a solution to such problems by combining Bayesian inference with all-atom molecular dynamics simulations and explicit-solvent SAXS calculations. The Bayesian formulation correctly weights the SAXS data versus prior physical knowledge, quantifies the precision or ambiguity of fitted structures and ensembles, and accounts for unknown systematic errors due to poor buffer matching. The method further provides a probabilistic criterion for identifying the number of states required to explain the SAXS data. The method is validated by refining ensembles of a periplasmic binding protein against calculated SAXS curves. Subsequently, we derive the solution ensembles of the eukaryotic chaperone heat shock protein 90 (Hsp90) against experimental SAXS data. We find that the SAXS data of the apo state of Hsp90 are compatible with a single wide-open conformation, whereas the SAXS data of Hsp90 bound to ATP or to an ATP analogue strongly suggest heterogeneous ensembles of a closed and a wide-open state. PMID:29045407
Negative Correlation Learning for Customer Churn Prediction: A Comparison Study
Faris, Hossam
2015-01-01
Recently, telecommunication companies have been paying more attention to the problem of identifying customer churn behavior. In business, it is well known to service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that are able to predict customer churn can effectively help in customer retention campaigns and in maximizing profit. In this paper we utilize an ensemble of multilayer perceptrons (MLPs) trained with negative correlation learning (NCL) for predicting customer churn in a telecommunication company. Experimental results confirm that the NCL-based MLP ensemble can achieve better generalization performance (high churn rate) compared with an ensemble of MLPs without NCL (flat ensemble) and other common data mining techniques used for churn analysis. PMID:25879060
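For reference, NCL trains each network i against the usual squared error plus a penalty that decorrelates its error from the rest of the ensemble. A sketch of the standard penalty term (following the usual NCL formulation; lam is the correlation-penalty strength):

    import numpy as np

    def ncl_loss(preds, y, i, lam=0.5):
        """NCL loss for ensemble member i:
        L_i = (f_i - y)^2 + lam * p_i, with
        p_i = (f_i - f_bar) * sum_{j != i} (f_j - f_bar).
        preds: (n_members, n_samples) member outputs; y: targets."""
        f_bar = preds.mean(axis=0)
        others = preds.sum(axis=0) - preds[i] - (len(preds) - 1) * f_bar
        p_i = (preds[i] - f_bar) * others
        return np.mean((preds[i] - y) ** 2 + lam * p_i)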
Probabilistic flood warning using grand ensemble weather forecasts
NASA Astrophysics Data System (ADS)
He, Y.; Wetterhall, F.; Cloke, H.; Pappenberger, F.; Wilson, M.; Freer, J.; McGregor, G.
2009-04-01
As the severity of floods increases, possibly due to climate and land-use change, there is an urgent need for more effective and reliable warning systems. The incorporation of numerical weather predictions (NWP) into a flood warning system can increase forecast lead times from a few hours to a few days. A single NWP forecast from a single forecast centre, however, is insufficient as it involves considerable non-predictable uncertainties and can lead to a high number of false or missed warnings. An ensemble of weather forecasts from one Ensemble Prediction System (EPS), when used on catchment hydrology, can provide improved early flood warning as some of the uncertainties can be quantified. EPS forecasts from a single weather centre only account for part of the uncertainties, those originating from initial conditions and stochastic physics. Other sources of uncertainty, including numerical implementations and/or data assimilation, can only be assessed if a grand ensemble of EPSs from different weather centres is used. When the various EPSs from different weather centres are aggregated, the probabilistic nature of the ensemble precipitation forecasts can be better retained and accounted for. The availability of twelve global EPSs through the THORPEX Interactive Grand Global Ensemble (TIGGE) offers a new opportunity for the design of an improved probabilistic flood forecasting framework. This work presents a case study using the TIGGE database for flood warning on a meso-scale catchment. The upper reach of the River Severn catchment, located in the Midlands Region of England, was selected for its abundant data and its relatively small size (4062 km²) compared to the resolution of the NWPs. This choice was deliberate, as we hypothesize that the forcing uncertainty of smaller catchments cannot be represented by a single EPS with a very limited number of ensemble members, but only through the variance given by a large number of ensembles and ensemble systems. A coupled atmospheric-hydrologic-hydraulic cascade system driven by the TIGGE ensemble forecasts was set up to study the potential benefits of using the TIGGE database for early flood warning. The physically based and fully distributed LISFLOOD suite of models was selected to simulate discharge and flood inundation consecutively. The results show the TIGGE database is a promising tool for producing forecasts of discharge and flood inundation comparable with the observed discharge and with the simulated inundation driven by the observed discharge. The spread of discharge forecasts varies from centre to centre, but it is generally large, implying a significant level of uncertainty. Precipitation input uncertainties dominate and propagate through the cascade chain. The current NWPs fall short of representing the spatial variability of precipitation over a comparatively small catchment. This perhaps indicates the need to improve NWP resolution and/or disaggregation techniques to narrow the spatial gap between meteorology and hydrology. It is not necessarily true that early flood warning becomes more reliable when more ensemble forecasts are employed. It is difficult to identify the best forecast centre(s), but in general the chance of detecting floods is increased by using the TIGGE database. Only one flood event was studied because most of the TIGGE data became available after October 2007.
It is necessary to test the TIGGE ensemble forecasts with other flood events in other catchments with different hydrological and climatic regimes before general conclusions can be made on its robustness and applicability.
Ensemble Downscaling of Winter Seasonal Forecasts: The MRED Project
NASA Astrophysics Data System (ADS)
Arritt, R. W.; Mred Team
2010-12-01
The Multi-Regional climate model Ensemble Downscaling (MRED) project is a multi-institutional project producing large ensembles of downscaled winter seasonal forecasts from coupled atmosphere-ocean seasonal prediction models. Eight regional climate models are each downscaling 15-member ensembles from the National Centers for Environmental Prediction (NCEP) Climate Forecast System (CFS) and the new NASA seasonal forecast system based on the GEOS5 atmospheric model coupled with the MOM4 ocean model. This produces 240-member ensembles, i.e., 8 regional models x 15 global ensemble members x 2 global models, for each winter season (December-April) of 1982-2003. Results to date show that the combined global-regional downscaled forecasts have the greatest skill for seasonal precipitation anomalies during strong El Niño events such as 1982-83 and 1997-98. Ensemble means of area-averaged seasonal precipitation for the regional models generally track the corresponding results for the global model, though there is considerable inter-model variability amongst the regional models. For seasons and regions where area-mean precipitation is accurately simulated, the regional models add value by extracting greater spatial detail from the global forecasts, mainly due to better resolution of terrain in the regional models. Our results also emphasize that an ensemble approach is essential to realizing the added value of the combined global-regional modeling system.
Cacha, L A; Parida, S; Dehuri, S; Cho, S-B; Poznanski, R R
2016-12-01
The huge number of voxels in fMRI over time poses a major challenge to effective analysis. Fast, accurate, and reliable classifiers are required for estimating the decoding accuracy of brain activities. Although machine-learning classifiers seem promising, individual classifiers have their own limitations. To address this limitation, the present paper proposes a method based on an ensemble of neural networks to analyze fMRI data for cognitive state classification, applicable across multiple subjects. The fuzzy integral (FI) approach is employed as an efficient tool for combining the different classifiers. The FI approach led to the development of a classifier-ensemble technique that performs better than any single classifier by reducing misclassification, bias, and variance. The proposed method successfully classified the different cognitive states for multiple subjects with high classification accuracy. A comparison of the performance improvement of the ensemble neural network method with that of the individual neural networks strongly points toward the usefulness of the proposed method.
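The fuzzy integral combination can be sketched via the discrete Choquet integral, one common FI variant (the fuzzy measure g, which encodes the worth of each coalition of classifiers, is left abstract here and is an assumption of this sketch):

    import numpy as np

    def choquet(scores, g):
        """Discrete Choquet integral of per-classifier confidences.
        scores: 1-D array of classifier outputs for one class;
        g: monotone set function, g(frozenset of indices) -> [0, 1],
        with g(all classifiers) = 1."""
        order = np.argsort(scores)[::-1]      # classifiers by descending confidence
        total, prev, subset = 0.0, 0.0, []
        for idx in order:
            subset.append(int(idx))
            gv = g(frozenset(subset))
            total += scores[idx] * (gv - prev)
            prev = gv
        return total

If g is additive (g(S) = sum of individual weights), the Choquet integral reduces to a plain weighted average; the non-additive case is what lets the FI model interactions between classifiers.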
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won
2014-08-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR- or ensemble-genotyping-based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to filtering based on genotype quality scores. Moreover, ensemble genotyping excluded >98% (105,080 of 107,167) of false positives while retaining >95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choi, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B.; Gupta, Neha; Kohane, Isaac S.; Green, Robert C.; Kong, Sek Won
2014-01-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR- or ensemble-genotyping-based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to filtering based on genotype quality scores. Moreover, ensemble genotyping excluded >98% (105,080 of 107,167) of false positives while retaining >95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. PMID:24829188
NASA Astrophysics Data System (ADS)
Wang, Lei; Liu, Zhiwen; Miao, Qiang; Zhang, Xin
2018-06-01
Mode mixing resulting from intermittent signals is a persistent problem associated with the local mean decomposition (LMD) method. Based on a noise-assisted approach, the ensemble local mean decomposition (ELMD) method alleviates the mode-mixing issue of LMD to some degree. However, the product functions (PFs) produced by ELMD often contain considerable residual noise, so a relatively large number of ensemble trials is required to eliminate it. Furthermore, since different realizations of Gaussian white noise are added to the original signal, different trials may generate different numbers of PFs, making it difficult to take the ensemble mean. In this paper, a novel method called complete ensemble local mean decomposition with adaptive noise (CELMDAN) is proposed to solve these two problems. The method adds a particular, adaptive noise at every decomposition stage of each trial. Moreover, a unique residue is obtained after separating each PF, and this residue is used as the input for the next stage. Two simulated signals are analyzed to illustrate the advantages of CELMDAN over ELMD and CEEMDAN. To further demonstrate its efficiency, the method is applied to diagnose rolling-bearing faults in an experimental case and an engineering case. The diagnosis results indicate that CELMDAN can extract more fault-characteristic information with less interference than ELMD.
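The structure of CELMDAN parallels that of CEEMDAN for empirical mode decomposition. The following is only a schematic of the loop described above, with hypothetical callables: first_pf(x) extracting the first product function by LMD, and noise_pf(w, k) giving the stage-k PF of a noise realization w (real LMD implementations are considerably more involved):

    import numpy as np

    def celmdan(x, first_pf, noise_pf, n_trials=50, eps=0.2, n_pfs=5, seed=0):
        """Schematic CELMDAN: at stage k every trial adds a stage-specific
        (adaptive) noise component, the first PF of the noisy residue is
        ensemble-averaged, and a single unique residue feeds stage k+1."""
        rng = np.random.default_rng(seed)
        noises = [rng.standard_normal(x.size) for _ in range(n_trials)]
        residue, pfs = x.copy(), []
        for k in range(n_pfs):
            pf_k = np.mean([first_pf(residue + eps * noise_pf(w, k))
                            for w in noises], axis=0)
            pfs.append(pf_k)
            residue = residue - pf_k      # unique residue: no mismatch in PF counts
        return pfs, residue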
Uehara, Shota; Tanaka, Shigenori
2017-04-24
Protein flexibility is a major hurdle in current structure-based virtual screening (VS). In spite of the recent advances in high-performance computing, protein-ligand docking methods still demand tremendous computational cost to take into account the full degree of protein flexibility. In this context, ensemble docking has proven its utility and efficiency for VS studies, but it still needs a rational and efficient method to select and/or generate multiple protein conformations. Molecular dynamics (MD) simulations are useful to produce distinct protein conformations without abundant experimental structures. In this study, we present a novel strategy that makes use of cosolvent-based molecular dynamics (CMD) simulations for ensemble docking. By mixing small organic molecules into a solvent, CMD can stimulate dynamic protein motions and induce partial conformational changes of binding pocket residues appropriate for the binding of diverse ligands. The present method has been applied to six diverse target proteins and assessed by VS experiments using many actives and decoys of DEKOIS 2.0. The simulation results have revealed that the CMD is beneficial for ensemble docking. Utilizing cosolvent simulation allows the generation of druggable protein conformations, improving the VS performance compared with the use of a single experimental structure or ensemble docking by standard MD with pure water as the solvent.
Spectral statistics of the uni-modular ensemble
NASA Astrophysics Data System (ADS)
Joyner, Christopher H.; Smilansky, Uzy; Weidenmüller, Hans A.
2017-09-01
We investigate the spectral statistics of Hermitian matrices whose elements are chosen uniformly from U(1), called the uni-modular ensemble (UME), in the limit of large matrix size. Using three complementary methods: a supersymmetric integration method, a combinatorial graph-theoretical analysis and a Brownian motion approach, we derive expressions for the 1/N corrections to the mean spectral moments and also analyse the fluctuations about this mean. By addressing the same ensemble from three different points of view, we can critically compare the relative advantages of the methods and derive some new results.
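Sampling the UME numerically is straightforward, which makes the analytic 1/N corrections easy to check. A minimal sketch (taking the diagonal entries as ±1, the real unimodular values, is a convention assumed here to keep the matrix Hermitian):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 500
    # Off-diagonal entries uniform on the unit circle; Hermitian by construction.
    U = np.triu(np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (N, N))), k=1)
    H = U + U.conj().T + np.diag(rng.choice([-1.0, 1.0], N))
    evals = np.linalg.eigvalsh(H)
    # Scaled mean spectral moments, for comparison with 1/N expansions.
    moments = [np.mean((evals / np.sqrt(N)) ** k) for k in (1, 2, 3, 4)]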
NASA Astrophysics Data System (ADS)
Fayaz, S. M.; Rajanikant, G. K.
2014-07-01
Programmed cell death has been a fascinating area of research because it continues to raise new challenges and questions despite the tremendous ongoing research in this field. Recently, necroptosis, a programmed form of necrotic cell death, has been implicated in many diseases, including neurological disorders. Receptor-interacting serine/threonine protein kinase 1 (RIPK1) is an important regulatory protein involved in necroptosis, and inhibition of this protein is essential to stop the necroptotic process and eventually cell death. Current structure-based virtual screening methods involve a wide range of strategies, and recently, considering multiple protein structures for pharmacophore extraction has been emphasized as a way to improve the outcome. However, it is very important to use the pharmacophoric information fully during docking. Further, in such methods, using the appropriate protein structures for docking is desirable; if not, potential compound hits obtained through pharmacophore-based screening may not have correct ranks and scores after docking. Therefore, a comprehensive integration of different ensemble methods is essential, which may provide better virtual screening results. In this study, dual ensemble screening, a novel computational strategy, was used to identify diverse and potent inhibitors of RIPK1. All the pharmacophore features present in the binding site were captured using both the apo and holo protein structures, and an ensemble pharmacophore was built by combining these features. This ensemble pharmacophore was employed in pharmacophore-based screening of the ZINC database. The compound hits thus obtained were subjected to ensemble docking. The leads acquired through docking were further validated through feature evaluation and molecular dynamics simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Man, Jun; Zhang, Jiangjiang; Li, Weixuan
2016-10-01
The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupling EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics, including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE), are used to design the optimal sampling strategy. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies provide more accurate parameter estimation and state prediction than conventional sampling strategies. Optimal sampling designs based on the various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, a larger ensemble size improves the parameter estimation and the convergence of the optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied to any other hydrological problem.
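For context, the analysis step that SEOD wraps is the standard EnKF update; the sketch below is the common stochastic (perturbed-observations) form, not code from the study:

    import numpy as np

    def enkf_update(X, y, H, R, seed=0):
        """Stochastic EnKF analysis. X: (n_state, n_ens) forecast ensemble,
        y: observations (n_obs,), H: observation operator (n_obs, n_state),
        R: observation-error covariance (n_obs, n_obs)."""
        rng = np.random.default_rng(seed)
        n = X.shape[1]
        A = X - X.mean(axis=1, keepdims=True)        # ensemble anomalies
        HA = H @ A
        S = HA @ HA.T / (n - 1) + R                  # innovation covariance
        K = (A @ HA.T / (n - 1)) @ np.linalg.inv(S)  # Kalman gain
        Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n).T
        return X + K @ (Y - H @ X)                   # updated (analysis) ensemble

The information metrics (SD, DFS, RE) are then computed from the forecast and analysis statistics to rank candidate measurement locations.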
Interfacing broadband photonic qubits to on-chip cavity-protected rare-earth ensembles
Zhong, Tian; Kindem, Jonathan M.; Rochman, Jake; Faraon, Andrei
2017-01-01
Ensembles of solid-state optical emitters enable broadband quantum storage and transduction of photonic qubits, with applications in high-rate quantum networks for secure communications and interconnecting future quantum computers. To transfer quantum states using ensembles, rephasing techniques are used to mitigate fast decoherence resulting from inhomogeneous broadening, but these techniques generally limit the bandwidth, efficiency and active times of the quantum interface. Here, we use a dense ensemble of neodymium rare-earth ions strongly coupled to a nanophotonic resonator to demonstrate a significant cavity protection effect at the single-photon level—a technique to suppress ensemble decoherence due to inhomogeneous broadening. The protected Rabi oscillations between the cavity field and the atomic super-radiant state enable ultra-fast transfer of photonic frequency qubits to the ions (∼50 GHz bandwidth) followed by retrieval with 98.7% fidelity. With the prospect of coupling to other long-lived rare-earth spin states, this technique opens the possibilities for broadband, always-ready quantum memories and fast optical-to-microwave transducers. PMID:28090078
Ensemble stump classifiers and gene expression signatures in lung cancer.
Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn
2007-01-01
Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
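An ensemble of stumps is easy to reproduce with standard tooling. The sketch below uses boosted depth-1 trees as a generic stand-in; note the paper's ESC combines stumps across data sets specifically to avoid cross-platform normalization, a detail not captured here:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Decision stumps = depth-1 trees, one test each.
    stump_ensemble = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),
        n_estimators=50,
    )
    # stump_ensemble.fit(X_expr, y_subtype)
    # X_expr: probe intensities per sample; y_subtype: adeno vs. squamous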
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
Ensembles of classifiers, in general, aim to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of classifier ensembles generate self-rated confidence scores that estimate the reliability of each prediction and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. Although ensembles of classifiers have been widely used in computational intelligence, existing works largely overlook the effect of the various sources of unreliability on the confidence of classification. In NDE, classification results are affected by the inherent ambiguity of classification, non-discriminative features, inadequate training samples and measurement noise. In this paper, we extend existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in the classification performance of defect and non-defect indications.
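The baseline being extended is weighted majority voting over weak hypotheses. A minimal sketch (labels in {-1, +1}; the weights would come from, e.g., boosting's confidence-rated predictions):

    import numpy as np

    def weighted_majority_vote(votes, weights):
        """votes: (n_classifiers, n_samples) array of {-1, +1} labels;
        weights: (n_classifiers,) reliabilities. Returns the sign of the
        weighted vote for each sample."""
        return np.sign((weights[:, None] * votes).sum(axis=0))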
Interfacing broadband photonic qubits to on-chip cavity-protected rare-earth ensembles
NASA Astrophysics Data System (ADS)
Zhong, Tian; Kindem, Jonathan M.; Rochman, Jake; Faraon, Andrei
2017-01-01
Ensembles of solid-state optical emitters enable broadband quantum storage and transduction of photonic qubits, with applications in high-rate quantum networks for secure communications and interconnecting future quantum computers. To transfer quantum states using ensembles, rephasing techniques are used to mitigate fast decoherence resulting from inhomogeneous broadening, but these techniques generally limit the bandwidth, efficiency and active times of the quantum interface. Here, we use a dense ensemble of neodymium rare-earth ions strongly coupled to a nanophotonic resonator to demonstrate a significant cavity protection effect at the single-photon level--a technique to suppress ensemble decoherence due to inhomogeneous broadening. The protected Rabi oscillations between the cavity field and the atomic super-radiant state enable ultra-fast transfer of photonic frequency qubits to the ions (~50 GHz bandwidth) followed by retrieval with 98.7% fidelity. With the prospect of coupling to other long-lived rare-earth spin states, this technique opens the possibilities for broadband, always-ready quantum memories and fast optical-to-microwave transducers.
Efficient Transfer Entropy Analysis of Non-Stationary Neural Time Series
Vicente, Raul; Díaz-Pernas, Francisco J.; Wibral, Michael
2014-01-01
Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. Especially the measure of information transfer, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy from two processes requires the observation of multiple realizations of these processes to estimate associated probability density functions. To obtain these necessary observations, available estimators typically assume stationarity of processes to allow pooling of observations over time. This assumption, however, is a major obstacle to the application of these estimators in neuroscience, as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues theoretically showed that the stationarity assumption may be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble of realizations is often readily available in neuroscience experiments in the form of experimental trials. Thus, in this work we combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that is suitable for the increased computational demand of the ensemble method's practical application. In particular, we use a massively parallel implementation for a graphics processing unit to handle the computationally most heavy aspects of the ensemble method for transfer entropy estimation. We test the performance and robustness of our implementation on data from numerical simulations of stochastic processes. We also demonstrate the applicability of the ensemble method to magnetoencephalographic data. While we mainly evaluate the proposed method for neuroscience data, we expect it to be applicable in a variety of fields that are concerned with the analysis of information transfer in complex biological, social, and artificial systems. PMID:25068489
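The ensemble idea itself is easy to state in code. The sketch below is a simplified stand-in for the paper's approach: it uses a plain binned plug-in estimator of transfer entropy rather than the nearest-neighbour estimator the authors employ, but it shows the key move of pooling observations across trials at each time step instead of over time.

```python
import numpy as np

def ensemble_transfer_entropy(x, y, n_bins=4):
    """Time-resolved binned transfer entropy TE(X -> Y), estimated by
    pooling observations across an ensemble of trials at each time
    step (no stationarity over time assumed).
    x, y: (n_trials, n_time) arrays of simultaneously recorded trials."""
    def discretize(a):
        # rank-based binning into n_bins roughly equiprobable levels
        ranks = np.argsort(np.argsort(a, axis=None)).reshape(a.shape)
        return (ranks * n_bins // a.size).astype(int)

    xd, yd = discretize(x), discretize(y)
    n_trials, n_time = xd.shape
    te = np.zeros(n_time - 1)
    for t in range(n_time - 1):
        joint = np.zeros((n_bins, n_bins, n_bins))   # p(y_{t+1}, y_t, x_t)
        for r in range(n_trials):
            joint[yd[r, t + 1], yd[r, t], xd[r, t]] += 1
        p = joint / n_trials
        p_yy = p.sum(axis=2)        # p(y_{t+1}, y_t)
        p_yx = p.sum(axis=0)        # p(y_t, x_t)
        p_y = p.sum(axis=(0, 2))    # p(y_t)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = p * p_y[None, :, None] / (p_yy[:, :, None] * p_yx[None, :, :])
            te[t] = np.where(p > 0, p * np.log2(ratio), 0.0).sum()
    return te
```

The per-time-step estimates are independent of each other, which is what makes a massively parallel (e.g., GPU) implementation of the ensemble method natural.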
Currency crisis indication by using ensembles of support vector machine classifiers
NASA Astrophysics Data System (ADS)
Ramli, Nor Azuana; Ismail, Mohd Tahir; Wooi, Hooy Chee
2014-07-01
There are many methods that have been experimented with in the analysis of currency crises. However, not all methods can provide accurate indications. This paper introduces an ensemble of classifiers using Support Vector Machines, which has not previously been applied in analyses involving currency crises, with the aim of increasing indication accuracy. The proposed ensemble classifiers' performances are measured using percentage of accuracy, root mean squared error (RMSE), area under the Receiver Operating Characteristics (ROC) curve and Type II error. The performance of an ensemble of Support Vector Machine classifiers is compared with that of a single Support Vector Machine classifier, and both classifiers are tested on a data set from 27 countries with 12 macroeconomic indicators for each country. From our analyses, the results show that the ensemble of Support Vector Machine classifiers outperforms the single Support Vector Machine classifier on the problem of indicating a currency crisis in terms of a range of standard measures for comparing the performance of classifiers.
A variational ensemble scheme for noisy image data assimilation
NASA Astrophysics Data System (ADS)
Yang, Yin; Robinson, Cordelia; Heitz, Dominique; Mémin, Etienne
2014-05-01
Data assimilation techniques aim at recovering a system's state-variable trajectory, denoted X, along time from partially observed noisy measurements of the system, denoted Y. These procedures, which couple dynamics and noisy measurements of the system, fulfill a twofold objective. On the one hand, they provide a denoising - or reconstruction - procedure for the data through a given model framework and, on the other hand, they provide estimation procedures for unknown parameters of the dynamics. A standard variational data assimilation problem can be formulated as the minimization of the following objective function with respect to the initial discrepancy, η, from the background initial guess: $J(\eta(x)) = \frac{1}{2}\|X_b(x) - X(t_0,x)\|_B^2 + \frac{1}{2}\int_{t_0}^{t_f}\|H(X(t,x)) - Y(t,x)\|_R^2\,dt$, (1) where the observation operator H links the state variable and the measurements. The cost function can be interpreted as the log-likelihood function associated with the a posteriori distribution of the state given the past history of measurements and the background. In this work, we aim at studying ensemble-based optimal control strategies for data assimilation. Such a formulation nicely combines the ingredients of ensemble Kalman filters and variational data assimilation (4DVar). It is also formulated as the minimization of the objective function (1), but, similarly to ensemble filters, it introduces into its objective function an empirical ensemble-based background-error covariance defined as: B ≡ ⟨(X_b −
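A minimal numerical sketch of evaluating the objective (1) on a discretized window is given below; model, obs_op, B_inv and R_inv are hypothetical user-supplied placeholders, and minimization over η would then be delegated to a gradient-based optimizer.

```python
import numpy as np

def cost_4dvar(eta, x_b, model, obs_op, y_window, B_inv, R_inv):
    """Evaluate J(eta) from Eq. (1): background misfit of the initial
    discrepancy eta plus observation misfits accumulated over the
    assimilation window. `model` advances the state one step; `obs_op`
    is H; both are hypothetical user-supplied callables."""
    jb = 0.5 * eta @ B_inv @ eta          # background term (B-norm)
    x = x_b + eta                         # perturbed initial condition
    jo = 0.0
    for y_t in y_window:                  # discretized time integral
        d = obs_op(x) - y_t               # innovation H(X) - Y
        jo += 0.5 * d @ R_inv @ d
        x = model(x)                      # advance the dynamics
    return jb + jo
```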
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Elia, M.; Edwards, H. C.; Hu, J.
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.
D'Elia, M.; Edwards, H. C.; Hu, J.; ...
2018-01-18
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.
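The grouping strategy suggested by this result can be sketched as a sort-and-chunk over a cheap per-sample surrogate; the fixed ensemble size and the simple sorting rule are illustrative assumptions.

```python
import numpy as np

def group_by_surrogate(surrogate, ensemble_size):
    """Group samples into ensembles of similar expected solver cost.
    surrogate: per-sample proxy for iteration count (e.g., the total
    anisotropy of the diffusion coefficient for that sample)."""
    order = np.argsort(surrogate)
    return [order[i:i + ensemble_size]
            for i in range(0, len(order), ensemble_size)]
```

Sorting keeps samples with similar iteration counts together, so no single slow sample dominates the cost of an otherwise fast ensemble solve.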
NASA Astrophysics Data System (ADS)
Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris
2017-04-01
Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and an operational variational data assimilation system, to provide a basis for understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions, and to allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the apparent benefit from using ensembles for cloudy radiance assimilation in an EnVar context.
A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles
NASA Astrophysics Data System (ADS)
Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.
2016-12-01
Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean as they subscribe to model democracy ("one model, one vote"). Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus, wrong estimation of uncertainty and effective number of independent models. Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly. Using a model-as-truth experiment we demonstrate that those regional biases persist into the future and we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, where all the model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size. Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
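One simple way to realize such an RMSE-minimizing subset is greedy forward selection, sketched below; this is an illustrative simplification and not necessarily the exact selection procedure used in the study.

```python
import numpy as np

def greedy_subset(members, obs, k):
    """Greedy forward selection of k ensemble members whose equally
    weighted mean minimizes RMSE against an observed climatology.
    members: (n_members, n_grid); obs: (n_grid,)."""
    chosen = []
    for _ in range(k):
        scores = []
        for i in range(len(members)):
            if i in chosen:
                scores.append(np.inf)      # already selected
                continue
            mean = members[chosen + [i]].mean(axis=0)
            scores.append(np.sqrt(np.mean((mean - obs) ** 2)))
        chosen.append(int(np.argmin(scores)))
    return chosen
```

Selecting for a low-RMSE *mean* lets individual member biases cancel, which is why it can beat ranking members by individual performance.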
Sampling-based ensemble segmentation against inter-operator variability
NASA Astrophysics Data System (ADS)
Huo, Jing; Okada, Kazunori; Pope, Whitney; Brown, Matthew
2011-03-01
Inconsistency and a lack of reproducibility are commonly associated with semi-automated segmentation methods. In this study, we developed an ensemble approach to improve reproducibility and applied it to glioblastoma multiforme (GBM) brain tumor segmentation on T1-weighted contrast-enhanced MR volumes. The proposed approach combines sampling-based simulations and ensemble segmentation into a single framework; it generates a set of segmentations by perturbing user initialization and user-specified internal parameters, then fuses the set of segmentations into a single consensus result. Three combination algorithms were applied: majority voting, averaging and expectation-maximization (EM). The reproducibility of the proposed framework was evaluated by a controlled experiment on 16 tumor cases from a multicenter drug trial. The ensemble framework had significantly better reproducibility than the individual base Otsu thresholding method (p<.001).
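For binary masks, two of the three fusion rules can be written in a few lines (the EM-based combination is omitted); the mask layout and the 0.5 thresholds are illustrative assumptions.

```python
import numpy as np

def fuse_segmentations(masks, method="majority"):
    """Fuse binary segmentations from perturbed initializations and
    parameters into one consensus mask. masks: (n_runs, H, W) in {0,1}."""
    if method == "majority":                       # strict majority vote
        return (masks.sum(axis=0) > masks.shape[0] / 2).astype(np.uint8)
    if method == "average":                        # soft consensus at 0.5
        return (masks.mean(axis=0) >= 0.5).astype(np.uint8)
    raise ValueError(f"unknown method: {method}")
```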
Ensemble approach for differentiation of malignant melanoma
NASA Astrophysics Data System (ADS)
Rastgoo, Mojdeh; Morel, Olivier; Marzani, Franck; Garcia, Rafael
2015-04-01
Melanoma is the deadliest type of skin cancer, yet it is the most treatable kind depending on its early diagnosis. The early prognosis of melanoma is a challenging task for both clinicians and dermatologists. Due to the importance of early diagnosis and in order to assist dermatologists, we propose an automated framework based on ensemble learning methods and dermoscopy images to differentiate melanoma from dysplastic and benign lesions. The evaluation of our framework on a recent public dermoscopy benchmark (the PH2 dataset) indicates the potential of the proposed method. Our evaluation, using only global features, revealed that ensembles such as random forest perform better than a single learner. Using a random forest ensemble and a combination of color and texture features, our framework achieved the highest sensitivity of 94% and specificity of 92%.
Petit and grand ensemble Monte Carlo calculations of the thermodynamics of the lattice gas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murch, G.E.; Thorn, R.J.
1978-11-01
A direct Monte Carlo method for estimating the chemical potential in the petit canonical ensemble was applied to the simple cubic Ising-like lattice gas. The method is based on a simple relationship between the chemical potential and the potential energy distribution in a lattice gas at equilibrium, as derived independently by Widom, and Jackson and Klein. Results are presented here for the chemical potential at various compositions and temperatures above and below the zero field ferromagnetic and antiferromagnetic critical points. The same lattice gas model was reconstructed in the form of a restricted grand canonical ensemble and results at several temperatures were compared with those from the petit canonical ensemble. The agreement was excellent in these cases.
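The Widom-type relationship referred to here can be sketched for a nearest-neighbour lattice gas as follows. This is a schematic, assuming configurations drawn from a separate equilibrium Monte Carlo run; eps is the pair interaction, the lattice is assumed not fully occupied, and the ideal (concentration-dependent) contribution to the chemical potential is omitted.

```python
import numpy as np

def widom_mu_excess(lattice, eps, kT, n_trials=10000, rng=None):
    """Estimate the excess chemical potential of a simple-cubic lattice
    gas from the Boltzmann factor of trial insertions at vacant sites.
    lattice: (L, L, L) array of 0/1 occupations at equilibrium."""
    rng = rng or np.random.default_rng()
    L = lattice.shape[0]
    boltz = []
    while len(boltz) < n_trials:
        i, j, k = rng.integers(0, L, size=3)
        if lattice[i, j, k]:
            continue                          # insert only at vacant sites
        # nearest-neighbour occupation, periodic boundaries
        nn = (lattice[(i + 1) % L, j, k] + lattice[i - 1, j, k] +
              lattice[i, (j + 1) % L, k] + lattice[i, j - 1, k] +
              lattice[i, j, (k + 1) % L] + lattice[i, j, k - 1])
        boltz.append(np.exp(-eps * nn / kT))  # Boltzmann factor of Delta U
    return -kT * np.log(np.mean(boltz))
```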
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapeutic value for some diseases. Accurate identification of antioxidant proteins could contribute to revealing the physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of the prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew's Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
NASA Astrophysics Data System (ADS)
Noh, S. J.; Rakovec, O.; Kumar, R.; Samaniego, L. E.
2015-12-01
Accurate and reliable streamflow prediction is essential to mitigate the social and economic damage caused by water-related disasters such as floods and droughts. Sequential data assimilation (DA) may facilitate improved streamflow prediction by using real-time observations to correct internal model states. In conventional DA methods such as state updating, parametric uncertainty is often ignored, mainly due to practical limitations of methodology to specify modeling uncertainty with limited ensemble members. However, if parametric uncertainty related to routing and runoff components is not incorporated properly, the predictive uncertainty of the model ensemble may be insufficient to capture the dynamics of observations, which may deteriorate predictability. Recently, a multi-scale parameter regionalization (MPR) method was proposed to make hydrologic predictions at different scales using the same set of model parameters without losing much of the model performance. The MPR method incorporated within the mesoscale hydrologic model (mHM, http://www.ufz.de/mhm) can effectively represent and control the uncertainty of high-dimensional parameters in a distributed model using global parameters. In this study, we evaluate the impacts of streamflow data assimilation over European river basins. In particular, a multi-parametric ensemble approach is tested to consider the effects of parametric uncertainty in DA. Because augmentation of parameters is not required within an assimilation window, the approach can be more stable with limited ensemble members and has potential for operational use. To consider the response times and non-Gaussian characteristics of internal hydrologic processes, lagged particle filtering is utilized. The presentation will focus on the gains and limitations of streamflow data assimilation and the multi-parametric ensemble method over large-scale basins.
Uncertainty Quantification in Alchemical Free Energy Methods.
Bhati, Agastya P; Wan, Shunzhou; Hu, Yuan; Sherborne, Brad; Coveney, Peter V
2018-06-12
Alchemical free energy methods have gained much importance recently from several reports of improved ligand-protein binding affinity predictions based on their implementation using molecular dynamics simulations. A large number of variants of such methods implementing different accelerated sampling techniques and free energy estimators are available, each claimed to be better than the others in its own way. However, the key features of reproducibility and quantification of associated uncertainties in such methods have barely been discussed. Here, we apply a systematic protocol for uncertainty quantification to a number of popular alchemical free energy methods, covering both absolute and relative free energy predictions. We show that a reliable measure of error estimation is provided by ensemble simulation-an ensemble of independent MD simulations-which applies irrespective of the free energy method. The need to use ensemble methods is fundamental and holds regardless of the duration of time of the molecular dynamics simulations performed.
ERIC Educational Resources Information Center
Healy, Daniel J.
2014-01-01
The jazz ensemble represents an important performance opportunity in many school music programs. Due to the cultural history of jazz as an improvisatory art form, school jazz ensemble directors must address methods of teaching improvisation concepts to young students. Progress has been made in the field of prescribed improvisation activities and…
xEMD procedures as a data-assisted filtering method
NASA Astrophysics Data System (ADS)
Machrowska, Anna; Jonak, Józef
2018-01-01
The article presents the possibility of using Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Improved Complete Ensemble Empirical Mode Decomposition (ICEEMD) algorithms for mechanical system condition monitoring applications. Results are presented for the xEMD procedures applied to vibration signals of a system in different states of wear.
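The noise-assisted averaging that distinguishes EEMD from plain EMD can be sketched compactly. The emd argument below is a hypothetical callable returning a fixed-size (n_imfs, n_samples) array; in practice the IMF count varies between runs and must be aligned, which packages such as PyEMD handle.

```python
import numpy as np

def eemd(signal, emd, n_realizations=100, noise_std=0.2, rng=None):
    """Ensemble EMD sketch: decompose many noise-perturbed copies of
    the signal with plain EMD and average the IMFs, so the added white
    noise cancels while mode mixing is reduced."""
    rng = rng or np.random.default_rng()
    scale = noise_std * signal.std()
    acc = None
    for _ in range(n_realizations):
        noisy = signal + scale * rng.standard_normal(len(signal))
        imfs = emd(noisy)            # (n_imfs, n_samples), assumed fixed-size
        acc = imfs if acc is None else acc + imfs
    return acc / n_realizations
```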
Dominating Scale-Free Networks Using Generalized Probabilistic Methods
Molnár, F.; Derzsy, N.; Czabarka, É.; Székely, L.; Szymanski, B. K.; Korniss, G.
2014-01-01
We study ensemble-based graph-theoretical methods aiming to approximate the size of the minimum dominating set (MDS) in scale-free networks. We analyze both analytical upper bounds of dominating sets and numerical realizations for applications. We propose two novel probabilistic dominating set selection strategies that are applicable to heterogeneous networks. One of them obtains the smallest probabilistic dominating set and also outperforms the deterministic degree-ranked method. We show that a degree-dependent probabilistic selection method becomes optimal in its deterministic limit. In addition, we also find the precise limit where selecting high-degree nodes exclusively becomes inefficient for network domination. We validate our results on several real-world networks, and provide highly accurate analytical estimates for our methods. PMID:25200937
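A generic form of degree-dependent probabilistic selection, with a greedy patch for nodes left undominated, is sketched below; the probability profile p_of_degree is a user-chosen placeholder, since the optimal profiles derived in the paper are not reproduced here.

```python
import numpy as np
import networkx as nx

def probabilistic_ds(G, p_of_degree, rng=None):
    """Degree-dependent probabilistic dominating-set heuristic: include
    node v with probability p_of_degree(deg(v)), then greedily patch
    any node left undominated."""
    rng = rng or np.random.default_rng()
    D = {v for v in G if rng.random() < p_of_degree(G.degree(v))}
    for v in G:
        if v not in D and not any(u in D for u in G.neighbors(v)):
            D.add(max(list(G.neighbors(v)) + [v], key=G.degree))  # patch
    return D

# e.g. probabilistic_ds(nx.barabasi_albert_graph(10**4, 3),
#                       lambda d: min(1.0, 2.0 / (d + 1)))
```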
NASA Technical Reports Server (NTRS)
Abeles, F. J.
1980-01-01
Each of the subsystems comprising the protective ensemble for firefighters is described. These include: (1) the garment system which includes turnout gear, helmets, faceshields, coats, pants, gloves, and boots; (2) the self-contained breathing system; (3) the lighting system; and (4) the communication system. The design selection rationale is discussed and the drawings used to fabricate the prototype ensemble are provided. The specifications presented were developed using the requirements and test method of the protective ensemble standard. Approximate retail prices are listed.
Causal network in a deafferented non-human primate brain.
Balasubramanian, Karthikeyan; Takahashi, Kazutaka; Hatsopoulos, Nicholas G
2015-01-01
De-afferented/efferented neural ensembles can undergo causal changes when interfaced to neuroprosthetic devices. These changes occur via recruitment or isolation of neurons, alterations in functional connectivity within the ensemble and/or changes in the role of neurons, i.e., excitatory/inhibitory. In this work, the emergence of a causal network and changes in its dynamics are demonstrated for a deafferented brain region exposed to BMI (brain-machine interface) learning. The BMI controlled a robot for reach-and-grasp behavior; the motor cortical regions used for the BMI had been deafferented by chronic amputation, and ensembles of neurons were decoded for velocity control of the multi-DOF robot. A generalized linear model-framework based Granger causality (GLM-GC) technique was used to estimate the ensemble connectivity. Model selection was based on the AIC (Akaike Information Criterion).
Nonuniform fluids in the grand canonical ensemble
DOE Office of Scientific and Technical Information (OSTI.GOV)
Percus, J.K.
1982-01-01
Nonuniform simple classical fluids are considered quite generally. The grand canonical ensemble is particularly suitable, conceptually, in the leading approximation of local thermodynamics, which figuratively divides the system into approximately uniform spatial subsystems. The procedure is reviewed by which this approach is systematically corrected for slowly varying density profiles, and a model is suggested that carries the correction into the domain of local fluctuations. The latter is assessed for substrate bounded fluids, as well as for two-phase interfaces. The peculiarities of the grand ensemble in a two-phase region stem from the inherent very large number fluctuations. A primitive model shows how these are quenched in the canonical ensemble. This is taken advantage of by applying the Kac-Siegert representation of the van der Waals decomposition with petit canonical corrections, to the two-phase regime.
Extracting Drug-Drug Interactions with Word and Character-Level Recurrent Neural Networks
Kavuluru, Ramakanth; Rios, Anthony; Tran, Tung
2017-01-01
Drug-drug interactions (DDIs) are known to be responsible for nearly a third of all adverse drug reactions. Hence several current efforts focus on extracting signal from EMRs to prioritize DDIs that need further exploration. To this end, being able to extract explicit mentions of DDIs in free text narratives is an important task. In this paper, we explore recurrent neural network (RNN) architectures to detect and classify DDIs from unstructured text using the DDIExtraction dataset from the SemEval 2013 (task 9) shared task. Our methods are in line with those used in other recent deep learning efforts for relation extraction including DDI extraction. However, to our knowledge, we are the first to investigate the potential of character-level RNNs (Char-RNNs) for DDI extraction (and relation extraction in general). Furthermore, we explore a simple but effective model bootstrapping method to (a) build model averaging ensembles, (b) derive confidence intervals around mean micro-F scores (MMF), and (c) assess the average behavior of our methods. Without any rule based filtering of negative examples, a popular heuristic used by most earlier efforts, we achieve an MMF of 69.13. By adding simple replicable heuristics to filter negative instances we are able to achieve an MMF of 70.38. Furthermore, our best ensembles produce micro F-scores of 70.81 (without filtering) and 72.13 (with filtering), which are superior to metrics reported in published results. Although Char-RNNs turn out to be inferior to regular word based RNN models in overall comparisons, we find that ensembling models from both architectures results in nontrivial gains over simply using either alone, indicating that they complement each other. PMID:29034375
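A simplified sketch of the bootstrapping idea follows: it resamples already-trained model predictions instead of retraining models on bootstrapped data as the paper does, and it uses plain binary F1 in place of micro-F over multiple relation classes.

```python
import numpy as np

def bootstrap_ensemble_f1(y_true, model_preds, n_boot=100, rng=None):
    """Resample trained models with replacement; each resample forms a
    model-averaging ensemble, and the spread of scores gives a CI.
    model_preds: (n_models, n_examples) binary predictions."""
    rng = rng or np.random.default_rng()

    def f1(pred):
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        return 2 * tp / (2 * tp + fp + fn)

    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(model_preds), size=len(model_preds))
        ensemble_pred = (model_preds[idx].mean(axis=0) >= 0.5).astype(int)
        scores.append(f1(ensemble_pred))
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return float(np.mean(scores)), (lo, hi)
```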
2015-11-10
of the ensemble method to the estimation of sensitivities was demonstrated in meteorological ... (Ancell and Hakim, 2007; Torn and Hakim, 2008) and ... to predetermined low-dimensional subspaces spanned either by the reduced-order approximations of the model Green's functions (Stammer and Wunsch, ... 2005; Qui et al., 2007; Hoteit, 2008). In fact, the 4dEnVar technique pursues a similar, but more general approach, parameterizing the search ...
NASA Technical Reports Server (NTRS)
Keppenne, C. L.; Rienecker, M.; Borovikov, A. Y.
1999-01-01
Two massively parallel data assimilation systems, in which the model forecast-error covariances are estimated from the distribution of an ensemble of model integrations, are applied to the assimilation of 97-98 TOPEX/POSEIDON altimetry and TOGA/TAO temperature data into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)'s quasi-isopycnal ocean general circulation model. In the first system, an ensemble of model runs forced by an ensemble of atmospheric model simulations is used to calculate asymptotic error statistics. The data assimilation then occurs in the reduced phase space spanned by the corresponding leading empirical orthogonal functions. The second system is an ensemble Kalman filter in which new error statistics are computed during each assimilation cycle from the time-dependent ensemble distribution. The data assimilation experiments are conducted on NSIPP's 512-processor CRAY T3E. The two data assimilation systems are validated by withholding part of the data and quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The pros and cons of each system are discussed.
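The per-cycle update of the second system can be illustrated with a textbook stochastic ensemble Kalman filter analysis step; the linear observation operator H and the explicit matrix inverse are simplifications, not the NSIPP implementation.

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng=None):
    """Stochastic EnKF analysis step with perturbed observations.
    X: (n_state, n_ens) forecast ensemble; y: (n_obs,) observations;
    H: (n_obs, n_state) linear observation operator; R: obs covariance."""
    rng = rng or np.random.default_rng()
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    Pf = A @ A.T / (n_ens - 1)                       # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)                       # analysis ensemble
```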
Quantifying rapid changes in cardiovascular state with a moving ensemble average.
Cieslak, Matthew; Ryan, William S; Babenko, Viktoriya; Erro, Hannah; Rathbun, Zoe M; Meiring, Wendy; Kelsey, Robert M; Blascovich, Jim; Grafton, Scott T
2018-04-01
MEAP, the moving ensemble analysis pipeline, is a new open-source tool designed to perform multisubject preprocessing and analysis of cardiovascular data, including electrocardiogram (ECG), impedance cardiogram (ICG), and continuous blood pressure (BP). In addition to traditional ensemble averaging, MEAP implements a moving ensemble averaging method that allows for the continuous estimation of indices related to cardiovascular state, including cardiac output, preejection period, heart rate variability, and total peripheral resistance, among others. Here, we define the moving ensemble technique mathematically, highlighting its differences from fixed-window ensemble averaging. We describe MEAP's interface and features for signal processing, artifact correction, and cardiovascular-based fMRI analysis. We demonstrate the accuracy of MEAP's novel B point detection algorithm on a large collection of hand-labeled ICG waveforms. As a proof of concept, two subjects completed a series of four physical and cognitive tasks (cold pressor, Valsalva maneuver, video game, random dot kinetogram) on 3 separate days while ECG, ICG, and BP were recorded. Critically, the moving ensemble method reliably captures the rapid cyclical cardiovascular changes related to the baroreflex during the Valsalva maneuver and the classic cold pressor response. Cardiovascular measures were seen to vary considerably within repetitions of the same cognitive task for each individual, suggesting that a carefully designed paradigm could be used to capture fast-acting event-related changes in cardiovascular state. © 2017 Society for Psychophysiological Research.
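The difference from fixed-window ensemble averaging can be sketched as follows, assuming heartbeats have already been detected and time-aligned (a simplification of MEAP's actual processing).

```python
import numpy as np

def moving_ensemble_average(beats, window=15):
    """Moving ensemble average: for each beat, average a sliding window
    of neighbouring beats rather than all beats in a fixed epoch.
    beats: (n_beats, n_samples) array of time-aligned cardiac cycles."""
    half = window // 2
    out = np.empty_like(beats, dtype=float)
    for i in range(len(beats)):
        lo, hi = max(0, i - half), min(len(beats), i + half + 1)
        out[i] = beats[lo:hi].mean(axis=0)   # local consensus waveform
    return out
```

Averaging a sliding neighbourhood of beats, rather than a whole epoch, is what allows indices such as pre-ejection period to be tracked continuously through rapid state changes.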
SVM and SVM Ensembles in Breast Cancer Prediction.
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
SVM and SVM Ensembles in Breast Cancer Prediction
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
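The two configurations the study favors can be set up directly in scikit-learn, as sketched below; estimator counts and other hyperparameters are placeholders, and argument names may differ slightly between scikit-learn versions.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.svm import SVC

# Linear-kernel SVMs combined by bagging (suggested for small datasets,
# after feature selection in preprocessing)
bagged_svm = BaggingClassifier(SVC(kernel="linear"), n_estimators=10)

# RBF-kernel SVMs combined by boosting (suggested for large datasets);
# SAMME works with hard predictions, so no probability estimates needed
boosted_svm = AdaBoostClassifier(SVC(kernel="rbf"), n_estimators=10,
                                 algorithm="SAMME")

# usage: bagged_svm.fit(X_train, y_train); bagged_svm.score(X_test, y_test)
```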
Information-Theoretic Uncertainty of SCFG-Modeled Folding Space of The Non-coding RNA
Manzourolajdad, Amirhossein; Wang, Yingfeng; Shaw, Timothy I.; Malmberg, Russell L.
2012-01-01
RNA secondary structure ensembles define probability distributions for alternative equilibrium secondary structures of an RNA sequence. Shannon's entropy is a measure of the amount of diversity present in any ensemble. In this work, Shannon's entropy of the SCFG ensemble on an RNA sequence is derived and implemented in polynomial time for both structurally ambiguous and unambiguous grammars. MicroRNA sequences generally have low folding entropy, as previously discovered. Surprisingly, signs of significantly high folding entropy were observed in certain ncRNA families. More effective models coupled with targeted randomization tests can lead to better insight into the folding features of these families. PMID:23160142
Alves, Pedro; Liu, Shuang; Wang, Daifeng; Gerstein, Mark
2018-01-01
Machine learning is an integral part of computational biology, and has already shown its use in various applications, such as prognostic tests. In the last few years, in the machine learning community outside biology, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work, we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness. Furthermore, we develop a methodology of ensembling, Multi-Swarm Ensemble (MSWE), by using multiple particle swarm optimizations, and demonstrate its ability to further enhance the performance of ensembles.
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference: Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review, 144, 2565-2577.
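The multinomial flavor of such post-processing can be sketched with scikit-learn, using ensemble mean and spread as illustrative predictors; the proportional-odds variant preferred in the paper additionally enforces the ordering of cloud-cover categories (available, e.g., via statsmodels' OrderedModel).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_postprocessor(ens_forecasts, obs_categories):
    """Multinomial-logistic post-processing of total cloud cover.
    ens_forecasts: (n_cases, n_members) raw ensemble cloud-cover values
    obs_categories: (n_cases,) observed discrete cloud-cover classes."""
    X = np.column_stack([ens_forecasts.mean(axis=1),   # ensemble mean
                         ens_forecasts.std(axis=1)])   # ensemble spread
    model = LogisticRegression(multi_class="multinomial", max_iter=1000)
    return model.fit(X, obs_categories)

# model.predict_proba(...) then yields a calibrated probability for each
# discrete cloud-cover category, replacing raw ensemble frequencies.
```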
Ensembles of NLP Tools for Data Element Extraction from Clinical Notes
Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D.; Day, Michele E.; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan
2016-01-01
Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort. PMID:28269947
Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.
Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D; Day, Michele E; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan
2016-01-01
Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.
Improving land resource evaluation using fuzzy neural network ensembles
Xue, Yue-Ju; HU, Y.-M.; Liu, S.-G.; YANG, J.-F.; CHEN, Q.-C.; BAO, S.-T.
2007-01-01
Land evaluation factors often contain continuous-, discrete- and nominal-valued attributes. In traditional land evaluation, these different attributes are usually graded into categorical indexes by land resource experts, and the evaluation results rely heavily on experts' experiences. In order to overcome the shortcoming, we presented a fuzzy neural network ensemble method that did not require grading the evaluation factors into categorical indexes and could evaluate land resources by using the three kinds of attribute values directly. A fuzzy back propagation neural network (BPNN), a fuzzy radial basis function neural network (RBFNN), a fuzzy BPNN ensemble, and a fuzzy RBFNN ensemble were used to evaluate the land resources in Guangdong Province. The evaluation results by using the fuzzy BPNN ensemble and the fuzzy RBFNN ensemble were much better than those by using the single fuzzy BPNN and the single fuzzy RBFNN, and the error rate of the single fuzzy RBFNN or fuzzy RBFNN ensemble was lower than that of the single fuzzy BPNN or fuzzy BPNN ensemble, respectively. By using the fuzzy neural network ensembles, the validity of land resource evaluation was improved and reliance on land evaluators' experiences was considerably reduced. © 2007 Soil Science Society of China.
Multiscale Monte Carlo equilibration: Two-color QCD with two fermion flavors
Detmold, William; Endres, Michael G.
2016-12-02
In this study, we demonstrate the applicability of a recently proposed multiscale thermalization algorithm to two-color quantum chromodynamics (QCD) with two mass-degenerate fermion flavors. The algorithm involves refining an ensemble of gauge configurations that had been generated using a renormalization group (RG) matched coarse action, thereby producing a fine ensemble that is close to the thermalized distribution of a target fine action; the refined ensemble is subsequently rethermalized using conventional algorithms. Although the generalization of this algorithm from pure Yang-Mills theory to QCD with dynamical fermions is straightforward, we find that in the latter case, the method is susceptible to numerical instabilities during the initial stages of rethermalization when using the hybrid Monte Carlo algorithm. We find that these instabilities arise from large fermion forces in the evolution, which are attributed to an accumulation of spurious near-zero modes of the Dirac operator. We propose a simple strategy for curing this problem, and demonstrate that rapid thermalization--as probed by a variety of gluonic and fermionic operators--is possible with the use of this solution. Also, we study the sensitivity of rethermalization rates to the RG matching of the coarse and fine actions, and identify effective matching conditions based on a variety of measured scales.
NASA Astrophysics Data System (ADS)
Zhumagulov, Yaroslav V.; Krasavin, Andrey V.; Kashurnikov, Vladimir A.
2018-05-01
A method is developed for calculating the electronic properties of an ensemble of metal nanoclusters using cluster perturbation theory. This method is applied to a system of gold nanoclusters. The Green's function of a single nanocluster is obtained by ab initio calculations within the framework of density functional theory, and is then used in the Dyson equation to group nanoclusters together and to compute the Green's function, as well as the electron density of states, of the whole ensemble. The transition from the insulating state of a single nanocluster to the metallic state of bulk gold is observed.
The Principle of Energetic Consistency
NASA Technical Reports Server (NTRS)
Cohn, Stephen E.
2009-01-01
A basic result in estimation theory is that the minimum variance estimate of the dynamical state, given the observations, is the conditional mean estimate. This result holds independently of the specifics of any dynamical or observation nonlinearity or stochasticity, requiring only that the probability density function of the state, conditioned on the observations, has two moments. For nonlinear dynamics that conserve a total energy, this general result implies the principle of energetic consistency: if the dynamical variables are taken to be the natural energy variables, then the sum of the total energy of the conditional mean and the trace of the conditional covariance matrix (the total variance) is constant between observations. Ensemble Kalman filtering methods are designed to approximate the evolution of the conditional mean and covariance matrix. For them the principle of energetic consistency holds independently of ensemble size, even with covariance localization. However, full Kalman filter experiments with advection dynamics have shown that a small amount of numerical dissipation can cause a large, state-dependent loss of total variance, to the detriment of filter performance. The principle of energetic consistency offers a simple way to test whether this spurious loss of variance limits ensemble filter performance in full-blown applications. The classical second-moment closure (third-moment discard) equations also satisfy the principle of energetic consistency, independently of the rank of the conditional covariance matrix. Low-rank approximation of these equations offers an energetically consistent, computationally viable alternative to ensemble filtering. Current formulations of long-window, weak-constraint, four-dimensional variational methods are designed to approximate the conditional mode rather than the conditional mean. Thus they neglect the nonlinear bias term in the second-moment closure equation for the conditional mean. The principle of energetic consistency implies that, to precisely the extent that growing modes are important in data assimilation, this term is also important.
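For a quadratic total energy in natural energy variables, the split underlying the principle can be checked directly on an ensemble. The sketch below assumes e(x) = 0.5 * x @ x, so the variance term carries the same one-half factor; the sum of the two returned terms should remain constant between observations.

```python
import numpy as np

def energetic_budget(ensemble):
    """Decompose ensemble-mean total energy for e(x) = 0.5 * x @ x.
    ensemble: (n_members, n_state) in natural energy variables."""
    xm = ensemble.mean(axis=0)
    anom = ensemble - xm
    e_mean = 0.5 * xm @ xm                                 # energy of the mean
    half_total_var = 0.5 * np.sum(anom**2) / (len(ensemble) - 1)
    return e_mean, half_total_var
```

Monitoring this sum during cycling gives a cheap diagnostic for the spurious, numerics-induced loss of total variance described above.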
Arshad, Sannia; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as AdaBoost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes. PMID:25295302
Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as AdaBoost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes.
Ensemble method for dengue prediction
Baugher, Benjamin; Moniz, Linda J.; Bagley, Thomas; Babin, Steven M.; Guven, Erhan
2018-01-01
Background In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season); 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date. Methods Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations. Principal findings Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week. Conclusions The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for a transmission season and peak height for Iquitos, Peru. PMID:29298320
Heat Strain Evaluation of U.S. Navy Steam Suit Ensembles
2016-05-01
USARIEM Technical Report No. T16-13 (May 2016): Heat Strain Evaluation of U.S. Navy Steam Suit Ensembles.
Representation of photon limited data in emission tomography using origin ensembles
NASA Astrophysics Data System (ADS)
Sitek, A.
2008-06-01
Representation and reconstruction of data obtained by emission tomography scanners are challenging due to high noise levels in the data. Typically, images obtained using tomographic measurements are represented using grids. In this work, we define images as sets of origins of events detected during tomographic measurements; we call these origin ensembles (OEs). A state in the ensemble is characterized by a vector of 3N parameters Y, where the parameters are the coordinates of origins of detected events in a three-dimensional space and N is the number of detected events. The 3N-dimensional probability density function (PDF) for that ensemble is derived, and we present an algorithm for OE image estimation from tomographic measurements. A displayable image (e.g. a grid-based image) is derived from the OE formulation by calculating ensemble expectations based on the PDF using the Markov chain Monte Carlo method. The approach was applied to computer-simulated 3D list-mode positron emission tomography data. The reconstruction errors for a 10,000,000-event acquisition for simulated data ranged from 0.1 to 34.8%, depending on object size and sampling density. The method was also applied to experimental data, and the results of the OE method were consistent with those obtained by a standard maximum-likelihood approach. The method is a new approach to representation and reconstruction of data obtained by photon-limited emission tomography measurements.
Quasi-most unstable modes: a window to 'À la carte' ensemble diversity?
NASA Astrophysics Data System (ADS)
Homar Santaner, Victor; Stensrud, David J.
2010-05-01
The atmospheric scientific community is nowadays facing the ambitious challenge of providing useful forecasts of atmospheric events that produce high societal impact. The low level of social resilience to false alarms creates tremendous pressure on forecasting offices to issue accurate, timely and reliable warnings. Currently, no operational numerical forecasting system is able to respond to the societal demand for high-resolution (in time and space) predictions in the 12-72h time span. The main reasons for such deficiencies are the lack of adequate observations and the high non-linearity of the numerical models that are currently used. The whole weather forecasting problem is intrinsically probabilistic and current methods aim at coping with the various sources of uncertainty and the error propagation throughout the forecasting system. This probabilistic perspective is often created by generating ensembles of deterministic predictions that are aimed at sampling the most important sources of uncertainty in the forecasting system. The ensemble generation/sampling strategy is a crucial aspect of their performance and various methods have been proposed. Although global forecasting offices have been using ensembles of perturbed initial conditions for medium-range operational forecasts since 1994, no consensus exists regarding the optimum sampling strategy for high resolution short-range ensemble forecasts. Bred vectors, however, have been hypothesized to better capture the growing modes in the highly nonlinear mesoscale dynamics of severe episodes than singular vectors or observation perturbations. Yet even this technique is not able to produce enough diversity in the ensembles to accurately and routinely predict extreme phenomena such as severe weather. Thus, we propose a new method to generate ensembles of initial condition perturbations that is based on the breeding technique. Given a standard bred mode, a set of customized perturbations is derived with specified amplitudes and horizontal scales. This allows the ensemble to excite growing modes across a wider range of scales. Results show that this approach produces significantly more spread in the ensemble prediction than standard bred modes alone. Several examples that illustrate the benefits from this approach for severe weather forecasts will be provided.
NASA Astrophysics Data System (ADS)
Boé, Julien; Terray, Laurent
2014-05-01
Ensemble approaches for climate change projections have become ubiquitous. Because of large model-to-model variations and, generally, a lack of rationale for the choice of a particular climate model against others, it is widely accepted that future climate change and its impacts should not be estimated based on a single climate model. Generally, as a default approach, the multi-model ensemble mean (MMEM) is considered to provide the best estimate of climate change signals. The MMEM approach is based on the implicit hypothesis that all the models provide equally credible projections of future climate change. This hypothesis is unlikely to be true, and ideally one would want to give more weight to more realistic models. A major issue with this alternative approach lies in the assessment of the relative credibility of future climate projections from different climate models, as they can only be evaluated against present-day observations: which present-day metric(s) should be used to decide which models are "good" and which models are "bad" in the future climate? Once a supposedly informative metric has been found, other issues arise. What is the best statistical method to combine multiple models' results taking into account their relative credibility as measured by a given metric? How to be sure in the end that the metric-based estimate of future climate change is not in fact less realistic than the MMEM? It is impossible to provide strict answers to those questions in the climate change context. Yet, in this presentation, we propose a methodological approach based on a perfect model framework that could bring some useful elements of answer to the questions previously mentioned. The basic idea is to take a random climate model in the ensemble and treat it as if it were the truth (results of this model, in both past and future climate, are called "synthetic observations"). Then, all the other members from the multi-model ensemble are used to derive, thanks to a metric-based approach, a posterior estimate of climate change, based on the synthetic observation of the metric. Finally, it is possible to compare the posterior estimate to the synthetic observation of future climate change to evaluate the skill of the method. The main objective of this presentation is to describe and apply this perfect model framework to test different methodological issues associated with non-uniform model weighting and similar metric-based approaches. The methodology presented is general, but will be applied to the specific case of summer temperature change in France, for which previous works have suggested potentially useful metrics associated with soil-atmosphere and cloud-temperature interactions. The relative performances of different simple statistical approaches to combine multiple model results based on metrics will be tested. The impact of ensemble size, observational errors, internal variability, and model similarity will be characterized. The potential improvements associated with metric-based approaches compared to the MMEM in terms of errors and uncertainties will be quantified.
An Ensemble Framework Coping with Instability in the Gene Selection Process.
Castellanos-Garzón, José A; Ramos, Juan; López-Sánchez, Daniel; de Paz, Juan F; Corchado, Juan M
2018-03-01
This paper proposes an ensemble framework for gene selection, which is aimed at addressing instability problems presented in the gene filtering task. The complex process of gene selection from gene expression data faces instability problems arising from the different informative gene subsets found by different filter methods. This makes the identification of significant genes by the experts difficult. The instability of results can come from filter methods, gene classifier methods, different datasets of the same disease and multiple valid groups of biomarkers. Even though there are many proposals, the complexity imposed by this problem remains a challenge today. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. This framework performs a process of stable feature selection, facing the problems above and, thus, providing a more suitable and reliable solution for clinical and research purposes. Our proposal involves a process of multistage gene filtering, in which several ensemble strategies for gene selection were added in such a way that different classifiers simultaneously assess gene subsets to face instability. Firstly, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability with respect to filter methods). Next, we apply an ensemble of known classifiers to retain genes relevant to all classifiers at once (stability with respect to classification methods). The achieved results were evaluated on two different datasets of the same disease (pancreatic ductal adenocarcinoma), in search of stability with respect to the disease, for which promising results were achieved.
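A hedged sketch of the multistage idea, using scikit-learn on synthetic expression data: stage one pools the genes ranked highly by an ensemble of filters, and stage two keeps only the genes that several classifiers agree are useful. The specific filters, thresholds, and agreement rule are placeholders, not the paper's exact five-stage pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=80, n_features=500, n_informative=20, random_state=1)

# Stage 1: ensemble of filter methods; pool genes ranked highly by any filter.
k = 50
rank_mi = np.argsort(mutual_info_classif(X, y, random_state=1))[::-1][:k]
rank_f = np.argsort(f_classif(X, y)[0])[::-1][:k]
candidates = np.union1d(rank_mi, rank_f)

# Stage 2: keep a gene only if it is useful to several classifiers at once.
kept = []
for g in candidates:
    scores = [cross_val_score(clf, X[:, [g]], y, cv=3).mean()
              for clf in (LogisticRegression(max_iter=1000), LinearSVC(dual=False))]
    if min(scores) > 0.55:          # illustrative agreement threshold
        kept.append(g)
print(f"{len(candidates)} candidates -> {len(kept)} stable genes")
```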
Using ensemble of classifiers for predicting HIV protease cleavage sites in proteins.
Nanni, Loris; Lumini, Alessandra
2009-03-01
The focus of this work is the use of ensembles of classifiers for predicting HIV protease cleavage sites in proteins. Due to the complex relationships in the biological data, several recent works show that ensembles of learning algorithms often outperform stand-alone methods. We show that the fusion of approaches based on different encoding models can be useful for improving the performance of this classification problem. In particular, four different feature encodings for peptides are described and tested in this work. An extensive evaluation on a large dataset according to a blind testing protocol is reported, which demonstrates how different feature extraction methods and classifiers can be combined to obtain a robust and reliable system. The comparison with other stand-alone approaches allows quantifying the performance improvement obtained by the ensembles proposed in this work.
Effects of Ensemble Configuration on Estimates of Regional Climate Uncertainties
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goldenson, N.; Mauger, G.; Leung, L. R.
Internal variability in the climate system can contribute substantial uncertainty in climate projections, particularly at regional scales. Internal variability can be quantified using large ensembles of simulations that are identical but for perturbed initial conditions. Here we compare methods for quantifying internal variability. Our study region spans the west coast of North America, which is strongly influenced by El Niño and other large-scale dynamics through their contribution to large-scale internal variability. Using a statistical framework to simultaneously account for multiple sources of uncertainty, we find that internal variability can be quantified consistently using a large ensemble or an ensemble of opportunity that includes small ensembles from multiple models and climate scenarios. The latter also produces estimates of uncertainty due to model differences. We conclude that projection uncertainties are best assessed using small single-model ensembles from as many model-scenario pairings as computationally feasible, which has implications for ensemble design in large modeling efforts.
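The core quantity can be illustrated with a toy variance decomposition over an array of projections indexed by (model, member): spread across members of the same model estimates internal variability, while spread across model means estimates model uncertainty. This simple ANOVA-style split is an assumption standing in for the paper's full statistical framework.

```python
import numpy as np

rng = np.random.default_rng(42)
n_models, n_members = 5, 10
model_signal = rng.normal(2.0, 0.5, n_models)            # model-dependent forced response
proj = model_signal[:, None] + rng.normal(0.0, 0.3, (n_models, n_members))

internal_var = proj.var(axis=1, ddof=1).mean()           # spread across members, same model
model_var = proj.mean(axis=1).var(ddof=1)                # spread across model means
print(f"internal variability: {internal_var:.3f}, model uncertainty: {model_var:.3f}")
```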
NASA Astrophysics Data System (ADS)
Pribram-Jones, Aurora
Warm dense matter (WDM) is a high energy phase between solids and plasmas, with characteristics of both. It is present in the centers of giant planets, within the earth's core, and on the path to ignition of inertial confinement fusion. The high temperatures and pressures of warm dense matter lead to complications in its simulation, as both classical and quantum effects must be included. One of the most successful simulation methods is density functional theory-molecular dynamics (DFT-MD). Despite great success in a diverse array of applications, DFT-MD remains computationally expensive and it neglects the explicit temperature dependence of electron-electron interactions known to exist within exact DFT. Finite-temperature density functional theory (FT DFT) is an extension of the wildly successful ground-state DFT formalism via thermal ensembles, broadening its quantum mechanical treatment of electrons to include systems at non-zero temperatures. Exact mathematical conditions have been used to predict the behavior of approximations in limiting conditions and to connect FT DFT to the ground-state theory. An introduction to FT DFT is given within the context of ensemble DFT and the larger field of DFT is discussed for context. Ensemble DFT is used to describe ensembles of ground-state and excited systems. Exact conditions in ensemble DFT and the performance of approximations depend on ensemble weights. Using an inversion method, exact Kohn-Sham ensemble potentials are found and compared to approximations. The symmetry eigenstate Hartree-exchange approximation is in good agreement with exact calculations because of its inclusion of an ensemble derivative discontinuity. Since ensemble weights in FT DFT are temperature-dependent Fermi weights, this insight may help develop approximations well-suited to both ground-state and FT DFT. A novel, highly efficient approach to free energy calculations, finite-temperature potential functional theory, is derived, which has the potential to transform the simulation of warm dense matter. As a semiclassical method, it connects the normally disparate regimes of cold condensed matter physics and hot plasma physics. This orbital-free approach captures the smooth classical density envelope and quantum density oscillations that are both crucial to accurate modeling of materials where temperature and pressure effects are influential.
Understanding the Central Equatorial African long-term drought using AMIP-type simulations
NASA Astrophysics Data System (ADS)
Hua, Wenjian; Zhou, Liming; Chen, Haishan; Nicholson, Sharon E.; Jiang, Yan; Raghavendra, Ajay
2018-02-01
Previous studies show that Indo-Pacific sea surface temperature (SST) variations may help to explain the observed long-term drought during April-May-June (AMJ) since the 1990s over Central equatorial Africa (CEA). However, the underlying physical mechanisms for this drought are still not clear due to observation limitations. Here we use the AMIP-type simulations with 24 ensemble members forced by observed SSTs from the ECHAM4.5 model to explore the likely physical processes that determine the rainfall variations over CEA. We not only examine the ensemble mean (EM), but also compare the "good" and "poor" ensemble members to understand the intra-ensemble variability. In general, EM and the "good" ensemble member can simulate the drought and associated reduced vertical velocity and anomalous anti-cyclonic circulation in the lower troposphere. However, the "poor" ensemble members cannot simulate the drought and associated circulation patterns. These contrasts indicate that the drought is tightly associated with the tropical Walker circulation and atmospheric teleconnection patterns. If the observational circulation patterns cannot be reproduced, the CEA drought will not be captured. Despite the large intra-ensemble spread, the model simulations indicate an essential role of SST forcing in causing the drought. These results suggest that the long-term drought may result from tropical Indo-Pacific SST variations associated with the enhanced and westward extended tropical Walker circulation.
SQUEEZE-E: The Optimal Solution for Molecular Simulations with Periodic Boundary Conditions.
Wassenaar, Tsjerk A; de Vries, Sjoerd; Bonvin, Alexandre M J J; Bekker, Henk
2012-10-09
In molecular simulations of macromolecules, it is desirable to limit the amount of solvent in the system to avoid spending computational resources on uninteresting solvent-solvent interactions. As a consequence, periodic boundary conditions are commonly used, with a simulation box chosen as small as possible for a given minimal distance between images. Here, we describe how such a simulation cell can be set up for ensembles, taking into account a priori available or estimable information regarding conformational flexibility. Doing so ensures that any conformation present in the input ensemble will satisfy the distance criterion during the simulation. This helps avoid periodicity artifacts due to conformational changes. The method introduces three new approaches in computational geometry: (1) the derivation of an optimal packing of ensembles, for which the mathematical framework is described; (2) a new method for approximating the α-hull and the contact body for single bodies and ensembles, which is orders of magnitude faster than existing routines, allowing the calculation of packings of large ensembles and/or large bodies; and (3) a routine for searching a combination of three vectors on a discretized contact body forming a reduced base for a lattice with minimal cell volume. The new algorithms reduce the time required to calculate packings of single bodies from minutes or hours to seconds. The use and efficacy of the method is demonstrated for ensembles obtained from NMR, MD simulations, and elastic network modeling. An implementation of the method is available online at http://haddock.chem.uu.nl/services/SQUEEZE/ and as an option for running simulations through the weNMR GRID MD server at http://haddock.science.uu.nl/enmr/services/GROMACS/main.php.
NWP model forecast skill optimization via closure parameter variations
NASA Astrophysics Data System (ADS)
Järvinen, H.; Ollinaho, P.; Laine, M.; Solonen, A.; Haario, H.
2012-04-01
We present results of a novel approach to tune the predictive skill of numerical weather prediction (NWP) models. These models contain tunable parameters which appear in parameterization schemes of sub-grid scale physical processes. The current practice is to specify the numerical parameter values manually, based on expert knowledge. We recently developed a concept and method (QJRMS 2011) for on-line estimation of the NWP model parameters via closure parameter variations. The method, called EPPES ("Ensemble prediction and parameter estimation system"), utilizes the ensemble prediction infrastructure for parameter estimation in a very cost-effective way: practically no new computations are introduced. The approach provides an algorithmic decision-making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating an ensemble of predictions so that each member uses different model parameter values, drawn from a proposal distribution, and (ii) feeding back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In this presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model, which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameter values, and leads to an improved forecast skill. Second, results with an ensemble prediction system emulator, based on the ECHAM5 atmospheric GCM, show that the model tuning capability of EPPES scales up to realistic models and ensemble prediction systems. Finally, preliminary results of EPPES in the context of the ECMWF forecasting system are presented.
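The feedback loop at the heart of EPPES can be caricatured as follows: draw one parameter value per ensemble member from a proposal distribution, score each member against a verifying observation, and use importance weights to update the proposal's moments. The toy model, likelihood, and moment update below are illustrative simplifications of the actual hierarchical algorithm.

```python
import numpy as np

rng = np.random.default_rng(10)
theta_true = 2.5
mu, var = 0.0, 4.0                              # proposal distribution N(mu, var)

for window in range(30):
    theta = rng.normal(mu, np.sqrt(var), 20)    # one parameter value per member
    obs = theta_true + 0.3 * rng.normal()       # verifying observation for this window
    loglik = -0.5 * ((theta - obs) / 0.3) ** 2  # member skill against the observation
    w = np.exp(loglik - loglik.max()); w /= w.sum()
    mu = np.dot(w, theta)                       # feed merits back into the proposal
    var = max(np.dot(w, (theta - mu) ** 2), 1e-3)

print(f"estimated parameter: {mu:.3f} +/- {np.sqrt(var):.3f} (truth {theta_true})")
```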
Ensemble-Based Parameter Estimation in a Coupled General Circulation Model
Liu, Y.; Liu, Z.; Zhang, S.; ...
2014-09-10
Parameter estimation provides a potentially powerful approach to reduce model bias for complex climate models. Here, in a twin experiment framework, the authors perform the first parameter estimation in a fully coupled ocean-atmosphere general circulation model using an ensemble coupled data assimilation system facilitated with parameter estimation. The authors first perform single-parameter estimation and then multiple-parameter estimation. In the case of the single-parameter estimation, the error of the parameter [solar penetration depth (SPD)] is reduced by over 90% after ~40 years of assimilation of the conventional observations of monthly sea surface temperature (SST) and salinity (SSS). The results of multiple-parameter estimation are less reliable than those of single-parameter estimation when only the monthly SST and SSS are assimilated. Assimilating additional observations of atmospheric temperature and wind improves the reliability of multiple-parameter estimation. The errors of the parameters are reduced by 90% in ~8 years of assimilation. Finally, the improved parameters also improve the model climatology. With the optimized parameters, the bias of the climatology of SST is reduced by ~90%. Altogether, this study suggests the feasibility of ensemble-based parameter estimation in a fully coupled general circulation model.
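A common way to implement such ensemble-based parameter estimation is state augmentation: the unknown parameter is appended to the state vector so that state-parameter covariances transfer observational information to the parameter. The sketch below applies a generic stochastic EnKF update to a scalar toy model; it is a textbook illustration, not the authors' coupled assimilation system.

```python
import numpy as np

rng = np.random.default_rng(7)
p_true, dt, n_ens, obs_err = 0.5, 0.1, 50, 0.05
x_true = 1.0
ens = np.column_stack([rng.normal(1.0, 0.1, n_ens),      # state x
                       rng.normal(0.8, 0.2, n_ens)])     # augmented parameter p

for step in range(400):
    # forced decay model x' = -p*x + 0.5; truth and each member advance in parallel
    x_true += (-p_true * x_true + 0.5) * dt + 0.01 * rng.normal()
    ens[:, 0] += (-ens[:, 1] * ens[:, 0] + 0.5) * dt
    ens[:, 1] += rng.normal(0.0, 0.005, n_ens)           # random walk keeps parameter spread
    if step % 10 == 0:                                   # assimilate an observation of x
        y = x_true + obs_err * rng.normal()
        P = np.cov(ens.T)
        K = P[:, 0] / (P[0, 0] + obs_err**2)             # gain for state and parameter
        innov = (y + obs_err * rng.normal(size=n_ens)) - ens[:, 0]
        ens += K[None, :] * innov[:, None]

print(f"estimated p = {ens[:, 1].mean():.3f} +/- {ens[:, 1].std():.3f} (truth {p_true})")
```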
Ensemble positive unlabeled learning for disease gene identification.
Yang, Peng; Li, Xiaoli; Chua, Hon-Nian; Kwoh, Chee-Keong; Ng, See-Kiong
2014-01-01
An increasing number of genes have been experimentally confirmed in recent years as causative genes of various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias caused by the inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method, EPU, is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. By integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms and to achieve more accurate and robust disease gene predictions. Our EPU method thus provides an effective framework for integrating additional biological and computational resources for better disease gene predictions.
Upgrades to the REA method for producing probabilistic climate change projections
NASA Astrophysics Data System (ADS)
Xu, Ying; Gao, Xuejie; Giorgi, Filippo
2010-05-01
We present an augmented version of the Reliability Ensemble Averaging (REA) method designed to generate probabilistic climate change information from ensembles of climate model simulations. Compared to the original version, the augmented one includes consideration of multiple variables and statistics in the calculation of the performance-based weights. In addition, the model convergence criterion previously employed is removed. The method is applied to the calculation of changes in mean and variability for temperature and precipitation over different sub-regions of East Asia based on the recently completed CMIP3 multi-model ensemble. Comparison of the new and old REA methods, along with the simple averaging procedure, and the use of different combinations of performance metrics shows that at fine sub-regional scales the choice of weighting is relevant. This is mostly because the models show a substantial spread in performance for the simulation of precipitation statistics, a result that supports the use of model weighting as a useful option to account for wide ranges of model quality. The REA method, and in particular the upgraded one, provides a simple and flexible framework for assessing the uncertainty related to the aggregation of results from ensembles of models in order to produce climate change information at the regional scale. Key words: REA method, climate change, CMIP3.
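The mechanics of performance-based weighting are easy to sketch: derive a reliability factor per model from several normalized error statistics and use the normalized factors as weights on the projected changes. The inverse-mean-error weight below, with no convergence term (removed in the upgraded method), is an illustrative choice, and the numbers are synthetic.

```python
import numpy as np

# rows: models; columns: normalized errors for, e.g., T mean, T variability, P mean
errors = np.array([[0.2, 0.5, 0.3],
                   [0.4, 0.2, 0.6],
                   [0.8, 0.9, 0.7]])
changes = np.array([1.8, 2.3, 3.1])      # projected regional change per model

# combine multiple variables/statistics into one reliability factor per model
reliability = 1.0 / errors.mean(axis=1)
weights = reliability / reliability.sum()
print("weights:", np.round(weights, 3))
print(f"REA estimate: {np.dot(weights, changes):.2f}  vs simple mean: {changes.mean():.2f}")
```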
A Maximum Likelihood Ensemble Data Assimilation Method Tailored to the Inner Radiation Belt
NASA Astrophysics Data System (ADS)
Guild, T. B.; O'Brien, T. P., III; Mazur, J. E.
2014-12-01
The Earth's radiation belts are composed of energetic protons and electrons whose fluxes span many orders of magnitude, whose distributions are log-normal, and where data-model differences can be large and also log-normal. This physical system thus challenges standard data assimilation methods, which rely on underlying assumptions of Gaussian distributions of measurements and data-model differences, and on innovations to the model being small. We have therefore developed a data assimilation method tailored to these properties of the inner radiation belt, analogous to the ensemble Kalman filter but suited to the cases of non-Gaussian model and measurement errors and non-linear model and measurement distributions. We apply this method to the inner radiation belt proton populations, using the SIZM inner belt model [Selesnick et al., 2007] and SAMPEX/PET and HEO proton observations to select the most likely ensemble members contributing to the state of the inner belt. We will describe the algorithm, the method of generating ensemble members, and our choice of minimizing the difference between instrument counts rather than phase space densities, and we will demonstrate the method with our reanalysis of the inner radiation belt throughout solar cycle 23. We will report on progress to continue our assimilation into solar cycle 24 using the Van Allen Probes/RPS observations.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
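The comparison can be reproduced in spirit with scikit-learn on a synthetic imbalanced cohort; the model families mirror the study (single tree, bagging, random forest, boosting, logistic regression), but the data, settings, and the absence of spline terms are assumptions of this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

# synthetic cohort with a rare binary outcome, standing in for 30-day mortality
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(max_depth=5),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=100),
    "random forest": RandomForestClassifier(n_estimators=100),
    "boosted trees": GradientBoostingClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name:20s} AUC = {auc:.3f}")
```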
Sampling the isothermal-isobaric ensemble by Langevin dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Xingyu; Institute of Applied Physics and Computational Mathematics, Fenghao East Road 2, Beijing 100094; CAEP Software Center for High Performance Numerical Simulation, Huayuan Road 6, Beijing 100088
2016-03-28
We present a new method for conducting fully flexible-cell molecular dynamics simulations in the isothermal-isobaric ensemble based on Langevin equations of motion. The stochastic coupling to all particle and cell degrees of freedom is introduced in a correct way, in the sense that the stationary configurational distribution is proved to be consistent with that of the isothermal-isobaric ensemble. In order to apply the proposed method in computer simulations, a second-order symmetric numerical integration scheme is developed by Trotter's splitting of the single-step propagator. Moreover, a practical guide for choosing working parameters is suggested for user-specified thermo- and baro-coupling time scales. The method and software implementation are carefully validated by a numerical example.
NASA Astrophysics Data System (ADS)
Sastre, Francisco; Moreno-Hilario, Elizabeth; Sotelo-Serna, Maria Guadalupe; Gil-Villegas, Alejandro
2018-02-01
The microcanonical-ensemble computer simulation method (MCE) is used to evaluate the perturbation terms Ai of the Helmholtz free energy of a square-well (SW) fluid. The MCE method offers a very efficient and accurate procedure for the determination of perturbation terms of discrete-potential systems such as the SW fluid and surpasses the standard NVT canonical-ensemble Monte Carlo method, allowing the calculation of the first six expansion terms. Results are presented for the case of a SW potential with attractive ranges 1.1 ≤ λ ≤ 1.8. Using a semi-empirical representation of the MCE values for Ai, we also discuss the accuracy in the determination of the phase diagram of this system.
NASA Astrophysics Data System (ADS)
Verkade, J. S.; Brown, J. D.; Davids, F.; Reggiani, P.; Weerts, A. H.
2017-12-01
Two statistical post-processing approaches for estimation of predictive hydrological uncertainty are compared: (i) 'dressing' of a deterministic forecast by adding a single, combined estimate of both hydrological and meteorological uncertainty and (ii) 'dressing' of an ensemble streamflow forecast by adding an estimate of hydrological uncertainty to each individual streamflow ensemble member. Both approaches aim to produce an estimate of the 'total uncertainty' that captures both the meteorological and hydrological uncertainties. They differ in the degree to which they make use of statistical post-processing techniques. In the 'lumped' approach, both sources of uncertainty are lumped by post-processing deterministic forecasts using their verifying observations. In the 'source-specific' approach, the meteorological uncertainties are estimated by an ensemble of weather forecasts. These ensemble members are routed through a hydrological model and a realization of the probability distribution of hydrological uncertainties (only) is then added to each ensemble member to arrive at an estimate of the total uncertainty. The techniques are applied to one location in the Meuse basin and three locations in the Rhine basin. Resulting forecasts are assessed for their reliability and sharpness, as well as compared in terms of multiple verification scores including the relative mean error, Brier Skill Score, Mean Continuous Ranked Probability Skill Score, Relative Operating Characteristic Score and Relative Economic Value. The dressed deterministic forecasts are generally more reliable than the dressed ensemble forecasts, but the latter are sharper. On balance, however, they show similar quality across a range of verification metrics, with the dressed ensembles coming out slightly better. Some additional analyses are suggested. Notably, these include statistical post-processing of the meteorological forecasts in order to increase their reliability, thus increasing the reliability of the streamflow forecasts produced with ensemble meteorological forcings.
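The two approaches compared above reduce to where the dressing noise is injected, which the following hedged sketch makes concrete: the lumped variant dresses a single deterministic value with a combined error distribution, while the source-specific variant dresses each meteorologically driven member with hydrological error only. All distributions and magnitudes are placeholders for what calibration against observations would supply.

```python
import numpy as np

rng = np.random.default_rng(3)
det_fcst = 120.0                          # deterministic streamflow forecast (m3/s)
ens_fcst = rng.normal(120.0, 15.0, 20)    # hydrological model run on 20 meteo members

# (i) lumped: dress the deterministic forecast with total (meteo + hydro) uncertainty
total_sigma = 25.0
lumped = det_fcst + rng.normal(0.0, total_sigma, 1000)

# (ii) source-specific: dress each meteo-driven member with hydrological uncertainty only
hydro_sigma = 10.0
dressed = (ens_fcst[:, None] + rng.normal(0.0, hydro_sigma, (20, 50))).ravel()

for name, sample in [("lumped", lumped), ("source-specific", dressed)]:
    print(f"{name:16s} mean={sample.mean():7.1f}  90% interval="
          f"({np.percentile(sample, 5):.0f}, {np.percentile(sample, 95):.0f})")
```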
Impacts of snow cover fraction data assimilation on modeled energy and moisture budgets
NASA Astrophysics Data System (ADS)
Arsenault, Kristi R.; Houser, Paul R.; De Lannoy, Gabriëlle J. M.; Dirmeyer, Paul A.
2013-07-01
Two data assimilation (DA) methods, a simple rule-based direct insertion (DI) approach and a one-dimensional ensemble Kalman filter (EnKF) method, are evaluated by assimilating snow cover fraction observations into the Community Land surface Model. The ensemble perturbation needed for the EnKF resulted in negative snowpack biases. Therefore, a correction is made to the ensemble bias using an approach that constrains the ensemble forecasts with a single unperturbed deterministic LSM run. This is shown to improve the final snow state analyses. The EnKF method produces slightly better results in higher elevation locations, whereas results indicate that the DI method has a performance advantage in lower elevation regions. In addition, the two DA methods are evaluated in terms of their overall impacts on the other land surface state variables (e.g., soil moisture) and fluxes (e.g., latent heat flux). The EnKF method is shown to have less impact overall than the DI method and causes less distortion of the hydrological budget. However, the land surface model adjusts more slowly to the smaller EnKF increments, which leads to smaller but slightly more persistent moisture budget errors than found with the DI updates. The DI method can remove almost instantly much of the modeled snowpack, but this also allows the model system to quickly revert to hydrological balance for nonsnowpack conditions.
Nanni, Loris; Lumini, Alessandra
2009-01-01
The focus of this work is twofold: to propose a novel method for building an ensemble of classifiers for peptide classification based on substitution matrices, and to show the importance of selecting a proper set of parameters for the classifiers that build the ensemble of learning systems. The HIV-1 protease cleavage site prediction problem is studied here. The results obtained by a blind testing protocol are reported; the comparison with other state-of-the-art approaches based on ensembles of classifiers allows quantifying the performance improvement obtained by the systems proposed in this paper. The simulation based on experimentally determined protease cleavage data has demonstrated the success of these new ensemble algorithms. It is particularly interesting to note that, even though the HIV-1 protease cleavage site prediction problem is considered linearly separable, we obtain the best performance using an ensemble of non-linear classifiers.
Sanchez-Martinez, M; Crehuet, R
2014-12-21
We present a method based on the maximum entropy principle that can re-weight an ensemble of protein structures based on data from residual dipolar couplings (RDCs). The RDCs of intrinsically disordered proteins (IDPs) provide information on the secondary structure elements present in an ensemble; however, even two sets of RDCs are not enough to fully determine the distribution of conformations, and the force field used to generate the structures has a pervasive influence on the refined ensemble. Two physics-based coarse-grained force fields, Profasi and Campari, are able to predict the secondary structure elements present in an IDP, but even after including the RDC data, the re-weighted ensembles differ between the two force fields. This spread of IDP ensembles highlights the need for better force fields. We distribute our algorithm in an open-source Python code.
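Maximum-entropy re-weighting has a convenient closed form: the weights that match an ensemble-averaged observable while staying closest to uniform are exponential in the per-conformer values, leaving only a Lagrange multiplier to solve for. The single scalar observable below stands in for full RDC data sets, and all numbers are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n_conf = 200
calc = rng.normal(0.0, 1.0, n_conf)       # observable back-calculated per conformer
target = 0.4                              # experimental (ensemble-averaged) value

# maxent weights take the form w_i ∝ exp(lambda * calc_i); solve for lambda
def gap(lam):
    w = np.exp(lam * calc)
    w /= w.sum()
    return (np.dot(w, calc) - target) ** 2

lam = minimize(gap, x0=[0.0]).x[0]
w = np.exp(lam * calc); w /= w.sum()
print(f"lambda={lam:.3f}, reweighted average={np.dot(w, calc):.3f} (target {target})")
```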
A study of fuzzy logic ensemble system performance on face recognition problem
NASA Astrophysics Data System (ADS)
Polyakova, A.; Lipinskiy, L.
2017-02-01
Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each of which is able to solve the problem by itself, but whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since it draws on the diversity of its components. A new method for designing ensembles of intelligent information technologies is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: an artificial neural network, a support vector machine and decision trees. These algorithms and their ensemble have been tested on face recognition problems. Principal component analysis (PCA) is used for feature selection.
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data, sorting the data, evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
An optimized ensemble local mean decomposition method for fault detection of mechanical components
NASA Astrophysics Data System (ADS)
Zhang, Chao; Li, Zhixiong; Hu, Chao; Chen, Shuai; Wang, Jianguo; Zhang, Xiaogang
2017-03-01
Mechanical transmission systems have been widely adopted in most industrial applications, and issues related to the maintenance of these systems have attracted considerable attention in the past few decades. The recently developed ensemble local mean decomposition (ELMD) method shows satisfactory performance in fault detection of mechanical components for preventing catastrophic failures and reducing maintenance costs. However, the performance of ELMD often heavily depends on proper selection of its model parameters. To this end, this paper proposes an optimized ensemble local mean decomposition (OELMD) method to determine an optimum set of ELMD parameters for vibration signal analysis. In OELMD, an error index termed the relative root-mean-square error (Relative RMSE) is used to evaluate the decomposition performance of ELMD with a certain amplitude of the added white noise. Once a maximum Relative RMSE, corresponding to an optimal noise amplitude, is determined, OELMD then identifies optimal noise bandwidth and ensemble number based on the Relative RMSE and signal-to-noise ratio (SNR), respectively. Thus, all three critical parameters of ELMD (i.e. noise amplitude and bandwidth, and ensemble number) are optimized by OELMD. The effectiveness of OELMD was evaluated using experimental vibration signals measured from three different mechanical components (i.e. the rolling bearing, gear and diesel engine) under faulty operation conditions.
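The parameter search itself is straightforward to sketch: scan candidate noise amplitudes, evaluate the Relative RMSE criterion for each, and keep the amplitude that maximizes it. Since ELMD is not reimplemented here, a crude moving-average "local mean" stands in for the decomposition, so the sketch shows the selection skeleton rather than a working fault detector.

```python
import numpy as np

def local_mean(x, win=25):                     # stand-in for an LMD product function
    pad = np.pad(x, win // 2, mode="edge")
    return np.convolve(pad, np.ones(win) / win, mode="valid")[:len(x)]

rng = np.random.default_rng(11)
t = np.linspace(0, 1, 2048)
signal = np.sin(2 * np.pi * 30 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))

def relative_rmse(noise_amp, n_trials=20):
    # ensemble-average the noise-assisted decompositions, then score against the raw signal
    recon = np.mean([local_mean(signal + noise_amp * rng.normal(size=t.size))
                     for _ in range(n_trials)], axis=0)
    return np.sqrt(np.mean((signal - recon) ** 2)) / np.sqrt(np.mean(signal ** 2))

amps = np.linspace(0.05, 1.0, 20)
scores = [relative_rmse(a) for a in amps]
print(f"selected noise amplitude: {amps[int(np.argmax(scores))]:.2f}")
```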
NASA Astrophysics Data System (ADS)
Kunii, Masaru; Saito, Kazuo; Seko, Hiromu; Hara, Masahiro; Hara, Tabito; Yamaguchi, Munehiko; Gong, Jiandong; Charron, Martin; Du, Jun; Wang, Yong; Chen, Dehui
2011-05-01
During the period around the Beijing 2008 Olympic Games, the Beijing 2008 Olympics Research and Development Project (B08RDP) was conducted as part of the World Weather Research Program short-range weather forecasting research project. Mesoscale ensemble prediction (MEP) experiments were carried out by six organizations in near-real time, in order to share their experiences in the development of MEP systems. The purpose of this study is to objectively verify these experiments and to clarify the problems associated with the current MEP systems through the same experiences. Verification was performed using the MEP outputs interpolated into a common verification domain with a horizontal resolution of 15 km. For all systems, the ensemble spreads grew as the forecast time increased, and the ensemble mean improved the forecast errors compared with individual control forecasts in the verification against the analysis fields. However, each system exhibited individual characteristics according to the MEP method. Some participants used physical perturbation methods. The significance of these methods was confirmed by the verification. However, the mean error (ME) of the ensemble forecast in some systems was worse than that of the individual control forecast. This result suggests that it is necessary to pay careful attention to physical perturbations.
Entropy: Thermodynamic definition and quantum expression
NASA Astrophysics Data System (ADS)
Gyftopoulos, Elias P.; Çubukçu, Erol
1997-04-01
Numerous expressions exist in the scientific literature purporting to represent entropy. Are they all acceptable? To answer this question, we review the thermodynamic definition of entropy, and establish eight criteria that must be satisfied by it. The definition and criteria are obtained by using solely the general, nonstatistical statements of the first and second laws presented in Thermodynamics: Foundations and Applications [Elias P. Gyftopoulos and Gian Paolo Beretta (Macmillan, New York, 1991)]. We apply the eight criteria to each of the entropy expressions proposed in the literature and find that only the relation S = -k Tr(ρ ln ρ) satisfies all the criteria, provided that the density operator ρ corresponds to a homogeneous ensemble of identical systems, identically prepared. Homogeneous ensemble means that every member of the ensemble is described by the same density operator ρ as any other member, that is, the ensemble is not a statistical mixture of projectors (wave functions).
A Fractional Cartesian Composition Model for Semi-Spatial Comparative Visualization Design.
Kolesar, Ivan; Bruckner, Stefan; Viola, Ivan; Hauser, Helwig
2017-01-01
The study of spatial data ensembles leads to substantial visualization challenges in a variety of applications. In this paper, we present a model for comparative visualization that supports the design of corresponding ensemble visualization solutions through partial automation. We focus on applications where the user is interested in preserving selected spatial characteristics of the data as much as possible, even when many ensemble members should be jointly studied using comparative visualization. In our model, we separate the design challenge into a minimal set of user-specified parameters and an optimization component for the automatic configuration of the remaining design variables. We provide an illustrated formal description of our model and exemplify our approach in the context of several application examples from different domains in order to demonstrate its generality within the class of comparative visualization problems for spatial data ensembles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theophilou, Iris; Helbig, Nicole; Lathiotakis, Nektarios N.
Functionals of the one-body reduced density matrix (1-RDM) are routinely minimized under Coleman's ensemble N-representability conditions. Recently, the topic of pure-state N-representability conditions, also known as generalized Pauli constraints, received increased attention following the discovery of a systematic way to derive them for any number of electrons and any finite dimensionality of the Hilbert space. The target of this work is to assess the potential impact of the enforcement of the pure-state conditions on the results of reduced density-matrix functional theory calculations. In particular, we examine whether the standard minimization of typical 1-RDM functionals under the ensemble N-representability conditions violates the pure-state conditions for prototype 3-electron systems. We also enforce the pure-state conditions, in addition to the ensemble ones, for the same systems and functionals and compare the correlation energies and optimal occupation numbers with those obtained by the enforcement of the ensemble conditions alone.
NASA Astrophysics Data System (ADS)
Chan, Yi-Tung; Wang, Shuenn-Jyi; Tsai, Chung-Hsien
2017-09-01
Public safety is a matter of national security and people's livelihoods. In recent years, intelligent video-surveillance systems have become important active-protection systems. A surveillance system that provides early detection and threat assessment could protect people from crowd-related disasters and ensure public safety. Image processing is commonly used to extract features, e.g., people, from a surveillance video. However, little research has been conducted on the relationship between foreground detection and feature extraction. Most current video-surveillance research has been developed for restricted environments, in which the extracted features are limited by having information from a single foreground; they do not effectively represent the diversity of crowd behavior. This paper presents a general framework based on extracting ensemble features from the foreground of a surveillance video to analyze a crowd. The proposed method can flexibly integrate different foreground-detection technologies to adapt to various monitored environments. Furthermore, the extractable representative features depend on the heterogeneous foreground data. Finally, a classification algorithm is applied to these features to automatically model crowd behavior and distinguish an abnormal event from normal patterns. The experimental results demonstrate that the proposed method's performance is both comparable to that of state-of-the-art methods and satisfies the requirements of real-time applications.
Designing boosting ensemble of relational fuzzy systems.
Scherer, Rafał
2010-10-01
A method frequently used in classification systems for improving classification accuracy is to combine the outputs of several classifiers. Among various types of classifiers, fuzzy ones are tempting because they use intelligible fuzzy if-then rules. In this paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bond input and output fuzzy linguistic values by a binary relation; thus, fuzzy rules have, compared to traditional fuzzy systems, additional weights: the elements of a fuzzy relation matrix. Thanks to this, the system is better adjustable to data during learning. In this paper an ensemble of relational fuzzy systems is proposed. The problem is that such an ensemble contains separate rule bases which cannot be directly merged. As the systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. We address this problem by a novel design of the fuzzy systems constituting the ensemble, resulting in normalization of the individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.
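For readers unfamiliar with the boosting side, the AdaBoost loop used to assemble such an ensemble is compact: train a weak learner on weighted data, compute its vote weight from its error, and up-weight the samples it misclassified. Decision stumps below stand in for the relational neuro-fuzzy classifiers, which are not reimplemented here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=2)
y = 2 * y - 1                                  # AdaBoost works with labels in {-1, +1}
w = np.full(len(y), 1.0 / len(y))
ensemble = []

for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.dot(w, pred != y)                 # weighted training error
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
    w /= w.sum()
    ensemble.append((alpha, stump))

agg = np.sign(sum(a * s.predict(X) for a, s in ensemble))
print(f"training accuracy of the boosted ensemble: {(agg == y).mean():.3f}")
```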
Annealed importance sampling with constant cooling rate
NASA Astrophysics Data System (ADS)
Giovannelli, Edoardo; Cardini, Gianni; Gellini, Cristina; Pietraperzia, Giangaetano; Chelli, Riccardo
2015-02-01
Annealed importance sampling is a simulation method devised by Neal [Stat. Comput. 11, 125 (2001)] to assign weights to configurations generated by simulated annealing trajectories. In particular, the equilibrium average of a generic physical quantity can be computed as a weighted average exploiting the weights and estimates of this quantity associated with the final configurations of the annealed trajectories. Here, we review annealed importance sampling from the perspective of nonequilibrium path-ensemble averages [G. E. Crooks, Phys. Rev. E 61, 2361 (2000)]. The equivalence of Neal's and Crooks' treatments highlights the generality of the method, which goes beyond mere thermal-based protocols. Furthermore, we show that a temperature schedule based on a constant cooling rate outperforms stepwise cooling schedules and that, for a given elapsed computer time, the performance of annealed importance sampling is, in general, improved by increasing the number of intermediate temperatures.
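A minimal sketch of annealed importance sampling with a constant cooling rate, assuming a 1D double-well potential, Metropolis moves, and approximate equilibration at the hottest temperature; the number of walkers, temperatures, and sweeps are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
U = lambda x: (x**2 - 1.0) ** 2                   # double-well potential
T0, T1, n_temps, n_walkers = 5.0, 0.5, 200, 500
temps = np.linspace(T0, T1, n_temps)              # constant cooling rate in T

def sweep(x, beta, width=0.4):                    # one Metropolis sweep per walker
    prop = x + width * rng.normal(size=x.size)
    accept = rng.random(x.size) < np.exp(-beta * (U(prop) - U(x)))
    return np.where(accept, prop, x)

x = rng.normal(0.0, 1.5, n_walkers)
for _ in range(200):                              # equilibrate at the hottest temperature
    x = sweep(x, 1.0 / T0)

logw = np.zeros(n_walkers)
for T_prev, T in zip(temps[:-1], temps[1:]):
    logw += -(1.0 / T - 1.0 / T_prev) * U(x)      # incremental importance weights
    x = sweep(x, 1.0 / T)

w = np.exp(logw - logw.max()); w /= w.sum()
print(f"weighted <x^2> at T={T1}: {np.dot(w, x**2):.3f}")
```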
NASA Technical Reports Server (NTRS)
Keppenne, Christian; Vernieres, Guillaume; Rienecker, Michele; Jacob, Jossy; Kovach, Robin
2011-01-01
Satellite altimetry measurements have provided global, evenly distributed observations of the ocean surface since 1993. However, the difficulties introduced by the presence of model biases and the requirement that data assimilation systems extrapolate the sea surface height (SSH) information to the subsurface in order to estimate the temperature, salinity and currents make it difficult to optimally exploit these measurements. This talk investigates the potential of the altimetry data assimilation once the biases are accounted for with an ad hoc bias estimation scheme. Either steady-state or state-dependent multivariate background-error covariances from an ensemble of model integrations are used to address the problem of extrapolating the information to the sub-surface. The GMAO ocean data assimilation system applied to an ensemble of coupled model instances using the GEOS-5 AGCM coupled to MOM4 is used in the investigation. To model the background error covariances, the system relies on a hybrid ensemble approach in which a small number of dynamically evolved model trajectories is augmented on the one hand with past instances of the state vector along each trajectory and, on the other, with a steady state ensemble of error estimates from a time series of short-term model forecasts. A state-dependent adaptive error-covariance localization and inflation algorithm controls how the SSH information is extrapolated to the sub-surface. A two-step predictor corrector approach is used to assimilate future information. Independent (not-assimilated) temperature and salinity observations from Argo floats are used to validate the assimilation. A two-step projection method in which the system first calculates a SSH increment and then projects this increment vertically onto the temperature, salt and current fields is found to be most effective in reconstructing the sub-surface information. The performance of the system in reconstructing the sub-surface fields is particularly impressive for temperature, but not as satisfactory for salt.
Numerical Error Estimation with UQ
NASA Astrophysics Data System (ADS)
Ackmann, Jan; Korn, Peter; Marotzke, Jochem
2014-05-01
Ocean models are still in need of means to quantify model errors, which are inevitably made when running numerical experiments. The total model error can formally be decomposed into two parts, the formulation error and the discretization error. The formulation error arises from the continuous formulation of the model not fully describing the studied physical process. The discretization error arises from having to solve a discretized model instead of the continuously formulated model. Our work on error estimation is concerned with the discretization error. Given a solution of a discretized model, our general problem statement is to find a way to quantify the uncertainties due to discretization in physical quantities of interest (diagnostics), which are frequently used in Geophysical Fluid Dynamics. The approach we use to tackle this problem is called the "Goal Error Ensemble method". The basic idea of the Goal Error Ensemble method is that errors in diagnostics can be translated into a weighted sum of local model errors, which makes it conceptually based on the Dual Weighted Residual method from Computational Fluid Dynamics. In contrast to the Dual Weighted Residual method, these local model errors are not considered deterministically but interpreted as local model uncertainty and described stochastically by a random process. The parameters for the random process are tuned with high-resolution near-initial model information. However, the original Goal Error Ensemble method, introduced in [1], was successfully evaluated only in the case of inviscid flows without lateral boundaries in a shallow-water framework and is hence only of limited use in a numerical ocean model. Our work consists of extending the method to bounded, viscous flows in a shallow-water framework. As our numerical model, we use the ICON-Shallow-Water model. In viscous flows our high-resolution information is dependent on the viscosity parameter, making our uncertainty measures viscosity-dependent. We will show that we can choose a sensible parameter by using the Reynolds number as a criterion. Another topic we will discuss is the choice of the underlying distribution of the random process. This is especially important in the scope of lateral boundaries. We will present resulting error estimates for different height- and velocity-based diagnostics applied to the Munk gyre experiment. References [1] F. RAUSER: Error Estimation in Geophysical Fluid Dynamics through Learning; PhD Thesis, IMPRS-ESM, Hamburg, 2010 [2] F. RAUSER, J. MAROTZKE, P. KORN: Ensemble-type numerical uncertainty quantification from single model integrations; SIAM/ASA Journal on Uncertainty Quantification, submitted
Pauci ex tanto numero: reducing redundancy in multi-model ensembles
NASA Astrophysics Data System (ADS)
Solazzo, E.; Riccio, A.; Kioutsioukis, I.; Galmarini, S.
2013-02-01
We explicitly address the fundamental issue of member diversity in multi-model ensembles. To date, no attempts in this direction are documented within the air quality (AQ) community, despite the extensive use of ensembles in this field. Common biases and redundancy are the two issues deriving directly from a lack of independence; they undermine the significance of a multi-model ensemble and are the subject of this study. Shared biases among models will produce a biased ensemble; it is therefore essential that the errors of the ensemble members be independent so that biases can cancel out. Redundancy derives from having too large a portion of common variance among the members of the ensemble, producing overconfidence in the predictions and underestimation of the uncertainty. The two issues of common biases and redundancy are analysed in detail using the AQMEII ensemble of AQ model results for four air pollutants in two European regions. We show that models share large portions of bias and variance, extending well beyond those induced by common inputs. We make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble, with the advantage of being poorly correlated. Selecting the members for generating skilful, non-redundant ensembles from such subsets proved, however, non-trivial. We propose and discuss various methods of member selection and rate the ensemble performance they produce. In most cases, the full ensemble is outscored by the reduced ones. We conclude that, although independence of outputs may not always guarantee enhanced scores (this depends upon the skill being investigated), we discourage selecting the members of the ensemble simply on the basis of scores; that is, independence and skill need to be considered separately.
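One simple member-selection heuristic in this spirit is greedy de-correlation: walk through the ensemble and keep a member only if it is weakly correlated with everything already kept. The threshold and toy data below are assumptions; the paper evaluates several more principled selection methods.

```python
import numpy as np

rng = np.random.default_rng(12)
common = rng.normal(size=365)                         # signal shared by all models
members = np.array([a * common + 0.4 * rng.normal(size=365)
                    for a in rng.uniform(0.8, 1.2, 12)])

selected = [0]
for i in range(1, len(members)):
    corr = [abs(np.corrcoef(members[i], members[j])[0, 1]) for j in selected]
    if max(corr) < 0.8:                               # keep only weakly redundant members
        selected.append(i)
print(f"kept {len(selected)} of {len(members)} members:", selected)
```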
The NRL relocatable ocean/acoustic ensemble forecast system
NASA Astrophysics Data System (ADS)
Rowley, C.; Martin, P.; Cummings, J.; Jacobs, G.; Coelho, E.; Bishop, C.; Hong, X.; Peggion, G.; Fabre, J.
2009-04-01
A globally relocatable regional ocean nowcast/forecast system has been developed to support rapid implementation of new regional forecast domains. The system is in operational use at the Naval Oceanographic Office for a growing number of regional and coastal implementations. The new system is the basis for an ocean acoustic ensemble forecast and adaptive sampling capability. We present an overview of the forecast system and the ocean ensemble and adaptive sampling methods. The forecast system consists of core ocean data analysis and forecast modules, software for domain configuration, surface and boundary condition forcing processing, and job control, and global databases for ocean climatology, bathymetry, tides, and river locations and transports. The analysis component is the Navy Coupled Ocean Data Assimilation (NCODA) system, a 3D multivariate optimum interpolation system that produces simultaneous analyses of temperature, salinity, geopotential, and vector velocity using remotely-sensed SST, SSH, and sea ice concentration, plus in situ observations of temperature, salinity, and currents from ships, buoys, XBTs, CTDs, profiling floats, and autonomous gliders. The forecast component is the Navy Coastal Ocean Model (NCOM). The system supports one-way nesting and multiple assimilation methods. The ensemble system uses the ensemble transform technique with error variance estimates from the NCODA analysis to represent initial condition error. Perturbed surface forcing or an atmospheric ensemble is used to represent errors in surface forcing. The ensemble transform Kalman filter is used to assess the impact of adaptive observations on future analysis and forecast uncertainty for both ocean and acoustic properties.
Simulation studies of the fidelity of biomolecular structure ensemble recreation
NASA Astrophysics Data System (ADS)
Lätzer, Joachim; Eastwood, Michael P.; Wolynes, Peter G.
2006-12-01
We examine the ability of Bayesian methods to recreate structural ensembles for partially folded molecules from averaged data. Specifically we test the ability of various algorithms to recreate different transition state ensembles for folding proteins using a multiple replica simulation algorithm using input from "gold standard" reference ensembles that were first generated with a Gō-like Hamiltonian having nonpairwise additive terms. A set of low resolution data, which function as the "experimental" ϕ values, were first constructed from this reference ensemble. The resulting ϕ values were then treated as one would treat laboratory experimental data and were used as input in the replica reconstruction algorithm. The resulting ensembles of structures obtained by the replica algorithm were compared to the gold standard reference ensemble, from which those "data" were, in fact, obtained. It is found that for a unimodal transition state ensemble with a low barrier, the multiple replica algorithm does recreate the reference ensemble fairly successfully when no experimental error is assumed. The Kolmogorov-Smirnov test as well as principal component analysis show that the overlap of the recovered and reference ensembles is significantly enhanced when multiple replicas are used. Reduction of the multiple replica ensembles by clustering successfully yields subensembles with close similarity to the reference ensembles. On the other hand, for a high barrier transition state with two distinct transition state ensembles, the single replica algorithm only samples a few structures of one of the reference ensemble basins. This is due to the fact that the ϕ values are intrinsically ensemble averaged quantities. The replica algorithm with multiple copies does sample both reference ensemble basins. In contrast to the single replica case, the multiple replicas are constrained to reproduce the average ϕ values, but allow fluctuations in ϕ for each individual copy. These fluctuations facilitate a more faithful sampling of the reference ensemble basins. Finally, we test how robustly the reconstruction algorithm can function by introducing errors in ϕ comparable in magnitude to those suggested by some authors. In this circumstance we observe that the chances of ensemble recovery with the replica algorithm are poor using a single replica, but are improved when multiple copies are used. A multimodal transition state ensemble, however, turns out to be more sensitive to large errors in ϕ (if appropriately gauged) and attempts at successful recreation of the reference ensemble with simple replica algorithms can fall short.
Monte Carlo replica-exchange based ensemble docking of protein conformations.
Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin
2017-05-01
A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in the translation and orientation of binding partners, possible conformational changes upon binding are included based on Monte Carlo selection of protein conformations stored as ordered, pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure, with a range spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling is performed to select appropriate partner conformations on the fly, the approach is not limited by the number of conformations in the ensemble, in contrast to ensemble cross docking, where each conformer pair is docked separately. Although only a fraction of the generated conformers was in closer agreement with the bound structure, the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2016 Wiley Periodicals, Inc.
Modality-Driven Classification and Visualization of Ensemble Variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bensema, Kevin; Gosink, Luke; Obermaier, Harald
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers, as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that is paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.
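A hedged sketch of modality classification for a single grid location: compare Gaussian mixtures with one, two, and three components by BIC and assign the location the winning modality. The paper's actual classifier and confidence metrics may differ; BIC model selection is an assumption here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# ensemble predictions at one location, drawn bimodal on purpose
sample = np.concatenate([rng.normal(-2, 0.5, 60), rng.normal(2, 0.5, 60)])

X = sample.reshape(-1, 1)
bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in (1, 2, 3)]
k_best = int(np.argmin(bics)) + 1
print(f"BIC per k: {np.round(bics, 1)} -> classified as {k_best}-modal")
```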
Skill of Global Raw and Postprocessed Ensemble Predictions of Rainfall over Northern Tropical Africa
NASA Astrophysics Data System (ADS)
Vogel, Peter; Knippertz, Peter; Fink, Andreas H.; Schlueter, Andreas; Gneiting, Tilmann
2018-04-01
Accumulated precipitation forecasts are of high socioeconomic importance for agriculturally dominated societies in northern tropical Africa. In this study, we analyze the performance of nine operational global ensemble prediction systems (EPSs) relative to climatology-based forecasts for 1- to 5-day accumulated precipitation, based on the monsoon seasons 2007-2014 for three regions within northern tropical Africa. To assess the full potential of raw ensemble forecasts across spatial scales, we apply state-of-the-art statistical postprocessing methods in the form of Bayesian Model Averaging (BMA) and Ensemble Model Output Statistics (EMOS), and verify against station and spatially aggregated, satellite-based gridded observations. Raw ensemble forecasts are uncalibrated, unreliable, and underperform relative to climatology, independently of region, accumulation time, monsoon season, and ensemble. Differences between raw ensemble and climatological forecasts are large, and partly stem from poor predictions of low precipitation amounts. BMA and EMOS postprocessed forecasts are calibrated, reliable, and strongly improve on the raw ensembles but, somewhat disappointingly, typically do not outperform climatology. Most EPSs exhibit slight improvements over the period 2007-2014, but overall have little added value compared to climatology. We suspect that the parametrization of convection is a potential cause of the sobering lack of ensemble forecast skill in a region dominated by mesoscale convective systems.
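EMOS in its simplest Gaussian form fits a predictive distribution N(a + b·(ensemble mean), c + d·(ensemble variance)) by minimizing the CRPS over a training set, as sketched below. A Gaussian is a simplification for precipitation, which normally calls for a censored or shifted distribution, and the synthetic data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
n, m = 500, 10
truth = rng.gamma(2.0, 2.0, n)
ens = truth[:, None] + rng.normal(1.0, 2.0, (n, m))      # biased, imperfect ensemble
emean, evar = ens.mean(1), ens.var(1)

def crps_gauss(mu, sig, y):                              # closed-form CRPS for a normal
    z = (y - mu) / sig
    return sig * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def loss(p):
    a, b, c, d = p
    sig = np.sqrt(np.maximum(c + d * evar, 1e-6))
    return crps_gauss(a + b * emean, sig, truth).mean()

a, b, c, d = minimize(loss, x0=[0, 1, 1, 1], method="Nelder-Mead").x
print(f"bias correction a={a:.2f}, b={b:.2f}; spread model c={c:.2f}, d={d:.2f}")
```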
NASA Astrophysics Data System (ADS)
Millar, R.; Ingram, W.; Allen, M. R.; Lowe, J.
2013-12-01
Temperature and precipitation patterns are the climate variables with the greatest impacts on both natural and human systems. Due to the small spatial scales and the many interactions involved in the global hydrological cycle, representations of precipitation changes in general circulation models (GCMs) are subject to considerable uncertainty. Quantifying and understanding the causes of uncertainty (and identifying robust features of predictions) in both global and local precipitation change is an essential challenge of climate science. We have used the huge distributed computing capacity of the climateprediction.net citizen science project to examine parametric uncertainty in an ensemble of 20,000 perturbed-physics versions of the HadCM3 general circulation model. The ensemble has been selected to have a control climate in top-of-atmosphere energy balance [Yamazaki et al. 2013, J.G.R.]. We force this ensemble with several idealised climate-forcing scenarios, including carbon dioxide step and transient profiles, solar radiation management geoengineering experiments with stratospheric aerosols, and short-lived climate forcing agents. We will present the results from several of these forcing scenarios under GCM parametric uncertainty. We examine the global mean precipitation energy budget to understand the robustness of a simple non-linear global precipitation model [Good et al. 2012, Clim. Dyn.] as a better explanation of precipitation changes in transient climate projections under GCM parametric uncertainty than a simple linear tropospheric energy balance model. We will also present work investigating robust conclusions about precipitation changes in a balanced ensemble of idealised solar radiation management scenarios [Kravitz et al. 2011, Atmos. Sci. Let.].
Ensemble of classifiers for ontology enrichment
NASA Astrophysics Data System (ADS)
Semenova, A. V.; Kureichik, V. M.
2018-05-01
A classifier is the basis of ontology learning systems. Classification of text documents is used in many applications, such as information retrieval, information extraction, and spam detection. A new ensemble of classifiers based on SVM (support vector machines), LSTM (a recurrent neural network) and word embeddings is suggested. An experiment was conducted on open data, which allows us to conclude that the proposed classification method is promising. The proposed classifier is implemented in Matlab using the functions of the Text Analytics Toolbox. The principal distinction of the proposed ensemble of classifiers is the high quality of data classification at an acceptable time cost.
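The ensemble idea sketched in this abstract, combining the probabilistic outputs of heterogeneous classifiers, can be prototyped in a few lines; since the study's LSTM is not reproduced here, a logistic regression serves as a clearly labeled stand-in for the second member, and synthetic features stand in for word-embedding vectors:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Stand-in for document vectors produced by a word-embedding step.
    X, y = make_classification(n_samples=600, n_features=100, n_informative=20,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
    lstm_standin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # placeholder

    # Soft voting: average the class-probability outputs of the member models.
    proba = 0.5 * svm.predict_proba(X_te) + 0.5 * lstm_standin.predict_proba(X_te)
    accuracy = np.mean(proba.argmax(axis=1) == y_te)
    print(f"ensemble accuracy: {accuracy:.3f}")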
Multi-Optimisation Consensus Clustering
NASA Astrophysics Data System (ADS)
Li, Jian; Swift, Stephen; Liu, Xiaohui
Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
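A common baseline for consensus clustering of the kind CC and MOCC build on is the co-association matrix: many base clusterings vote on whether two points belong together, and a final partition is extracted from the accumulated agreement. A minimal sketch (not the MOCC optimisation itself):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
    n = len(X)

    # 1) Build a co-association matrix from many base clusterings.
    coassoc = np.zeros((n, n))
    for seed in range(25):
        labels = KMeans(n_clusters=3, n_init=5, random_state=seed).fit_predict(X)
        coassoc += (labels[:, None] == labels[None, :])
    coassoc /= 25.0

    # 2) Extract the consensus partition from the agreement matrix.
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    consensus = fcluster(Z, t=3, criterion="maxclust")
    print(np.bincount(consensus))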
Method of inducing surface ensembles on a metal catalyst
Miller, Steven S.
1989-01-01
A method of inducing surface ensembles on a transition metal catalyst used in the conversion of a reactant gas or gas mixture, such as carbon monoxide and hydrogen into hydrocarbons (the Fischer-Tropsch reaction), is disclosed which comprises adding a Lewis base to the syngas (CO+H2) mixture before reaction takes place. The formation of surface ensembles in this manner restricts the number and types of reaction pathways which will be utilized, thus greatly narrowing the product distribution and maximizing the efficiency of the Fischer-Tropsch reaction. Similarly, amines may also be produced by the conversion of reactant gas or gases, such as nitrogen, hydrogen, or hydrocarbon constituents.
Method of inducing surface ensembles on a metal catalyst
Miller, S.S.
1987-10-02
A method of inducing surface ensembles on a transition metal catalyst used in the conversion of a reactant gas or gas mixture, such as carbon monoxide and hydrogen into hydrocarbons (the Fischer-Tropsch reaction), is disclosed which comprises adding a Lewis base to the syngas (CO + H2) mixture before reaction takes place. The formation of surface ensembles in this manner restricts the number and types of reaction pathways which will be utilized, thus greatly narrowing the product distribution and maximizing the efficiency of the Fischer-Tropsch reaction. Similarly, amines may also be produced by the conversion of reactant gas or gases, such as nitrogen, hydrogen, or hydrocarbon constituents.
Skill of ENSEMBLES seasonal re-forecasts for malaria prediction in West Africa
NASA Astrophysics Data System (ADS)
Jones, A. E.; Morse, A. P.
2012-12-01
This study examines the performance of malaria-relevant climate variables from the ENSEMBLES seasonal ensemble re-forecasts for sub-Saharan West Africa. A dynamic malaria model is used to transform temperature and rainfall forecasts into simulated malaria incidence, and these forecasts are verified against simulations obtained by driving the malaria model with general circulation model-derived reanalysis. Two subregions of forecast skill are identified: the highlands of Cameroon, where low temperatures limit simulated malaria during the forecast period and interannual variability in simulated malaria is closely linked to variability in temperature, and northern Nigeria/southern Niger, where simulated malaria variability is strongly associated with rainfall variability during the peak rain months.
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We use ensemble boosting learning techniques to combine various computational approaches for gene prioritization in order to improve the overall performance. In particular, we add a heuristic weighting function to the RankBoost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene pairs from each prioritization result. We select 13 known prostate cancer genes in the OMIM database as the training set and protein-coding gene data in the HGNC database as the test set. We adopt the leave-one-out strategy for the ensemble rank-boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in the ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC curves and AUC measures.
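The heuristic weighting over absolute ranks can be sketched as a simple weighted rank-aggregation step; the 1/(1+rank) emphasis and the weights below are illustrative assumptions, not the exact RankBoost weighting of the paper:

    import numpy as np

    def weighted_rank_aggregate(rank_lists, weights):
        # Combine several prioritization rankings into one consensus ranking.
        # rank_lists[m][g] is the rank of gene g under method m (0 = best).
        R = np.asarray(rank_lists, dtype=float)         # shape (methods, genes)
        w = np.asarray(weights, dtype=float)
        # Heuristic emphasis on absolute ranks: down-weight contributions from
        # deep positions so agreement near the top of the lists dominates.
        scores = (w[:, None] / (1.0 + R)).sum(axis=0)
        return np.argsort(-scores)                      # best gene first

    # Three toy methods ranking five genes (0..4).
    ranks = [[0, 1, 2, 3, 4],
             [1, 0, 3, 2, 4],
             [0, 2, 1, 4, 3]]
    weights = [1.0, 0.8, 0.6]                           # e.g., from boosting rounds
    print(weighted_rank_aggregate(ranks, weights))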
On extending Kohn-Sham density functionals to systems with fractional number of electrons.
Li, Chen; Lu, Jianfeng; Yang, Weitao
2017-06-07
We analyze four ways of formulating the Kohn-Sham (KS) density functionals with a fractional number of electrons, through extending the constrained search space from the Kohn-Sham and the generalized Kohn-Sham (GKS) non-interacting v-representable density domain for integer systems to four different sets of densities for fractional systems. In particular, these density sets are (I) ensemble interacting N-representable densities, (II) ensemble non-interacting N-representable densities, (III) non-interacting densities by the Janak construction, and (IV) non-interacting densities whose composing orbitals satisfy the Aufbau occupation principle. By proving the equivalence of the underlying first order reduced density matrices associated with these densities, we show that sets (I), (II), and (III) are equivalent, and all reduce to the Janak construction. Moreover, for functionals with the ensemble v-representable assumption at the minimizer, (III) reduces to (IV) and thus justifies the previous use of the Aufbau protocol within the (G)KS framework in the study of the ground state of fractional electron systems, as defined in the grand canonical ensemble at zero temperature. By further analyzing the Aufbau solution for different density functional approximations (DFAs) in the (G)KS scheme, we rigorously prove that there can be one and only one fractional occupation for the Hartree-Fock functional, while there can be multiple fractional occupations for general DFAs in the presence of degeneracy. This has been confirmed by numerical calculations using the local density approximation as a representative of general DFAs. This work thus clarifies important issues on density functional theory calculations for fractional electron systems.
NASA Astrophysics Data System (ADS)
Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.
2012-04-01
In the context of a national energy company (EDF: Électricité de France), hydro-meteorological forecasts are necessary to ensure the safety and security of installations, meet environmental standards, and improve water resources management and decision making. Hydrological ensemble forecasts allow a better representation of meteorological and hydrological forecast uncertainties and support the human expertise that is essential to synthesize the available information coming from different meteorological and hydrological models and from experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and has been used since 2010 on more than 30 watersheds in France. This ensemble forecasting chain is characterized by ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where substantial human expertise is solicited. The aim of this paper is to compare two hydrological ensemble post-processing methods developed at EDF in order to improve ensemble forecast reliability (similar to Montanari & Brath, 2004; Schaefli et al., 2007). The post-processing methods dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (the empirical approach) is based on a statistical model of the empirical error of perfect forecasts, using streamflow sub-samples stratified by quantile class and lead time. The second method (the dynamical approach) uses streamflow sub-samples stratified by quantile class, streamflow variation, and lead time. On a set of 20 watersheds used for operational forecasts, results show that both approaches ensure a good post-processing of the hydrological ensemble, clearly improving the reliability, skill and sharpness of ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach, which is not able to take hydrological dynamics and processes, i.e. sample heterogeneity, into account: the same streamflow range can correspond to different processes, such as rising limbs or recessions, where uncertainties differ. The dynamical approach improves the reliability, skill and sharpness of forecasts and globally reduces confidence interval width. Examined in detail, the dynamical approach allows a noticeable reduction of confidence intervals during recessions, where uncertainty is relatively low, and a slight increase of confidence intervals during rising limbs or snowmelt, where uncertainty is greater. The dynamical approach, validated by forecasters' experience (the empirical approach was considered not discriminative enough), improved forecasters' confidence and the communication of uncertainties. Montanari, A. and Brath, A., (2004). A stochastic approach for assessing the uncertainty of rainfall-runoff simulations. Water Resources Research, 40, W01106, doi:10.1029/2003WR002540. Schaefli, B., Balin Talamba, D. and Musy, A., (2007). Quantifying hydrological modeling errors through a mixture of normal distributions. Journal of Hydrology, 332, 303-315.
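The empirical approach lends itself to a compact sketch: pool the errors of perfect forecasts by streamflow quantile class, then dress a new forecast by resampling errors from the matching class. Everything below (the multiplicative log-error model, five classes, a single lead time) is an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(0)

    # Archive of perfect forecasts and matching observed flows (toy data).
    sim = rng.gamma(3.0, 20.0, size=3000)
    obs = sim * np.exp(rng.normal(0.0, 0.15, size=3000))

    # 1) Pool relative errors into quantile classes of the simulated flow.
    edges = np.quantile(sim, np.linspace(0, 1, 6))          # 5 flow classes
    cls = np.clip(np.searchsorted(edges, sim, side="right") - 1, 0, 4)
    error_pools = [np.log(obs[cls == k] / sim[cls == k]) for k in range(5)]

    def dress(forecast, n_members=50):
        # Dress a deterministic forecast with errors sampled from its class.
        k = int(np.clip(np.searchsorted(edges, forecast, side="right") - 1, 0, 4))
        return forecast * np.exp(rng.choice(error_pools[k], size=n_members))

    print(np.percentile(dress(120.0), [10, 50, 90]))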
NASA Astrophysics Data System (ADS)
Khajehei, Sepideh; Moradkhani, Hamid
2015-04-01
Hydrologic ensemble forecasts are subject to various sources of uncertainty, including meteorological forcing, initial conditions, model structure, and model parameters. Producing reliable and skillful precipitation ensemble forecasts is one approach to reducing the total uncertainty in hydrological applications. Currently, numerical weather prediction (NWP) models provide ensemble forecasts for various temporal ranges, but raw products from NWP models are known to be biased in mean and spread. There is therefore a need for methods that can generate reliable ensemble forecasts for hydrological applications. One common technique is to apply statistical procedures to generate an ensemble forecast from NWP-generated single-value forecasts. The procedure is based on the bivariate probability distribution between the observation and the single-value precipitation forecast; however, the current method assumes Gaussian marginal distributions for the observed and modeled climate variables. Here, we describe and evaluate a Bayesian approach based on copula functions to develop an ensemble precipitation forecast from the conditional distribution of single-value precipitation forecasts. Copula functions are multivariate joint distributions of univariate marginal distributions, presented here as an alternative procedure for capturing the uncertainties related to meteorological forcing. Copulas can model the joint distribution of two variables with any level of correlation and dependency. This study is conducted over a sub-basin in the Columbia River Basin in the USA, using monthly precipitation forecasts from the Climate Forecast System (CFS) at 0.5° × 0.5° spatial resolution to reproduce the observations. The verification is conducted on a different period, and the procedure is shown to be superior to the Ensemble Pre-Processor approach currently used by the National Weather Service River Forecast Centers in the USA.
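A Gaussian copula version of the conditional-distribution idea can be sketched as follows: both marginals are mapped to normal scores through their empirical ranks, the copula correlation is estimated, and ensemble members are drawn from the conditional normal and back-transformed with the observed marginal. The toy data and the choice of a Gaussian copula are illustrative assumptions:

    import numpy as np
    from scipy.stats import norm, rankdata

    rng = np.random.default_rng(0)

    # Toy archive of single-value forecasts and matching observations.
    fcst = rng.gamma(2.0, 30.0, size=1000)
    obs = 0.7 * fcst + rng.gamma(2.0, 10.0, size=1000)

    # 1) Map both marginals to standard-normal scores via empirical ranks.
    def to_normal_scores(x):
        return norm.ppf(rankdata(x) / (len(x) + 1.0))

    zf, zo = to_normal_scores(fcst), to_normal_scores(obs)
    rho = np.corrcoef(zf, zo)[0, 1]                 # Gaussian-copula dependence

    def ensemble_from_forecast(f_new, n_members=50):
        # Sample the conditional copula, back-transform with obs quantiles.
        z = norm.ppf((np.searchsorted(np.sort(fcst), f_new) + 0.5)
                     / (len(fcst) + 1))
        z_cond = rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n_members)
        return np.quantile(obs, norm.cdf(z_cond))   # inverse empirical marginal

    print(np.percentile(ensemble_from_forecast(80.0), [10, 50, 90]))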
NASA Astrophysics Data System (ADS)
Tian, D.; Medina, H.
2017-12-01
Post-processing of medium-range reference evapotranspiration (ETo) forecasts based on numerical weather prediction (NWP) models has the potential of improving the quality and utility of these forecasts. This work compares the performance of several post-processing methods for correcting ETo forecasts over the continental U.S. generated from The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) database using data from Europe (EC), the United Kingdom (MO), and the United States (NCEP). The post-processing techniques considered are: simple bias correction, the use of multimodel combinations, Ensemble Model Output Statistics (EMOS, Gneiting et al., 2005) and Bayesian Model Averaging (BMA, Raftery et al., 2005). ETo estimates based on quality-controlled U.S. Regional Climate Reference Network measurements, and computed with the FAO 56 Penman-Monteith equation, are adopted as the baseline. EMOS and BMA are generally the most efficient post-processing techniques for the ETo forecasts. Nevertheless, the simple bias correction of the best model is commonly much more rewarding than using multimodel raw forecasts. Our results demonstrate the potential of different forecasting and post-processing frameworks in operational evapotranspiration and irrigation advisory systems at the national scale.
NASA Astrophysics Data System (ADS)
Orellana, Laura; Yoluk, Ozge; Carrillo, Oliver; Orozco, Modesto; Lindahl, Erik
2016-08-01
Protein conformational changes are at the heart of cell functions, from signalling to ion transport. However, the transient nature of the intermediates along transition pathways hampers their experimental detection, making the underlying mechanisms elusive. Here we retrieve dynamic information on the actual transition routes from principal component analysis (PCA) of structurally-rich ensembles and, in combination with coarse-grained simulations, explore the conformational landscapes of five well-studied proteins. Modelling them as elastic networks in a hybrid elastic-network Brownian dynamics simulation (eBDIMS), we generate trajectories connecting stable end-states that spontaneously sample the crystallographic motions, predicting the structures of known intermediates along the paths. We also show that the explored non-linear routes can delimit the lowest energy passages between end-states sampled by atomistic molecular dynamics. The integrative methodology presented here provides a powerful framework to extract and expand dynamic pathway information from the Protein Data Bank, as well as to validate sampling methods in general.
Active relearning for robust supervised classification of pulmonary emphysema
NASA Astrophysics Data System (ADS)
Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.
2012-03-01
Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, the inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Toward optimizing emphysema classification, we introduce a physician-in-the-loop feedback approach in order to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pair-wise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, the ensemble-expert label conflicts were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded a 15% increase in classification accuracy and a 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics, where the average accuracy improvement rose to 21%. The cooperative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.
Seasonal Drought Prediction: Advances, Challenges, and Future Prospects
NASA Astrophysics Data System (ADS)
Hao, Zengchao; Singh, Vijay P.; Xia, Youlong
2018-03-01
Drought prediction is of critical importance to early warning for drought management. This review provides a synthesis of drought prediction based on statistical, dynamical, and hybrid methods. Statistical drought prediction is achieved by modeling the relationship between drought indices of interest and a suite of potential predictors, including large-scale climate indices, local climate variables, and land initial conditions. Dynamical meteorological drought prediction relies on seasonal climate forecasts from general circulation models (GCMs), which can be employed to drive hydrological models for agricultural and hydrological drought prediction, with the predictability determined by both climate forcings and initial conditions. Challenges still exist in drought prediction at long lead times and under a changing environment resulting from natural and anthropogenic factors. Future research prospects to improve drought prediction include, but are not limited to, high-quality data assimilation, improved model development with key processes related to drought occurrence, optimal ensemble forecasting to select or weight ensemble members, and hybrid drought prediction to merge statistical and dynamical forecasts.
EMPIRE and pyenda: Two ensemble-based data assimilation systems written in Fortran and Python
NASA Astrophysics Data System (ADS)
Geppert, Gernot; Browne, Phil; van Leeuwen, Peter Jan; Merker, Claire
2017-04-01
We present and compare the features of two ensemble-based data assimilation frameworks, EMPIRE and pyenda. Both frameworks allow models to be coupled to the assimilation codes using the Message Passing Interface (MPI), leading to extremely efficient and fast coupling between the models and the data-assimilation codes. The Fortran-based system EMPIRE (Employing Message Passing Interface for Researching Ensembles) is optimized for parallel, high-performance computing. It currently includes a suite of data assimilation algorithms, including variants of the ensemble Kalman filter and several particle filters. EMPIRE is targeted at models of all levels of complexity and has been coupled to several geoscience models, e.g., the Lorenz-63 model, a barotropic vorticity model, the general circulation model HadCM3, the ocean model NEMO, and the land-surface model JULES. The Python-based system pyenda (Python Ensemble Data Assimilation) allows Fortran- and Python-based models to be used for data assimilation. Models can be coupled either using MPI or by using a Python interface. Using Python allows quick prototyping, and pyenda is aimed at small to medium scale models. pyenda currently includes variants of the ensemble Kalman filter and has been coupled to the Lorenz-63 model, an advection-based precipitation nowcasting scheme, and the dynamic global vegetation model JSBACH.
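The analysis step that ensemble Kalman frameworks like these implement can be written compactly; the following is a generic stochastic (perturbed-observation) EnKF update in Python, not code from EMPIRE or pyenda:

    import numpy as np

    def enkf_analysis(ensemble, obs, H, obs_err_std, rng):
        # Stochastic EnKF analysis; ensemble is (n_members, n_state).
        n_members = ensemble.shape[0]
        Xf = ensemble - ensemble.mean(axis=0)            # forecast perturbations
        Yf = Xf @ H.T                                    # obs-space perturbations
        Pyy = Yf.T @ Yf / (n_members - 1) + np.diag(obs_err_std**2)
        Pxy = Xf.T @ Yf / (n_members - 1)
        K = Pxy @ np.linalg.inv(Pyy)                     # Kalman gain
        # Perturbed observations keep the analysis spread statistically correct.
        obs_pert = obs + obs_err_std * rng.normal(size=(n_members, len(obs)))
        innovations = obs_pert - ensemble @ H.T
        return ensemble + innovations @ K.T

    rng = np.random.default_rng(0)
    ens = rng.normal(0.0, 1.0, size=(20, 3))             # 20 members, 3 state vars
    H = np.array([[1.0, 0.0, 0.0]])                      # observe the first variable
    analysis = enkf_analysis(ens, np.array([0.5]), H, np.array([0.2]), rng)
    print(analysis.mean(axis=0))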
2015-06-19
Predictive modeling is presented as an effective and scientifically valid method of making comparisons of clothing and equipment changes prior to conducting human research. Three different body armor (BA) plus clothing ensembles were evaluated.
Scalable and balanced dynamic hybrid data assimilation
NASA Astrophysics Data System (ADS)
Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa
2017-04-01
Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble-based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small, which results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter (EKF). Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution, which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother (VEnKS). In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble, but now using past iterations as surrogate observations, until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process, implementing the latter as a wrapper code whose only link to the model is calling for many totally independent model runs, each of them implemented as a parallel model run itself. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs, which requires a very efficient and low-latency communication network. However, the volume of data communicated is small, and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of the fully scalable VEnKF with the 4D lake and shallow-sea model COHERENS, simultaneously assimilating continuous in situ measurements at a single point and infrequent satellite images that cover a whole lake.
Ensemble modeling of very small ZnO nanoparticles.
Niederdraenk, Franziska; Seufert, Knud; Stahl, Andreas; Bhalerao-Panajkar, Rohini S; Marathe, Sonali; Kulkarni, Sulabha K; Neder, Reinhard B; Kumpf, Christian
2011-01-14
The detailed structural characterization of nanoparticles is a very important issue since it enables a precise understanding of their electronic, optical and magnetic properties. Here we introduce a new method for modeling the structure of very small particles by means of powder X-ray diffraction. Using thioglycerol-capped ZnO nanoparticles with a diameter of less than 3 nm as an example, we demonstrate that our ensemble modeling method is superior to standard XRD methods such as Rietveld refinement. Besides fundamental properties (size, anisotropic shape and atomic structure), more sophisticated properties such as imperfections in the lattice, a size distribution, and strain and relaxation effects in the particles and, in particular, at their surface (surface relaxation effects) can be obtained. Ensemble properties, i.e., distributions of the particle size and other properties, can also be investigated, which makes this method superior to imaging techniques like (high resolution) transmission electron microscopy or atomic force microscopy, in particular for very small nanoparticles. For the particles under study, an excellent agreement of calculated and experimental X-ray diffraction patterns could be obtained with an ensemble of anisotropic polyhedral particles of three dominant sizes, wurtzite structure and a significant relaxation of Zn atoms close to the surface.
Girsanov reweighting for path ensembles and Markov state models
NASA Astrophysics Data System (ADS)
Donati, L.; Hartmann, C.; Keller, B. G.
2017-06-01
The sensitivity of molecular dynamics on changes in the potential energy function plays an important role in understanding the dynamics and function of complex molecules. We present a method to obtain path ensemble averages of a perturbed dynamics from a set of paths generated by a reference dynamics. It is based on the concept of path probability measure and the Girsanov theorem, a result from stochastic analysis to estimate a change of measure of a path ensemble. Since Markov state models (MSMs) of the molecular dynamics can be formulated as a combined phase-space and path ensemble average, the method can be extended to reweight MSMs by combining it with a reweighting of the Boltzmann distribution. We demonstrate how to efficiently implement the Girsanov reweighting in a molecular dynamics simulation program by calculating parts of the reweighting factor "on the fly" during the simulation, and we benchmark the method on test systems ranging from a two-dimensional diffusion process and an artificial many-body system to alanine dipeptide and valine dipeptide in implicit and explicit water. The method can be used to study the sensitivity of molecular dynamics on external perturbations as well as to reweight trajectories generated by enhanced sampling schemes to the original dynamics.
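For overdamped Langevin dynamics dX = -grad V dt + sigma dW with sigma^2 = 2/beta, the Girsanov weight for a perturbed potential V + U accumulates along the reference path as ln M = sum (db/sigma) eta sqrt(dt) - 1/2 sum (db/sigma)^2 dt with db = -grad U, which is exactly the kind of "on the fly" bookkeeping mentioned above. A sketch with an assumed double-well reference and harmonic perturbation:

    import numpy as np

    rng = np.random.default_rng(0)
    beta, dt, n_steps, n_traj = 1.0, 1e-3, 2000, 500
    sigma = np.sqrt(2.0 / beta)                  # overdamped Langevin noise scale

    grad_V = lambda x: 4.0 * x * (x**2 - 1.0)    # reference V = (x^2 - 1)^2
    grad_U = lambda x: 0.5 * x                   # perturbation U = 0.25 x^2 (assumed)

    x = rng.normal(0.0, 0.1, size=n_traj)        # one walker per trajectory
    log_w = np.zeros(n_traj)                     # Girsanov log-weights, on the fly
    for _ in range(n_steps):
        eta = rng.normal(size=n_traj)            # the reference noise increments
        db = -grad_U(x) / sigma                  # drift change of perturbed dynamics
        log_w += db * eta * np.sqrt(dt) - 0.5 * db**2 * dt
        x = x - grad_V(x) * dt + sigma * np.sqrt(dt) * eta   # reference dynamics

    w = np.exp(log_w - log_w.max()); w /= w.sum()
    print("reference  <x^2> =", np.mean(x**2))
    print("reweighted <x^2> =", np.sum(w * x**2))   # average under V + U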
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising for medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, which are proven to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensemble methods, namely AdaBoost and Bagging, with base classifiers such as Random Forest, Random Tree, J48, J48graft and Logistic Model Trees (LMT), each selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and overfitting was also noted in some cases. The evidence shows that ensemble decision tree classifiers using AdaBoost and Bagging improve the performance on the selected medical data sets.
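The corresponding experiment is straightforward to reproduce in outline with scikit-learn (the bundled breast-cancer data set is a stand-in for the paper's medical data; recent scikit-learn names the parameter estimator=, older releases base_estimator=):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)   # stand-in medical data set

    # Decision trees of different depth, wrapped by each ensemble method.
    for depth in (1, 3):
        base = DecisionTreeClassifier(max_depth=depth, random_state=0)
        ada = AdaBoostClassifier(estimator=base, n_estimators=100, random_state=0)
        bag = BaggingClassifier(estimator=base, n_estimators=100, random_state=0)
        for name, clf in (("AdaBoost", ada), ("Bagging", bag)):
            score = cross_val_score(clf, X, y, cv=5).mean()
            print(f"depth={depth} {name}: {score:.3f}")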
Comparison of different filter methods for data assimilation in the unsaturated zone
NASA Astrophysics Data System (ADS)
Lange, Natascha; Berkhahn, Simon; Erdal, Daniel; Neuweiler, Insa
2016-04-01
The unsaturated zone is an important compartment, which plays a role in the division of terrestrial water fluxes into surface runoff, groundwater recharge and evapotranspiration. For data assimilation in coupled systems it is therefore important to have a good representation of the unsaturated zone in the model. Flow processes in the unsaturated zone have all the typical features of flow in porous media: processes can have long memory, and because observations are scarce, hydraulic model parameters cannot be determined easily. However, they are important for the quality of model predictions. On top of that, the established flow models are highly non-linear. For these reasons, the use of the popular Ensemble Kalman filter as a data assimilation method to estimate state and parameters in unsaturated zone models could be questioned. With respect to the long process memory in the subsurface, it has been suggested that iterative filters and smoothers may be more suitable for parameter estimation in unsaturated media. We test the performance of different iterative filters and smoothers for data assimilation with a focus on parameter updates in the unsaturated zone. In particular we compare the Iterative Ensemble Kalman Filter and Smoother as introduced by Bocquet and Sakov (2013) as well as the Confirming Ensemble Kalman Filter and the modified Restart Ensemble Kalman Filter proposed by Song et al. (2014) to the original Ensemble Kalman Filter (Evensen, 2009). This is done with simple test cases generated numerically. We also consider test examples with a layering structure, as layering is often found in natural soils. We assume that observations are water content, obtained from TDR probes or other observation methods sampling relatively small volumes. Particularly in larger data assimilation frameworks, a reasonable balance between computational effort and quality of results has to be found. Therefore, we compare the computational costs of the different methods as well as the quality of open-loop model predictions and the estimated parameters. Bocquet, M. and P. Sakov, 2013: Joint state and parameter estimation with an iterative ensemble Kalman smoother, Nonlinear Processes in Geophysics 20(5): 803-818. Evensen, G., 2009: Data assimilation: The ensemble Kalman filter. Springer Science & Business Media. Song, X.H., L.S. Shi, M. Ye, J.Z. Yang and I.M. Navon, 2014: Numerical comparison of iterative ensemble Kalman filters for unsaturated flow inverse modeling. Vadose Zone Journal 13(2), 10.2136/vzj2013.05.0083.
An Ensemble-Based Smoother with Retrospectively Updated Weights for Highly Nonlinear Systems
NASA Technical Reports Server (NTRS)
Chin, T. M.; Turmon, M. J.; Jewell, J. B.; Ghil, M.
2006-01-01
Monte Carlo computational methods have been introduced into data assimilation for nonlinear systems in order to alleviate the computational burden of updating and propagating the full probability distribution. By propagating an ensemble of representative states, algorithms like the ensemble Kalman filter (EnKF) and the resampled particle filter (RPF) rely on the existing modeling infrastructure to approximate the distribution based on the evolution of this ensemble. This work presents an ensemble-based smoother that is applicable to the Monte Carlo filtering schemes like EnKF and RPF. At the minor cost of retrospectively updating a set of weights for ensemble members, this smoother has demonstrated superior capabilities in state tracking for two highly nonlinear problems: the double-well potential and trivariate Lorenz systems. The algorithm does not require retrospective adaptation of the ensemble members themselves, and it is thus suited to a streaming operational mode. The accuracy of the proposed backward-update scheme in estimating non-Gaussian distributions is evaluated by comparison to the more accurate estimates provided by a Markov chain Monte Carlo algorithm.
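The core idea, retrospectively updating member weights rather than the members themselves, can be sketched on the double-well system: each observation multiplies every member's weight by its likelihood, and the final weights are applied to the stored trajectories to form the smoothed estimate. A toy sketch, with assumed noise levels:

    import numpy as np

    rng = np.random.default_rng(0)
    n_ens, n_steps, dt = 300, 400, 0.05
    obs_every, obs_std = 40, 0.4

    drift = lambda x: -4.0 * x * (x**2 - 1.0)        # double-well potential force
    truth = np.zeros(1)
    X = rng.normal(0, 0.5, n_ens)
    traj = np.zeros((n_steps, n_ens))
    log_w = np.zeros(n_ens)
    w = np.full(n_ens, 1.0 / n_ens)

    for k in range(n_steps):
        noise = 0.5 * np.sqrt(dt)
        truth = truth + drift(truth) * dt + noise * rng.normal(size=1)
        X = X + drift(X) * dt + noise * rng.normal(size=n_ens)
        traj[k] = X
        if (k + 1) % obs_every == 0:                 # retrospective weight update
            y = truth[0] + obs_std * rng.normal()
            log_w += -0.5 * ((y - X) / obs_std) ** 2
            w = np.exp(log_w - log_w.max()); w /= w.sum()

    # Weighted means over the stored members give the smoothed (retrodicted) path.
    smoothed = traj @ w
    print("final truth:", truth[0], " smoothed final:", smoothed[-1])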
Bayesian Tracking of Emerging Epidemics Using Ensemble Optimal Statistical Interpolation
Cobb, Loren; Krishnamurthy, Ashok; Mandel, Jan; Beezley, Jonathan D.
2014-01-01
We present a preliminary test of the Ensemble Optimal Statistical Interpolation (EnOSI) method for the statistical tracking of an emerging epidemic, with a comparison to its popular relative for Bayesian data assimilation, the Ensemble Kalman Filter (EnKF). The spatial data for this test was generated by a spatial susceptible-infectious-removed (S-I-R) epidemic model of an airborne infectious disease. Both tracking methods in this test employed Poisson rather than Gaussian noise, so as to handle epidemic data more accurately. The EnOSI and EnKF tracking methods worked well on the main body of the simulated spatial epidemic, but the EnOSI was able to detect and track a distant secondary focus of infection that the EnKF missed entirely. PMID:25113590
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
Clustering-based ensemble learning for activity recognition in smart homes.
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-07-10
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
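The cluster-collection scheme can be prototyped directly: each collection clusters the training data on a random feature subset, clusters are labelled by majority vote, and a new instance receives one vote per collection from its closest cluster. Synthetic features stand in for the sensor data:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=12, n_informative=8,
                               n_classes=3, n_clusters_per_class=2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    collections = []
    for _ in range(15):                  # one cluster collection per feature subset
        feats = rng.choice(X.shape[1], size=6, replace=False)
        km = KMeans(n_clusters=9, n_init=5, random_state=0).fit(X_tr[:, feats])
        # Label each cluster with the majority activity of its training members.
        labels = np.array([np.bincount(y_tr[km.labels_ == c], minlength=3).argmax()
                           for c in range(9)])
        collections.append((feats, km, labels))

    votes = np.zeros((len(X_te), 3))
    for feats, km, labels in collections:  # the closest cluster per collection votes
        votes[np.arange(len(X_te)), labels[km.predict(X_te[:, feats])]] += 1
    print("accuracy:", np.mean(votes.argmax(axis=1) == y_te))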
Village Building Identification Based on Ensemble Convolutional Neural Networks
Guo, Zhiling; Chen, Qi; Xu, Yongwei; Shibasaki, Ryosuke; Shao, Xiaowei
2017-01-01
In this study, we present the Ensemble Convolutional Neural Network (ECNN), an elaborate CNN frame formulated based on ensembling state-of-the-art CNN models, to identify village buildings from open high-resolution remote sensing (HRRS) images. First, to optimize and mine the capability of CNN for village mapping and to ensure compatibility with our classification targets, a few state-of-the-art models were carefully optimized and enhanced based on a series of rigorous analyses and evaluations. Second, rather than directly implementing building identification by using these models, we exploited most of their advantages by ensembling their feature extractor parts into a stronger model called ECNN based on the multiscale feature learning method. Finally, the generated ECNN was applied to a pixel-level classification frame to implement object identification. The proposed method can serve as a viable tool for village building identification with high accuracy and efficiency. The experimental results obtained from the test area in Savannakhet province, Laos, prove that the proposed ECNN model significantly outperforms existing methods, improving overall accuracy from 96.64% to 99.26%, and kappa from 0.57 to 0.86. PMID:29084154
Ligand-biased ensemble receptor docking (LigBEnD): a hybrid ligand/receptor structure-based approach
NASA Astrophysics Data System (ADS)
Lam, Polo C.-H.; Abagyan, Ruben; Totrov, Maxim
2018-01-01
Ligand docking to flexible protein molecules can be efficiently carried out through ensemble docking to multiple protein conformations, either from experimental X-ray structures or from in silico simulations. The success of ensemble docking often requires the careful selection of complementary protein conformations, through docking and scoring of known co-crystallized ligands. False positives, in which a ligand in a wrong pose achieves a better docking score than that of native pose, arise as additional protein conformations are added. In the current study, we developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the use of ligand-based atomic property field (APF) method with receptor structure-based docking. This method helps us to correctly dock 30 out of 36 ligands presented by the D3R docking challenge. For the six mis-docked ligands, the cognate receptor structures prove to be too different from the 40 available experimental Pocketome conformations used for docking and could be identified only by receptor sampling beyond experimentally explored conformational subspace.
Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints.
Ai, Haixin; Chen, Wen; Zhang, Li; Huang, Liangchao; Yin, Zimo; Hu, Huan; Zhao, Qi; Zhao, Jian; Liu, Hongsheng
2018-05-21
Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using three machine learning algorithms and 12 molecular fingerprints from a dataset containing 1,241 diverse compounds. The ensemble model achieved an average accuracy of 71.1±2.6%, sensitivity of 79.9±3.6%, specificity of 60.3±4.8%, and area under the receiver operating characteristic curve (AUC) of 0.764±0.026 in five-fold cross-validation and an accuracy of 84.3%, sensitivity of 86.9%, specificity of 75.4%, and AUC of 0.904 in an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base (LTKB). Compared with previous methods, the ensemble model achieved relatively high accuracy and sensitivity. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
A hybrid filtering method based on a novel empirical mode decomposition for friction signals
NASA Astrophysics Data System (ADS)
Li, Chengwei; Zhan, Liwei
2015-12-01
During a measurement, the measured signal usually contains noise. To remove the noise and preserve the important features of the signal, we introduce a hybrid filtering method that uses a new intrinsic mode function (NIMF) and a modified Hausdorff distance. The NIMF is defined as the difference between the noisy signal and each intrinsic mode function (IMF), which is obtained by empirical mode decomposition (EMD), ensemble EMD, complementary ensemble EMD, or complete ensemble EMD with adaptive noise (CEEMDAN). The relevant mode selection is based on the similarity between the first NIMF and the rest of the NIMFs. With this filtering method, the EMD and its improved versions are used to filter the simulation and friction signals. The friction signal between an airplane tire and the runway is recorded during a simulated airplane touchdown and features spikes of various amplitudes and noise. The filtering effectiveness of the four hybrid filtering methods is compared and discussed. The results show that the filtering method based on CEEMDAN outperforms the other signal filtering methods.
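One plausible reading of the NIMF selection rule is sketched below: form NIMFs by subtracting each IMF from the signal, measure each NIMF's modified Hausdorff distance to the first one, and place the signal/noise boundary at the largest jump in that distance. The boundary rule and the hand-made "IMFs" in the demo are assumptions of this sketch; in practice the IMFs would come from an EMD/CEEMDAN implementation:

    import numpy as np

    def modified_hausdorff(a, b):
        # Modified Hausdorff distance between two 1-D signals treated as
        # point sets {(t_i, a_i)} and {(t_i, b_i)} (a common simplification).
        t = np.linspace(0.0, 1.0, len(a))
        A = np.column_stack([t, a]); B = np.column_stack([t, b])
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return max(d.min(axis=1).mean(), d.min(axis=0).mean())

    def nimf_filter(signal, imfs):
        # Select "relevant" IMFs by comparing NIMFs (signal minus each IMF)
        # against the first NIMF, then reconstruct from the selected modes.
        nimfs = signal[None, :] - imfs                     # one NIMF per IMF
        dists = np.array([modified_hausdorff(nimfs[0], nimfs[k])
                          for k in range(1, len(imfs))])
        boundary = int(np.argmax(np.diff(dists))) + 1 if len(dists) > 1 else 1
        return imfs[boundary:].sum(axis=0)

    # Toy demo with hand-made "IMFs", ordered fast (noisy) to slow.
    t = np.linspace(0, 1, 400)
    noise = 0.3 * np.random.default_rng(0).normal(size=t.size)
    modes = np.stack([noise, 0.4 * np.sin(2 * np.pi * 25 * t),
                      np.sin(2 * np.pi * 3 * t)])
    filtered = nimf_filter(modes.sum(axis=0), modes)
    print("residual noise std:", np.std(filtered - modes[1:].sum(axis=0)))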
Device and Method for Gathering Ensemble Data Sets
NASA Technical Reports Server (NTRS)
Racette, Paul E. (Inventor)
2014-01-01
An ensemble detector uses calibrated noise references to produce ensemble sets of data from which properties of non-stationary processes may be extracted. The ensemble detector comprising: a receiver; a switching device coupled to the receiver, the switching device configured to selectively connect each of a plurality of reference noise signals to the receiver; and a gain modulation circuit coupled to the receiver and configured to vary a gain of the receiver based on a forcing signal; whereby the switching device selectively connects each of the plurality of reference noise signals to the receiver to produce an output signal derived from the plurality of reference noise signals and the forcing signal.
Contact planarization of ensemble nanowires
NASA Astrophysics Data System (ADS)
Chia, A. C. E.; LaPierre, R. R.
2011-06-01
The viability of four organic polymers (S1808, SC200, SU8 and Cyclotene) as filling materials to achieve planarization of ensemble nanowire arrays is reported. Analysis of the porosity, surface roughness and thermal stability of each filling material was performed. Sonication was used as an effective method to remove the tops of the nanowires (NWs) to achieve complete planarization. Ensemble nanowire devices were fully fabricated and I-V measurements confirmed that Cyclotene effectively planarizes the NWs while still serving the role as an insulating layer between the top and bottom contacts. These processes and analysis can be easily implemented into future characterization and fabrication of ensemble NWs for optoelectronic device applications.
Contact planarization of ensemble nanowires.
Chia, A C E; LaPierre, R R
2011-06-17
The viability of four organic polymers (S1808, SC200, SU8 and Cyclotene) as filling materials to achieve planarization of ensemble nanowire arrays is reported. Analysis of the porosity, surface roughness and thermal stability of each filling material was performed. Sonication was used as an effective method to remove the tops of the nanowires (NWs) to achieve complete planarization. Ensemble nanowire devices were fully fabricated and I-V measurements confirmed that Cyclotene effectively planarizes the NWs while still serving the role as an insulating layer between the top and bottom contacts. These processes and analysis can be easily implemented into future characterization and fabrication of ensemble NWs for optoelectronic device applications.
An analytical approach to gravitational lensing by an ensemble of axisymmetric lenses
NASA Technical Reports Server (NTRS)
Lee, Man Hoi; Spergel, David N.
1990-01-01
The problem of gravitational lensing by an ensemble of identical axisymmetric lenses randomly distributed on a single lens plane is considered and a formal expression is derived for the joint probability density of finding shear and convergence at a random point on the plane. The amplification probability for a source can be accurately estimated from the distribution in shear and convergence. This method is applied to two cases: lensing by an ensemble of point masses and by an ensemble of objects with Gaussian surface mass density. There is no convergence for point masses whereas shear is negligible for wide Gaussian lenses.
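The Monte Carlo counterpart of this calculation for point masses is direct, since each lens contributes a shear of magnitude theta_E^2/r^2 (orientation 2*phi) and zero convergence away from the lens positions; the sign convention and field geometry below are assumptions of the sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    n_lenses, n_trials = 200, 5000
    theta_E = 1.0                    # Einstein radius of each identical point lens
    field_R = 100.0                  # radius of the lens field (same angular units)

    # Random lens positions (r, phi) around each random line of sight.
    r = field_R * np.sqrt(rng.random((n_trials, n_lenses)))   # uniform in the disk
    phi = 2.0 * np.pi * rng.random((n_trials, n_lenses))

    # Complex shear from each point lens: |gamma| = theta_E^2 / r^2, angle 2*phi.
    gamma = np.sum(-theta_E**2 * np.exp(2j * phi) / r**2, axis=1)

    # With zero smooth convergence, the amplification is A = 1 / |1 - |gamma|^2|.
    A = 1.0 / np.abs(1.0 - np.abs(gamma) ** 2)
    print("P(|gamma| > 0.1) ~", np.mean(np.abs(gamma) > 0.1))
    print("median amplification ~", np.median(A))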
NASA Astrophysics Data System (ADS)
Shen, Feifei; Xu, Dongmei; Xue, Ming; Min, Jinzhong
2017-07-01
This study examines the impacts of assimilating radar radial velocity (Vr) data for the simulation of Hurricane Ike (2008) with two different ensemble generation techniques in the framework of the hybrid ensemble-variational (EnVar) data assimilation system of the Weather Research and Forecasting model. For the generation of ensemble perturbations we apply two techniques, the ensemble transform Kalman filter (ETKF) and the ensemble of data assimilation (EDA). For the ETKF-EnVar, the forecast ensemble perturbations are updated by the ETKF, while for the EDA-EnVar, the hybrid is employed to update each ensemble member with perturbed observations. For both EnVar schemes, the ensemble mean is analyzed by the hybrid method with flow-dependent ensemble covariance. The sensitivity of analyses and forecasts to the two applied ensemble generation techniques is investigated. It is found that the EnVar system is rather stable with different ensemble update techniques in terms of its skill in improving the analyses and forecasts. The EDA-EnVar-based ensemble perturbations are likely to include slightly less organized spatial structures than those in ETKF-EnVar, and the perturbations of the latter are constructed more dynamically. Detailed diagnostics reveal that both of the EnVar schemes not only produce positive temperature increments around the hurricane center but also systematically adjust the hurricane location with the hurricane-specific error covariance. On average, the analysis and forecast from the ETKF-EnVar have slightly smaller errors than those from the EDA-EnVar in terms of track, intensity, and precipitation. Moreover, ETKF-EnVar yields better forecasts when verified against conventional observations.
Pauci ex tanto numero: reduce redundancy in multi-model ensembles
NASA Astrophysics Data System (ADS)
Solazzo, E.; Riccio, A.; Kioutsioukis, I.; Galmarini, S.
2013-08-01
We explicitly address the fundamental issue of member diversity in multi-model ensembles. To date, no attempts in this direction have been documented within the air quality (AQ) community despite the extensive use of ensembles in this field. Common biases and redundancy are the two issues directly deriving from lack of independence, undermining the significance of a multi-model ensemble, and are the subject of this study. Shared, dependent biases among models do not cancel out but will instead determine a biased ensemble. Redundancy derives from having too large a portion of common variance among the members of the ensemble, producing overconfidence in the predictions and underestimation of the uncertainty. The two issues of common biases and redundancy are analysed in detail using the AQMEII ensemble of AQ model results for four air pollutants in two European regions. We show that models share large portions of bias and variance, extending well beyond those induced by common inputs. We make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble, with the advantage of being poorly correlated. Selecting the members for generating skilful, non-redundant ensembles from such subsets proved, however, non-trivial. We propose and discuss various methods of member selection and rate the ensemble performance they produce. In most cases, the full ensemble is outscored by the reduced ones. We conclude that, although independence of outputs may not always guarantee enhancement of scores (this depends upon the skill being investigated), we discourage selecting the members of the ensemble simply on the basis of scores; that is, independence and skill need to be considered disjointly.
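A simple implementation of the redundancy-reduction idea is to cluster members by the correlation of their errors and retain one representative per cluster; the grouping of the toy members into "families" below is an illustrative assumption:

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    rng = np.random.default_rng(0)
    n_models, n_times = 12, 500

    # Toy ensemble: three "families" of models sharing most of their error variance.
    family = rng.normal(size=(3, n_times))
    errors = np.vstack([family[k // 4] + 0.3 * rng.normal(size=n_times)
                        for k in range(n_models)])

    # 1) Correlation of model errors exposes redundancy beyond common inputs.
    C = np.corrcoef(errors)
    # 2) Cluster models by error similarity; keep one representative per cluster.
    condensed = np.clip(1.0 - C[np.triu_indices(n_models, k=1)], 0.0, None)
    groups = fcluster(linkage(condensed, method="average"),
                      t=3, criterion="maxclust")
    subset = [int(np.flatnonzero(groups == g)[0]) for g in np.unique(groups)]
    print("non-redundant subset of members:", subset)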
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.
Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil
2016-10-01
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Complete analysis of ensemble inequivalence in the Blume-Emery-Griffiths model
NASA Astrophysics Data System (ADS)
Hovhannisyan, V. V.; Ananikian, N. S.; Campa, A.; Ruffo, S.
2017-12-01
We study inequivalence of canonical and microcanonical ensembles in the mean-field Blume-Emery-Griffiths model. This generalizes previous results obtained for the Blume-Capel model. The phase diagram strongly depends on the value of the biquadratic exchange interaction K , the additional feature present in the Blume-Emery-Griffiths model. At small values of K , as for the Blume-Capel model, lines of first- and second-order phase transitions between a ferromagnetic and a paramagnetic phase are present, separated by a tricritical point whose location is different in the two ensembles. At higher values of K the phase diagram changes substantially, with the appearance of a triple point in the canonical ensemble, which does not find any correspondence in the microcanonical ensemble. Moreover, one of the first-order lines that starts from the triple point ends in a critical point, whose position in the phase diagram is different in the two ensembles. This line separates two paramagnetic phases characterized by a different value of the quadrupole moment. These features were not previously studied for other models and substantially enrich the landscape of ensemble inequivalence, identifying new aspects that had been discussed in a classification of phase transitions based on singularity theory. Finally, we discuss ergodicity breaking, which is highlighted by the presence of gaps in the accessible values of magnetization at low energies: it also displays new interesting patterns that are not present in the Blume-Capel model.
The Nature and Variability of Ensemble Sensitivity Fields that Diagnose Severe Convection
NASA Astrophysics Data System (ADS)
Ancell, B. C.
2017-12-01
Ensemble sensitivity analysis (ESA) is a statistical technique that uses information from an ensemble of forecasts to reveal relationships between chosen forecast metrics and the larger atmospheric state at various forecast times. A number of studies have employed ESA from the perspectives of dynamical interpretation, observation targeting, and ensemble subsetting toward improved probabilistic prediction of high-impact events, mostly at synoptic scales. We tested ESA using convective forecast metrics at the 2016 HWT Spring Forecast Experiment to understand the utility of convective ensemble sensitivity fields in improving forecasts of severe convection and its individual hazards. The main purpose of this evaluation was to understand the temporal coherence and general characteristics of convective sensitivity fields toward future use in improving ensemble predictability within an operational framework. The magnitude and coverage of simulated reflectivity, updraft helicity, and surface wind speed were used as response functions, and the sensitivity of these functions to winds, temperatures, geopotential heights, and dew points at different atmospheric levels and at different forecast times was evaluated on a daily basis throughout the HWT Spring Forecast Experiment. These sensitivities were calculated within the Texas Tech real-time ensemble system, which possesses 42 members that run twice daily to 48-h forecast time. Here we summarize both the findings regarding the nature of the sensitivity fields and the participants' evaluations reflecting their opinions of the utility of operational ESA. The future direction of ESA for operational use will also be discussed.
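Ensemble sensitivity itself is a per-grid-point linear regression of the response function on the ensemble state, dJ/dx_i = cov(J, x_i)/var(x_i), which takes only a few lines (the 42-member size matches the system described; the data are synthetic):

    import numpy as np

    rng = np.random.default_rng(0)
    n_members, n_grid = 42, 1000

    # Toy ensemble state (e.g., 2-m dewpoint at grid points) and response metric.
    state = rng.normal(size=(n_members, n_grid))
    response = 3.0 * state[:, 10] - 1.5 * state[:, 500] + rng.normal(size=n_members)

    # Ensemble sensitivity: dJ/dx_i = cov(J, x_i) / var(x_i) at each grid point.
    x_anom = state - state.mean(axis=0)
    j_anom = response - response.mean()
    sensitivity = (j_anom @ x_anom) / (n_members - 1) / x_anom.var(axis=0, ddof=1)
    print("largest-magnitude sensitivities at points:",
          np.argsort(np.abs(sensitivity))[-2:])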
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP
Staras, Kevin
2016-01-01
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture. PMID:27760125
In the Beginning of the Middle: Curriculum Considerations for Middle School General Music
ERIC Educational Resources Information Center
Giebelhausen, Robin
2015-01-01
Middle school general music is an experience that numerous music educators feel underprepared to teach. Because many undergraduate programs spend little time on this teaching scenario and because the challenges of middle school general music are different from those of elementary general music or middle school ensembles, teachers often lack the…
Filatov, Michael; Liu, Fang; Kim, Kwang S.; ...
2016-12-22
Here, the spin-restricted ensemble-referenced Kohn-Sham (REKS) method is based on an ensemble representation of the density and is capable of correctly describing the non-dynamic electron correlation stemming from (near-)degeneracy of several electronic configurations. The existing REKS methodology describes systems with two electrons in two fractionally occupied orbitals. In this work, the REKS methodology is extended to treat systems with four fractionally occupied orbitals accommodating four electrons, and a self-consistent implementation of the REKS(4,4) method with simultaneous optimization of the orbitals and their fractional occupation numbers is reported. The new method is applied to a number of molecular systems where simultaneous dissociation of several chemical bonds takes place, as well as to the singlet ground states of the organic tetraradicals 2,4-didehydrometaxylylene and 1,4,6,9-spiro[4.4]nonatetrayl.
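Schematically, REKS-type methods represent the energy and density as a weighted ensemble over electronic configurations L; the notation below is illustrative rather than the paper's exact working equations:

```latex
E^{\mathrm{REKS}} = \sum_{L} C_L\, E_L[\{\phi_a\}], \qquad
\rho(\mathbf{r}) = \sum_{L} C_L\, \rho_L(\mathbf{r}), \qquad
C_L \ge 0, \quad \sum_{L} C_L = 1 ,
```

with fractional occupations n_a = \sum_L C_L n_a^{(L)} of the active orbitals. In REKS(4,4) the four active occupations accommodate four electrons, and both the orbitals and the weights C_L (hence the n_a) are optimized self-consistently.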
Valdes-Abellan, Javier; Pachepsky, Yakov; Martinez, Gonzalo
2018-01-01
Data assimilation is becoming a promising technique in hydrologic modelling, not only to update model states but also to infer model parameters, specifically soil hydraulic properties in Richards-equation-based soil water models. The Ensemble Kalman Filter (EnKF) is one of the most widely employed of the available data assimilation methods. In this study, the complete Matlab code used to study soil data assimilation efficiency under different soil and climatic conditions is presented. The code shows how data assimilation through the EnKF was implemented. The Richards equation was solved with the Hydrus-1D software, which was run from Matlab.
• MATLAB routines are released for other researchers to use and modify without restriction.
• Code for the Ensemble Kalman Filter data assimilation method.
• Soil water flow (Richards equation) solved with Hydrus-1D.
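For orientation, the analysis step of a stochastic EnKF is compact enough to sketch. The version below is a generic, self-contained implementation, not the authors' released Matlab routines; the state vector is assumed to be augmented with the soil hydraulic parameters being inferred.

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step.

    X : (n_state, n_ens) forecast ensemble (water contents plus, in an
        augmented state, soil hydraulic parameters)
    y : (n_obs,) observed water contents
    H : (n_obs, n_state) linear observation operator
    R : (n_obs, n_obs) observation-error covariance
    """
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)           # ensemble anomalies
    Pf = A @ A.T / (n_ens - 1)                      # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    # Perturb observations so the analysis ensemble keeps the correct spread
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)
```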
Watershed scale response to climate change--Trout Lake Basin, Wisconsin
Walker, John F.; Hunt, Randall J.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Trout River Basin at Trout Lake in northern Wisconsin.
Watershed scale response to climate change--Clear Creek Basin, Iowa
Christiansen, Daniel E.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Clear Creek Basin, near Coralville, Iowa.
Watershed scale response to climate change--Feather River Basin, California
Koczot, Kathryn M.; Markstrom, Steven L.; Hay, Lauren E.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Feather River Basin, California.
Watershed scale response to climate change--South Fork Flathead River Basin, Montana
Chase, Katherine J.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the South Fork Flathead River Basin, Montana.
Watershed scale response to climate change--Cathance Stream Basin, Maine
Dudley, Robert W.; Hay, Lauren E.; Markstrom, Steven L.; Hodgkins, Glenn A.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Cathance Stream Basin, Maine.
Watershed scale response to climate change--Pomperaug River Watershed, Connecticut
Bjerklie, David M.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Pomperaug River Basin at Southbury, Connecticut.
Watershed scale response to climate change--Starkweather Coulee Basin, North Dakota
Vining, Kevin C.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Starkweather Coulee Basin near Webster, North Dakota.
Watershed scale response to climate change--Sagehen Creek Basin, California
Markstrom, Steven L.; Hay, Lauren E.; Regan, R. Steven
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Sagehen Creek Basin near Truckee, California.
Watershed scale response to climate change--Sprague River Basin, Oregon
Risley, John; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Sprague River Basin near Chiloquin, Oregon.
Watershed scale response to climate change--Black Earth Creek Basin, Wisconsin
Hunt, Randall J.; Walker, John F.; Westenbroek, Steven M.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Black Earth Creek Basin, Wisconsin.
Watershed scale response to climate change--East River Basin, Colorado
Battaglin, William A.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the East River Basin, Colorado.
Watershed scale response to climate change--Naches River Basin, Washington
Mastin, Mark C.; Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Naches River Basin below Tieton River in Washington.
Generalized Gibbs ensembles for quantum field theories
NASA Astrophysics Data System (ADS)
Essler, F. H. L.; Mussardo, G.; Panfil, M.
2015-05-01
We consider the nonequilibrium dynamics in quantum field theories (QFTs). After being prepared in a density matrix that is not an eigenstate of the Hamiltonian, such systems are expected to relax locally to a stationary state. In the presence of local conservation laws, these stationary states are believed to be described by appropriate generalized Gibbs ensembles. Here we demonstrate that in order to obtain a correct description of the stationary state, it is necessary to take into account conservation laws that are not (ultra)local in the usual sense of QFTs, but fulfill a significantly weaker form of locality. We discuss the implications of our results for integrable QFTs in one spatial dimension.
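In equation form, a generalized Gibbs ensemble is the maximum-entropy density matrix constrained by all relevant conserved charges Q_n:

```latex
\rho_{\mathrm{GGE}} \;=\; \frac{1}{Z}\,
\exp\!\Big(-\sum_{n}\beta_n Q_n\Big),
\qquad
Z = \operatorname{Tr}\exp\!\Big(-\sum_{n}\beta_n Q_n\Big),
```

with the Lagrange multipliers \beta_n fixed by matching \langle Q_n \rangle to their initial-state values. The point of the paper is that in QFTs the sum must also include charges that are only quasi-local, not just the (ultra)local ones.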
Watershed scale response to climate change--Flint River Basin, Georgia
Hay, Lauren E.; Markstrom, Steven L.
2012-01-01
Fourteen basins for which the Precipitation Runoff Modeling System has been calibrated and evaluated were selected as study sites. Precipitation Runoff Modeling System is a deterministic, distributed parameter watershed model developed to evaluate the effects of various combinations of precipitation, temperature, and land use on streamflow and general basin hydrology. Output from five General Circulation Model simulations and four emission scenarios were used to develop an ensemble of climate-change scenarios for each basin. These ensembles were simulated with the corresponding Precipitation Runoff Modeling System model. This fact sheet summarizes the hydrologic effect and sensitivity of the Precipitation Runoff Modeling System simulations to climate change for the Flint River Basin at Montezuma, Georgia.
Scattering by ensembles of small particles experiment, theory and application
NASA Technical Reports Server (NTRS)
Gustafson, B. A. S.
1980-01-01
A hypothetical self-consistent picture of the evolution of prestellar interstellar dust through a comet phase leads to predictions about the composition of the circum-solar dust cloud. The scattering properties of the resulting conglomerates, which have a bird's-nest type of structure, are investigated using a microwave analogue technique. Approximate theoretical methods of general interest are developed which compare favorably with the experimental results. The principal features of the scattering of visible radiation by zodiacal light particles are reasonably reproduced. A component which is suggestive of α-meteoroids is also predicted.
Papanikolaou, Yannis; Tsoumakas, Grigorios; Laliotis, Manos; Markantonatos, Nikos; Vlahavas, Ioannis
2017-09-22
In this paper we present the approach that we employed to deal with large-scale multi-label semantic indexing of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge (2013-2017), a challenge concerned with biomedical semantic indexing and question answering. Our main contribution is a MUlti-Label Ensemble method (MULE) that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms. Secondary contributions include a study of the temporal aspects of the BioASQ corpus (the observations also apply to BioASQ's superset, the PubMed articles collection) and the proper parametrization of the algorithms used to deal with this challenging classification task. The ensemble method that we developed is compared to other approaches in experimental scenarios with subsets of the BioASQ corpus, with positive results. In our participation in the BioASQ challenge we obtained first place in 2013 and second place in the four following years, steadily outperforming MTI, the indexing system of the National Library of Medicine (NLM). The results of our experimental comparisons suggest that employing a statistical significance test to validate the ensemble method's choices is the optimal approach for ensembling multi-label classifiers, especially in contexts with many rare labels.
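The validation idea can be illustrated generically: given paired per-instance (or per-label) correctness indicators from two candidate ensembles, say with and without a constituent classifier, a McNemar test decides whether their difference is significant. A minimal sketch, not the authors' implementation:

```python
import numpy as np
from scipy.stats import chi2

def mcnemar(correct_a, correct_b):
    """Continuity-corrected McNemar test on paired correctness arrays.

    correct_a, correct_b : boolean arrays, True where each system got the
    instance (or label assignment) right. Returns (statistic, p-value).
    """
    b = int(np.sum(correct_a & ~correct_b))   # only A correct
    c = int(np.sum(~correct_a & correct_b))   # only B correct
    if b + c == 0:
        return 0.0, 1.0                       # identical behaviour
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)          # chi-squared, 1 dof
```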
Inferring properties of disordered chains from FRET transfer efficiencies
NASA Astrophysics Data System (ADS)
Zheng, Wenwei; Zerze, Gül H.; Borgia, Alessandro; Mittal, Jeetain; Schuler, Benjamin; Best, Robert B.
2018-03-01
Förster resonance energy transfer (FRET) is a powerful tool for elucidating both structural and dynamic properties of unfolded or disordered biomolecules, especially in single-molecule experiments. However, the key observables, namely, the mean transfer efficiency and fluorescence lifetimes of the donor and acceptor chromophores, are averaged over a broad distribution of donor-acceptor distances. The inferred average properties of the ensemble therefore depend on the form of the model distribution chosen to describe the distance, as has been widely recognized. In addition, while the distribution for one type of polymer model may be appropriate for a chain under a given set of physico-chemical conditions, it may not be suitable for the same chain in a different environment so that even an apparently consistent application of the same model over all conditions may distort the apparent changes in chain dimensions with variation of temperature or solution composition. Here, we present an alternative and straightforward approach to determining ensemble properties from FRET data, in which the polymer scaling exponent is allowed to vary with solution conditions. In its simplest form, it requires either the mean FRET efficiency or fluorescence lifetime information. In order to test the accuracy of the method, we have utilized both synthetic FRET data from implicit and explicit solvent simulations for 30 different protein sequences, and experimental single-molecule FRET data for an intrinsically disordered and a denatured protein. In all cases, we find that the inferred radii of gyration are within 10% of the true values, thus providing higher accuracy than simpler polymer models. In addition, the scaling exponents obtained by our procedure are in good agreement with those determined directly from the molecular ensemble. Our approach can in principle be generalized to treating other ensemble-averaged functions of intramolecular distances from experimental data.
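As a concrete illustration of the inference problem, consider the simpler Gaussian-chain limit rather than the paper's variable scaling exponent: the measured mean efficiency constrains the mean-square donor-acceptor distance, which can then be inverted numerically. The Förster radius below is a hypothetical value for an unspecified dye pair.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

R0 = 5.4  # nm, hypothetical Foerster radius

def mean_E(r2):
    """<E> for a Gaussian-chain P(r) ~ r^2 exp(-3 r^2 / (2 <r^2>))."""
    p = lambda r: r**2 * np.exp(-1.5 * r**2 / r2)
    num = quad(lambda r: p(r) / (1.0 + (r / R0)**6), 0, np.inf)[0]
    return num / quad(p, 0, np.inf)[0]

E_obs = 0.55                                          # measured mean efficiency
r2 = brentq(lambda v: mean_E(v) - E_obs, 1.0, 400.0)  # <r^2> in nm^2
Rg = np.sqrt(r2 / 6)                                  # Gaussian-chain radius of gyration
```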
Identifying pollution sources and predicting urban air quality using ensemble learning methods
NASA Astrophysics Data System (ADS)
Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali
2013-12-01
In this study, principal components analysis (PCA) was performed to identify air pollution sources, and tree-based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using air quality and meteorological databases covering a period of five years. PCA identified vehicular emissions and fuel combustion as the major air pollution sources. The air quality indices revealed that the air quality was unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, to identify the factors responsible for the discrimination, and to predict the air quality indices. Accordingly, a single decision tree (SDT), a decision tree forest (DTF), and a decision treeboost (DTB) model were constructed, and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality with misclassification rates (MR) of 8.32% (SDT), 4.12% (DTF), 5.62% (DTB), and 6.18% (SVM) in the complete data. The AQI and CAQI regression models yielded correlations between measured and predicted values and root mean squared errors of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); and 0.890, 7.00 and 0.836, 9.16 (SVR) in the complete data. The DTF and DTB models outperformed the SVM in both classification and regression, which can be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
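The SDT/DTF/DTB models come from a proprietary package; a rough open-source analogue of the bagging-versus-boosting-versus-SVM comparison can be sketched with scikit-learn. The data below are synthetic stand-ins for the pollutant and meteorological features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for (meteorology + pollutants) -> air quality index
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

models = {
    "bagged trees (DTF analogue)": RandomForestRegressor(n_estimators=200),
    "boosted trees (DTB analogue)": GradientBoostingRegressor(),
    "SVR benchmark": SVR(),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```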
Assessment of SWE data assimilation for ensemble streamflow predictions
NASA Astrophysics Data System (ADS)
Franz, Kristie J.; Hogue, Terri S.; Barik, Muhammad; He, Minxue
2014-11-01
An assessment of data assimilation (DA) for Ensemble Streamflow Prediction (ESP) is undertaken using seasonal water supply hindcasting in the North Fork of the American River Basin (NFARB) and the National Weather Service (NWS) hydrologic forecast models. Two parameter sets, one from the California Nevada River Forecast Center (RFC) and one from the Differential Evolution Adaptive Metropolis (DREAM) algorithm, are tested. For each parameter set, hindcasts are generated using initial conditions derived with and without a DA scheme that integrates snow water equivalent (SWE) observations. The DREAM-DA scenario uses an Integrated Uncertainty and Ensemble-based data Assimilation (ICEA) framework that also considers model and parameter uncertainty. Hindcasts are evaluated using deterministic and probabilistic forecast verification metrics. In general, the impact of DA on the skill of the seasonal water supply predictions is mixed. For deterministic (ensemble mean) predictions, the Percent Bias (PBias) is improved by integrating DA: DREAM-DA and RFC-DA have the lowest biases, and RFC-DA has the lowest Root Mean Squared Error (RMSE), although the RFC and DREAM-DA have similar RMSE scores. For the probabilistic predictions, the RFC and DREAM have the highest Continuous Ranked Probability Skill Scores (CRPSS), and the RFC has the best discrimination for low flows. Reliability results are similar between the non-DA and DA tests, and DREAM and DREAM-DA have better reliability than RFC and RFC-DA for forecast dates of February 1 and later. Although the DA method tested produced improved streamflow simulations in previous studies, the hindcast analysis suggests that it may not result in obvious improvements in streamflow forecasts. We advocate that integrating hindcasting and probabilistic metrics provides more rigorous insight into model performance for forecasting applications, such as in this study.
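One of the probabilistic metrics used here has a convenient sample-based estimator: for an m-member ensemble, CRPS = mean|x_i - y| - (1/2) mean|x_i - x_j|, and the skill score compares it against a reference such as climatology. A minimal sketch, not the study's verification code:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Sample-based CRPS for one ensemble forecast/observation pair."""
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

def crpss(crps_forecast, crps_reference):
    """CRPSS relative to a reference forecast (e.g., climatology)."""
    return 1.0 - np.mean(crps_forecast) / np.mean(crps_reference)
```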
The Correlated Jacobi and the Correlated Cauchy-Lorentz Ensembles
NASA Astrophysics Data System (ADS)
Wirtz, Tim; Waltner, Daniel; Kieburg, Mario; Kumar, Santosh
2016-01-01
We calculate the k-point generating function of the correlated Jacobi ensemble using supersymmetric methods. We use the result for complex matrices for k=1 to derive a closed-form expression for the eigenvalue density. For real matrices we obtain the density in terms of a twofold integral that we evaluate numerically. For both expressions we find agreement when comparing with Monte Carlo simulations. Relations between these quantities for the Jacobi and the Cauchy-Lorentz ensemble are derived.
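A Monte Carlo check of the kind compared against can be set up directly: draw two Wishart matrices sharing a nontrivial covariance and histogram the eigenvalues of (W1 + W2)^{-1} W1. The construction below is one common definition of a correlated Jacobi (MANOVA) ensemble, offered as an illustrative assumption rather than the paper's exact setup.

```python
import numpy as np

def jacobi_eigs(p, n1, n2, Sigma, rng, complex_case=True):
    """Eigenvalues of (W1 + W2)^{-1} W1 for correlated Wisharts W1, W2."""
    C = np.linalg.cholesky(Sigma)

    def wishart(n):
        if complex_case:
            X = rng.normal(size=(p, n)) + 1j * rng.normal(size=(p, n))
        else:
            X = rng.normal(size=(p, n))
        X = C @ X
        return X @ X.conj().T

    W1, W2 = wishart(n1), wishart(n2)
    return np.linalg.eigvals(np.linalg.solve(W1 + W2, W1)).real

rng = np.random.default_rng(2)
Sigma = np.eye(4) + 0.3 * (np.ones((4, 4)) - np.eye(4))  # nontrivial correlations
eigs = np.concatenate([jacobi_eigs(4, 8, 10, Sigma, rng) for _ in range(2000)])
# A histogram of `eigs` approximates the eigenvalue density on [0, 1].
```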
USDA-ARS's Scientific Manuscript database
Data from modern soil water contents probes can be used for data assimilation in soil water flow modeling, i.e. continual correction of the flow model performance based on observations. The ensemble Kalman filter appears to be an appropriate method for that. The method requires estimates of the unce...
Ensemble density variational methods with self- and ghost-interaction-corrected functionals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pastorczak, Ewa; Pernal, Katarzyna, E-mail: pernalk@gmail.com
2014-05-14
Ensemble density functional theory (DFT) offers a way of predicting excited-state energies of atomic and molecular systems without referring to a density response function. Despite significant theoretical work, practical applications of the proposed approximations have been scarce, and they do not allow for a fair judgement of the potential usefulness of ensemble DFT with available functionals. In this paper, we investigate two forms of ensemble density functionals formulated within the ensemble DFT framework: the Gross, Oliveira, and Kohn (GOK) functional proposed by Gross et al. [Phys. Rev. A 37, 2809 (1988)], alongside the orbital-dependent eDFT form of the functional introduced by Nagy [J. Phys. B 34, 2363 (2001)] (the acronym eDFT proposed in analogy to eHF, the ensemble Hartree-Fock method). Local and semi-local ground-state density functionals are employed in both approaches. Approximate ensemble density functionals contain not only spurious self-interaction but also the so-called ghost interaction, which has no counterpart in ground-state DFT. We propose how to correct the GOK functional for both kinds of interactions in approximations that go beyond the exact-exchange functional. Numerical applications lead to the conclusion that functionals free of the ghost interaction by construction, i.e., eDFT, yield much more reliable results than the approximate self- and ghost-interaction-corrected GOK functional. Additionally, a local density functional corrected for self-interaction employed in the eDFT framework yields excitation energies of accuracy comparable to that of the uncorrected semi-local eDFT functional.
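For orientation, the GOK ensemble energy and the origin of the ghost interaction can be written compactly (schematic notation):

```latex
E_w = \sum_i w_i E_i, \qquad
\rho_w = \sum_i w_i \rho_i, \qquad
\sum_i w_i = 1, \quad w_1 \ge w_2 \ge \cdots \ge 0 .
```

Evaluating the Hartree term on the ensemble density gives

```latex
J[\rho_w] = \frac{1}{2}\sum_{i,j} w_i w_j \iint
\frac{\rho_i(\mathbf{r})\,\rho_j(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,
d\mathbf{r}\, d\mathbf{r}' ,
```

whose i \neq j cross terms couple densities of different states; they have no physical counterpart and constitute the ghost interaction that the corrections discussed above target.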