Bhatt, Divesh; Zuckerman, Daniel M.
2010-01-01
We performed “weighted ensemble” path-sampling simulations of adenylate kinase, using several semi-atomistic protein models. The models have an all-atom backbone with various levels of residue interactions. The primary result is that fully statistically rigorous path sampling required only a few weeks of single-processor computing time with these models, indicating that the addition of further chemical detail should be readily feasible. Our semi-atomistic path ensembles are consistent with previous biophysical findings: the presence of two distinct pathways, identification of intermediates, and symmetry of forward and reverse pathways. PMID:21660120
Generalized Ensemble Sampling of Enzyme Reaction Free Energy Pathways
Wu, Dongsheng; Fajer, Mikolai I.; Cao, Liaoran; Cheng, Xiaolin; Yang, Wei
2016-01-01
Free energy path sampling plays an essential role in the computational understanding of chemical reactions, particularly those occurring in enzymatic environments. Among the variety of molecular dynamics simulation approaches, the generalized ensemble sampling strategy is uniquely attractive because it not only enhances the sampling of rare chemical events but also naturally ensures consistent exploration of environmental degrees of freedom. In this review, we provide a tutorial-like tour of an emerging topic: generalized ensemble sampling of enzyme reaction free energy paths. The discussion is largely focused on our own studies, particularly those based on the metadynamics free energy sampling method and the on-the-path random walk path sampling method. We hope that this mini presentation will provide interested practitioners with meaningful guidance for future algorithm formulation and application studies. PMID:27498634
Zwier, Matthew C.; Adelman, Joshua L.; Kaus, Joseph W.; Pratt, Adam J.; Wong, Kim F.; Rego, Nicholas B.; Suárez, Ernesto; Lettieri, Steven; Wang, David W.; Grabe, Michael; Zuckerman, Daniel M.; Chong, Lillian T.
2015-01-01
The weighted ensemble (WE) path sampling approach orchestrates an ensemble of parallel calculations with intermittent communication to enhance the sampling of rare events, such as molecular associations or conformational changes in proteins or peptides. Trajectories are replicated and pruned in a way that focuses computational effort on under-explored regions of configuration space while maintaining rigorous kinetics. To enable the simulation of rare events at any scale (e.g. atomistic, cellular), we have developed an open-source, interoperable, and highly scalable software package for the execution and analysis of WE simulations: WESTPA (The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis). WESTPA scales to thousands of CPU cores and includes a suite of analysis tools that have been implemented in a massively parallel fashion. The software has been designed to interface conveniently with any dynamics engine and has already been used with a variety of molecular dynamics (e.g. GROMACS, NAMD, OpenMM, AMBER) and cell-modeling packages (e.g. BioNetGen, MCell). WESTPA has been in production use for over a year, and its utility has been demonstrated for a broad set of problems, ranging from atomically detailed host-guest associations to non-spatial chemical kinetics of cellular signaling networks. The following describes the design and features of WESTPA, including the facilities it provides for running WE simulations, storing and analyzing WE simulation data, as well as examples of input and output. PMID:26392815
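To make the replicate-and-prune bookkeeping concrete, here is a minimal sketch of one weighted-ensemble resampling step in Python. It is not WESTPA's actual API; the bin edges, per-bin target count, and the merge rule (pairing the two lightest walkers) are illustrative choices, but the weight accounting is the unbiased split/merge scheme the abstract describes.

```python
import random
from collections import defaultdict

def we_resample(walkers, progress, bin_edges, target_per_bin=4):
    """One weighted-ensemble resampling step (sketch).

    walkers  : list of (weight, state) pairs whose weights sum to 1
    progress : function mapping a state to a progress coordinate
    Splitting and merging change the number of trajectory copies per bin
    while conserving total weight, so expectation values stay unbiased.
    """
    bins = defaultdict(list)
    for w, s in walkers:
        idx = sum(progress(s) >= e for e in bin_edges)     # bin index
        bins[idx].append((w, s))

    new_walkers = []
    for members in bins.values():
        members.sort(key=lambda ws: ws[0])                 # lightest first
        while len(members) > target_per_bin:               # merge two lightest
            (w1, s1), (w2, s2) = members[0], members[1]
            survivor = s1 if random.random() < w1 / (w1 + w2) else s2
            members = sorted(members[2:] + [(w1 + w2, survivor)],
                             key=lambda ws: ws[0])
        while len(members) < target_per_bin:               # split the heaviest
            w, s = members.pop()
            members += [(w / 2, s), (w / 2, s)]
            members.sort(key=lambda ws: ws[0])
        new_walkers.extend(members)
    return new_walkers
```

Between resampling steps, each walker's state would be advanced by the underlying dynamics engine (GROMACS, MCell, etc.), which is exactly the interface boundary WESTPA standardizes.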
Annealed importance sampling with constant cooling rate
NASA Astrophysics Data System (ADS)
Giovannelli, Edoardo; Cardini, Gianni; Gellini, Cristina; Pietraperzia, Giangaetano; Chelli, Riccardo
2015-02-01
Annealed importance sampling is a simulation method devised by Neal [Stat. Comput. 11, 125 (2001)] to assign weights to configurations generated by simulated annealing trajectories. In particular, the equilibrium average of a generic physical quantity can be computed as a weighted average exploiting the weights and estimates of this quantity associated with the final configurations of the annealed trajectories. Here, we review annealed importance sampling from the perspective of nonequilibrium path-ensemble averages [G. E. Crooks, Phys. Rev. E 61, 2361 (2000)]. The equivalence of Neal's and Crooks' treatments highlights the generality of the method, which goes beyond merely thermal-based protocols. Furthermore, we show that a temperature schedule based on a constant cooling rate outperforms stepwise cooling schedules and that, for a given elapsed computer time, the performance of annealed importance sampling is, in general, improved by increasing the number of intermediate temperatures.
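As a rough illustration of the weighting scheme, the following self-contained sketch runs annealed importance sampling on a toy double-well potential with a linear (constant-rate) schedule in inverse temperature; the potential, schedule endpoints, and move sizes are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    return (x**2 - 1.0) ** 2                      # toy double well (illustrative)

def log_f(x, beta):
    # Interpolating densities f_k ∝ exp(-x²/2) exp(-β_k E(x));
    # β_0 = 0 makes the initial N(0,1) draw exact.
    return -0.5 * x**2 - beta * energy(x)

def ais_trajectory(betas, n_moves=10, step=0.5):
    """One annealed trajectory and its log importance weight (Neal 2001)."""
    x, logw = rng.normal(), 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += -(b - b_prev) * energy(x)         # incremental weight update
        for _ in range(n_moves):                  # Metropolis moves at the new β
            xp = x + step * rng.normal()
            if np.log(rng.random()) < log_f(xp, b) - log_f(x, b):
                x = xp
    return x, logw

betas = np.linspace(0.0, 4.0, 200)                # constant cooling rate in β
xs, logws = map(np.array, zip(*(ais_trajectory(betas) for _ in range(500))))
w = np.exp(logws - logws.max())                   # stabilized weights
print("weighted <x^2> =", (w * xs**2).sum() / w.sum())
```

The equilibrium average at the final temperature is the weight-normalized average over final configurations, exactly as described above; a finer schedule (more intermediate temperatures) reduces the weight variance.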
On the Likely Utility of Hybrid Weights Optimized for Variances in Hybrid Error Covariance Models
NASA Astrophysics Data System (ADS)
Satterfield, E.; Hodyss, D.; Kuhl, D.; Bishop, C. H.
2017-12-01
Because of imperfections in ensemble data assimilation schemes, one cannot assume that the ensemble covariance is equal to the true error covariance of a forecast. Previous work demonstrated how information about the distribution of true error variances given an ensemble sample variance can be revealed from an archive of (observation-minus-forecast, ensemble-variance) data pairs. Here, we derive a simple and intuitively compelling formula to obtain the mean of this distribution of true error variances given an ensemble sample variance from (observation-minus-forecast, ensemble-variance) data pairs produced by a single run of a data assimilation system. This formula takes the form of a Hybrid weighted average of the climatological forecast error variance and the ensemble sample variance. Here, we test the extent to which these readily obtainable weights can be used to rapidly optimize the covariance weights used in Hybrid data assimilation systems that employ weighted averages of static covariance models and flow-dependent ensemble based covariance models. Univariate data assimilation and multi-variate cycling ensemble data assimilation are considered. In both cases, it is found that our computationally efficient formula gives Hybrid weights that closely approximate the optimal weights found through the simple but computationally expensive process of testing every plausible combination of weights.
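A hedged sketch of how such weights could be extracted empirically from (observation-minus-forecast, ensemble-variance) pairs; the regression form and variable names are ours, not the authors' derivation, and observation-error variance is ignored for brevity.

```python
import numpy as np

def hybrid_weights(innov, ens_var):
    """Fit E[(o - f)^2 | s^2] ≈ a + b·s^2 by least squares over an archive of
    (observation-minus-forecast, ensemble-variance) pairs, then read off a
    hybrid weighting of static and flow-dependent error variance.
    Sketch only: the paper derives these weights analytically, and a real
    application must account for observation-error variance in (o - f)^2."""
    innov, ens_var = np.asarray(innov, float), np.asarray(ens_var, float)
    A = np.column_stack([np.ones_like(ens_var), ens_var])
    (a, b), *_ = np.linalg.lstsq(A, innov**2, rcond=None)
    clim_var = innov.var()                # climatological error-variance proxy
    return a / clim_var, b                # (weight on static, weight on ensemble)
```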
Path planning in uncertain flow fields using ensemble method
NASA Astrophysics Data System (ADS)
Wang, Tong; Le Maître, Olivier P.; Hoteit, Ibrahim; Knio, Omar M.
2016-10-01
An ensemble-based approach is developed to conduct optimal path planning in unsteady ocean currents under uncertainty. We focus our attention on two-dimensional steady and unsteady uncertain flows, and adopt a sampling methodology that is well suited to operational forecasts, where an ensemble of deterministic predictions is used to model and quantify uncertainty. In an operational setting, much about dynamics, topography, and forcing of the ocean environment is uncertain. To address this uncertainty, the flow field is parametrized using a finite number of independent canonical random variables with known densities, and the ensemble is generated by sampling these variables. For each of the resulting realizations of the uncertain current field, we predict the path that minimizes the travel time by solving a boundary value problem (BVP), based on the Pontryagin maximum principle. A family of backward-in-time trajectories starting at the end position is used to generate suitable initial values for the BVP solver. This allows us to examine and analyze the performance of the sampling strategy and to develop insight into extensions dealing with general circulation ocean models. In particular, the ensemble method enables us to perform a statistical analysis of travel times and consequently develop a path planning approach that accounts for these statistics. The proposed methodology is tested for a number of scenarios. We first validate our algorithms by reproducing simple canonical solutions, and then demonstrate our approach in more complex flow fields, including idealized, steady and unsteady double-gyre flows.
Girsanov reweighting for path ensembles and Markov state models
NASA Astrophysics Data System (ADS)
Donati, L.; Hartmann, C.; Keller, B. G.
2017-06-01
The sensitivity of molecular dynamics to changes in the potential energy function plays an important role in understanding the dynamics and function of complex molecules. We present a method to obtain path ensemble averages of a perturbed dynamics from a set of paths generated by a reference dynamics. It is based on the concept of path probability measure and the Girsanov theorem, a result from stochastic analysis for estimating a change of measure of a path ensemble. Since Markov state models (MSMs) of the molecular dynamics can be formulated as a combined phase-space and path ensemble average, the method can be extended to reweight MSMs by combining it with a reweighting of the Boltzmann distribution. We demonstrate how to efficiently implement the Girsanov reweighting in a molecular dynamics simulation program by calculating parts of the reweighting factor "on the fly" during the simulation, and we benchmark the method on test systems ranging from a two-dimensional diffusion process and an artificial many-body system to alanine dipeptide and valine dipeptide in implicit and explicit water. The method can be used to study the sensitivity of molecular dynamics to external perturbations as well as to reweight trajectories generated by enhanced sampling schemes to the original dynamics.
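A minimal one-dimensional sketch of the idea for overdamped Langevin dynamics: trajectories are generated under a reference potential while the log reweighting factor is accumulated "on the fly" from the same noise increments, and a path observable is then reweighted to a perturbed potential. The potentials, time step, and perturbation are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, beta, nsteps, npaths = 1e-3, 1.0, 2000, 500
sigma = np.sqrt(2.0 / beta)                   # noise amplitude (unit friction)

grad_U = lambda x: 4.0 * x * (x**2 - 1.0)     # reference potential U = (x²-1)²
grad_V = lambda x: 0.5 * np.ones_like(x)      # perturbation V = 0.5·x

x = np.full(npaths, -1.0)                     # all paths start in the left well
logM = np.zeros(npaths)                       # log Girsanov reweighting factors
for _ in range(nsteps):
    dW = np.sqrt(dt) * rng.normal(size=npaths)
    b = -grad_V(x)                            # perturbing force
    logM += (b / sigma) * dW - 0.5 * (b / sigma) ** 2 * dt   # on-the-fly update
    x += -grad_U(x) * dt + sigma * dW         # Euler-Maruyama reference step

M = np.exp(logM)                              # path probability ratio
print("reference  P(x_T > 0) =", (x > 0).mean())
print("perturbed  P(x_T > 0) =", np.mean(M * (x > 0)) / M.mean())
```

The self-normalized estimator in the last line is the path-ensemble average under the perturbed dynamics, computed without ever simulating the perturbed system.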
Li, Wenjin
2018-02-28
The transition path ensemble consists of reactive trajectories and possesses all the information necessary for understanding the mechanism and dynamics of important condensed-phase processes. However, a quantitative description of the properties of the transition path ensemble is far from established. Here, with numerical calculations on a model system, the equipartition terms defined in thermal equilibrium were estimated for the first time in the transition path ensemble. It was not surprising to observe that the energy was not equally distributed among all the coordinates. However, the energies distributed on a pair of conjugate coordinates remained equal. Higher energies were observed on several coordinates that are highly coupled to the reaction coordinate, while the energy on the rest was almost equally distributed. In addition, the ensemble-averaged energy on each coordinate as a function of time was also quantified. These quantitative analyses of energy distributions provide new insights into the transition path ensemble.
2017-06-01
Fabric samples were tested on a Sweating Guarded Hot Plate (SGHP) to measure fabric thermal and evaporative resistance; the CB garment ensembles were tested on a thermal manikin.
A benchmark for reaction coordinates in the transition path ensemble
2016-01-01
The molecular mechanism of a reaction is embedded in its transition path ensemble, the complete collection of reactive trajectories. Utilizing the information in the transition path ensemble alone, we developed a novel metric, which we termed the emergent potential energy, for distinguishing reaction coordinates from the bath modes. The emergent potential energy can be understood as the average energy cost for making a displacement of a coordinate in the transition path ensemble. Whereas displacing a bath mode incurs essentially no cost, moving the reaction coordinate costs significantly. Based on some general assumptions about the behaviors of reaction and bath coordinates in the transition path ensemble, we proved theoretically, using statistical mechanics, that the emergent potential energy can serve as a benchmark of reaction coordinates, and we demonstrated its effectiveness by applying it to a prototypical system of biomolecular dynamics. Using the emergent potential energy as guidance, we developed a committor-free and intuition-independent method for identifying reaction coordinates in complex systems. We expect this method to be applicable to a wide range of reaction processes in complex biomolecular systems. PMID:27059559
NASA Astrophysics Data System (ADS)
Annan, James; Hargreaves, Julia
2016-04-01
In order to perform any Bayesian processing of a model ensemble, we need a prior over the ensemble members. In the case of multimodel ensembles such as CMIP, the historical approach of "model democracy" (i.e. equal weight for all models in the sample) is no longer credible (if it ever was) due to model duplication and inbreeding. The question of "model independence" is central to the question of prior weights. However, although this question has been repeatedly raised, it has not yet been satisfactorily addressed. Here I will discuss the issue of independence and present a theoretical foundation for understanding and analysing the ensemble in this context. I will also present some simple examples showing how these ideas may be applied and developed.
NASA Astrophysics Data System (ADS)
Pribram-Jones, Aurora
Warm dense matter (WDM) is a high energy phase between solids and plasmas, with characteristics of both. It is present in the centers of giant planets, within the earth's core, and on the path to ignition of inertial confinement fusion. The high temperatures and pressures of warm dense matter lead to complications in its simulation, as both classical and quantum effects must be included. One of the most successful simulation methods is density functional theory-molecular dynamics (DFT-MD). Despite great success in a diverse array of applications, DFT-MD remains computationally expensive and it neglects the explicit temperature dependence of electron-electron interactions known to exist within exact DFT. Finite-temperature density functional theory (FT DFT) is an extension of the wildly successful ground-state DFT formalism via thermal ensembles, broadening its quantum mechanical treatment of electrons to include systems at non-zero temperatures. Exact mathematical conditions have been used to predict the behavior of approximations in limiting conditions and to connect FT DFT to the ground-state theory. An introduction to FT DFT is given within the context of ensemble DFT and the larger field of DFT is discussed for context. Ensemble DFT is used to describe ensembles of ground-state and excited systems. Exact conditions in ensemble DFT and the performance of approximations depend on ensemble weights. Using an inversion method, exact Kohn-Sham ensemble potentials are found and compared to approximations. The symmetry eigenstate Hartree-exchange approximation is in good agreement with exact calculations because of its inclusion of an ensemble derivative discontinuity. Since ensemble weights in FT DFT are temperature-dependent Fermi weights, this insight may help develop approximations well-suited to both ground-state and FT DFT. A novel, highly efficient approach to free energy calculations, finite-temperature potential functional theory, is derived, which has the potential to transform the simulation of warm dense matter. As a semiclassical method, it connects the normally disparate regimes of cold condensed matter physics and hot plasma physics. This orbital-free approach captures the smooth classical density envelope and quantum density oscillations that are both crucial to accurate modeling of materials where temperature and pressure effects are influential.
Sampling the kinetic pathways of a micelle fusion and fission transition.
Pool, René; Bolhuis, Peter G
2007-06-28
The mechanism and kinetics of micellar breakup and fusion in a dilute solution of a model surfactant are investigated by path sampling techniques. Analysis of the path ensemble gives insight into the mechanism of the transition. For larger, less stable micelles the fission/fusion occurs via a clear neck formation, while for smaller micelles the mechanism is more direct. In addition, path analysis yields an appropriate order parameter to evaluate the fusion and fission rate constants using stochastic transition interface sampling. For the small, stable micelle (50 surfactants) the computed fission rate constant is a factor of 10 lower than the fusion rate constant. The procedure opens the way for accurate calculation of free energy and kinetics for, e.g., membrane fusion, and wormlike micelle endcap formation.
Donovan, Rory M.; Tapia, Jose-Juan; Sullivan, Devin P.; Faeder, James R.; Murphy, Robert F.; Dittrich, Markus; Zuckerman, Daniel M.
2016-01-01
The long-term goal of connecting scales in biological simulation can be facilitated by scale-agnostic methods. We demonstrate that the weighted ensemble (WE) strategy, initially developed for molecular simulations, applies effectively to spatially resolved cell-scale simulations. The WE approach runs an ensemble of parallel trajectories with assigned weights and uses a statistical resampling strategy of replicating and pruning trajectories to focus computational effort on difficult-to-sample regions. The method can also generate unbiased estimates of non-equilibrium and equilibrium observables, sometimes with significantly less aggregate computing time than would be possible using standard parallelization. Here, we use WE to orchestrate particle-based kinetic Monte Carlo simulations, which include spatial geometry (e.g., of organelles, plasma membrane) and biochemical interactions among mobile molecular species. We study a series of models exhibiting spatial, temporal and biochemical complexity and show that although WE has important limitations, it can achieve performance significantly exceeding standard parallel simulation—by orders of magnitude for some observables. PMID:26845334
NASA Astrophysics Data System (ADS)
Qiao, Qin; Zhang, Hou-Dao; Huang, Xuhui
2016-04-01
Simulated tempering (ST) is a widely used enhanced sampling method for molecular dynamics simulations. As an expanded ensemble method, ST is a combination of canonical ensembles at different temperatures, and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among the temperature space. However, this uniform distribution in temperature space may not be optimal, since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method, Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformational transitions when optimizing the weights of the different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.
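For reference, the simulated-tempering temperature move that the weights g_m enter is simple; a sketch under the usual conventions (the EPSW optimization of g itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

def st_temperature_move(E, m, betas, g):
    """Attempt a simulated-tempering jump from temperature index m to a
    neighbour, given the current configuration's potential energy E.
    The joint distribution is p(x, m) ∝ exp(-β_m E(x) + g_m): free-energy
    weights g_m give uniform temperature visitation, whereas EPSW would
    instead tune g to minimize round-trip times between metastable states."""
    n = m + rng.choice([-1, 1])
    if not 0 <= n < len(betas):
        return m                                  # reject moves off the ladder
    log_acc = -(betas[n] - betas[m]) * E + (g[n] - g[m])
    return n if np.log(rng.random()) < log_acc else m
```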
NASA Astrophysics Data System (ADS)
Oh, Seok-Geun; Suh, Myoung-Seok
2017-07-01
The projection skills of five ensemble methods were analyzed according to simulation skills, training period, and ensemble members, using 198 sets of pseudo-simulation data (PSD) produced by random number generation assuming the simulated temperature of regional climate models. The PSD sets were classified into 18 categories according to the relative magnitude of bias, variance ratio, and correlation coefficient, where each category had 11 sets (including 1 truth set) with 50 samples. The ensemble methods used were as follows: equal weighted averaging without bias correction (EWA_NBC), EWA with bias correction (EWA_WBC), weighted ensemble averaging based on root mean square errors and correlation (WEA_RAC), WEA based on the Taylor score (WEA_Tay), and multivariate linear regression (Mul_Reg). The projection skills of the ensemble methods improved generally as compared with the best member for each category. However, their projection skills are significantly affected by the simulation skills of the ensemble member. The weighted ensemble methods showed better projection skills than non-weighted methods, in particular, for the PSD categories having systematic biases and various correlation coefficients. The EWA_NBC showed considerably lower projection skills than the other methods, in particular, for the PSD categories with systematic biases. Although Mul_Reg showed relatively good skills, it showed strong sensitivity to the PSD categories, training periods, and number of members. On the other hand, the WEA_Tay and WEA_RAC showed relatively superior skills in both the accuracy and reliability for all the sensitivity experiments. This indicates that WEA_Tay and WEA_RAC are applicable even for simulation data with systematic biases, a short training period, and a small number of ensemble members.
NASA Astrophysics Data System (ADS)
Hoteit, I.; Hollt, T.; Hadwiger, M.; Knio, O. M.; Gopalakrishnan, G.; Zhan, P.
2016-02-01
Ocean reanalyses and forecasts are nowadays generated by combining ensemble simulations with data assimilation techniques. Most of these techniques resample the ensemble members after each assimilation cycle. Tracking behavior over time, such as all possible paths of a particle in an ensemble vector field, becomes very difficult, as the number of combinations rises exponentially with the number of assimilation cycles. In general, a single possible path is not of interest, but only the probabilities that any point in space might be reached by a particle at some point in time. We present an approach using probability-weighted piecewise particle trajectories to allow for interactive probability mapping. This is achieved by binning the domain and splitting up the tracing process into the individual assimilation cycles, so that particles that fall into the same bin after a cycle can be treated as a single particle with a larger probability as input for the next cycle. As a result we lose the possibility to track individual particles, but can create probability maps for any desired seed at interactive rates. The technique is integrated in an interactive visualization system that enables the visual analysis of the particle traces side by side with other forecast variables, such as the sea surface height, and their corresponding behavior over time. By harnessing the power of modern graphics processing units (GPUs) for visualization as well as computation, our system allows the user to browse through the simulation ensembles in real-time, view specific parameter settings or simulation models, and move between different spatial or temporal regions without delay. In addition, our system provides advanced visualizations to highlight the uncertainty, or show the complete distribution of the simulations at user-defined positions over the complete time series of the domain.
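The core bookkeeping trick (merging all particle mass that lands in the same bin after each assimilation cycle) can be sketched compactly; the `step` function standing in for one cycle of advection under a randomly drawn ensemble member is a placeholder for the user's flow model.

```python
import numpy as np

def probability_maps(seed_bin, step, n_bins, n_cycles, n_samples=64):
    """Bin-level probability mapping across assimilation cycles (sketch).

    step(b, rng) -> destination bin of a particle seeded in bin b and advected
    for one cycle under a randomly drawn ensemble member (user-supplied).
    Probability mass arriving in the same bin is merged into one weighted
    'particle', so cost grows linearly with the number of cycles instead of
    exponentially with member combinations."""
    rng = np.random.default_rng(0)
    p = np.zeros(n_bins)
    p[seed_bin] = 1.0
    history = [p.copy()]
    for _ in range(n_cycles):
        q = np.zeros(n_bins)
        for b in np.flatnonzero(p):
            for _ in range(n_samples):
                q[step(b, rng)] += p[b] / n_samples
        p = q
        history.append(p.copy())
    return history        # history[t][b] = P(a particle occupies bin b at cycle t)
```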
Reactive trajectories of the Ru2+/3+ self-exchange reaction and the connection to Marcus' theory.
Tiwari, Ambuj; Ensing, Bernd
2016-12-22
Outer sphere electron transfer between two ions in aqueous solution is a rare event on the time scale of first principles molecular dynamics simulations. We have used transition path sampling to generate an ensemble of reactive trajectories of the self-exchange reaction between a pair of Ru2+ and Ru3+ ions in water. To distinguish between the reactant and product states, we use as an order parameter the position of the maximally localised Wannier center associated with the transferring electron. This allows us to align the trajectories with respect to the moment of barrier crossing and compute statistical averages over the path ensemble. We compare our order parameter with two typical reaction coordinates used in applications of Marcus theory of electron transfer: the vertical gap energy and the solvent electrostatic potential at the ions.
Lessons from Climate Modeling on the Design and Use of Ensembles for Crop Modeling
NASA Technical Reports Server (NTRS)
Wallach, Daniel; Mearns, Linda O.; Ruane, Alexander C.; Roetter, Reimund P.; Asseng, Senthold
2016-01-01
Working with ensembles of crop models is a recent but important development in crop modeling, which promises to lead to better uncertainty estimates for model projections and predictions, better predictions using the ensemble mean or median, and closer collaboration within the modeling community. There are numerous open questions about the best way to create and analyze such ensembles. Much can be learned from the field of climate modeling, given its much longer experience with ensembles. We draw on that experience to identify questions and make propositions that should help make ensemble modeling with crop models more rigorous and informative. The propositions include: defining criteria for acceptance of models into a crop multi-model ensemble (MME); exploring criteria for evaluating the degree of relatedness of models in an MME; studying the effect of the number of models in the ensemble; developing a statistical model of model sampling; creating a repository for MME results; studying possible differential weighting of models in an ensemble; creating single-model ensembles, based on sampling from the uncertainty distribution of parameter values or inputs, specifically oriented toward uncertainty estimation; creating super-ensembles that sample more than one source of uncertainty; analyzing super-ensemble results to obtain information on total uncertainty and the separate contributions of different sources of uncertainty; and, finally, further investigating the use of the multi-model mean or median as a predictor.
Fluctuating observation time ensembles in the thermodynamics of trajectories
NASA Astrophysics Data System (ADS)
Budini, Adrián A.; Turner, Robert M.; Garrahan, Juan P.
2014-03-01
The dynamics of stochastic systems, both classical and quantum, can be studied by analysing the statistical properties of dynamical trajectories. The properties of ensembles of such trajectories for long, but fixed, times are described by large-deviation (LD) rate functions. These LD functions play the role of dynamical free energies: they are cumulant generating functions for time-integrated observables, and their analytic structure encodes dynamical phase behaviour. This ‘thermodynamics of trajectories’ approach is to trajectories and dynamics what the equilibrium ensemble method of statistical mechanics is to configurations and statics. Here we show that, just like in the static case, there are a variety of alternative ensembles of trajectories, each defined by their global constraints, with that of trajectories of fixed total time being just one of these. We show how the LD functions that describe an ensemble of trajectories where some time-extensive quantity is constant (and large) but where total observation time fluctuates can be mapped to those of the fixed-time ensemble. We discuss how the correspondence between generalized ensembles can be exploited in path sampling schemes for generating rare dynamical trajectories.
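In the standard notation of this literature, the fixed-time "dynamical free energy" is the scaled cumulant generating function of a time-integrated observable A_t; the generalized ensembles discussed above instead fix such a time-extensive quantity and let the total observation time fluctuate.

```latex
Z(s,t) \;=\; \bigl\langle e^{-s A_t} \bigr\rangle \;\asymp\; e^{t\,\theta(s)},
\qquad
\theta(s) \;=\; \lim_{t\to\infty} \frac{1}{t}\,
\ln \bigl\langle e^{-s A_t} \bigr\rangle .
```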
Statistical Analysis of the First Passage Path Ensemble of Jump Processes
NASA Astrophysics Data System (ADS)
von Kleist, Max; Schütte, Christof; Zhang, Wei
2018-02-01
The transition mechanism of jump processes between two different subsets in state space reveals important dynamical information of the processes and therefore has attracted considerable attention in the past years. In this paper, we study the first passage path ensemble of both discrete-time and continuous-time jump processes on a finite state space. The main approach is to divide each first passage path into nonreactive and reactive segments and to study them separately. The analysis can be applied to jump processes which are non-ergodic, as well as continuous-time jump processes where the waiting time distributions are non-exponential. In the particular case that the jump processes are both Markovian and ergodic, our analysis elucidates the relations between the study of the first passage paths and the study of the transition paths in transition path theory. We provide algorithms to numerically compute statistics of the first passage path ensemble. The computational complexity of these algorithms scales with the complexity of solving a linear system, for which efficient methods are available. Several examples demonstrate the wide applicability of the derived results across research areas.
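As an example of the "complexity of solving a linear system" scaling, mean first-passage times of a discrete-time chain to a target set follow from one linear solve; this standard computation is a sketch of the flavor of algorithm involved, not the paper's full first-passage decomposition.

```python
import numpy as np

def mean_first_passage_times(P, target):
    """Mean first-passage times to `target` for a discrete-time Markov chain.

    P is the row-stochastic transition matrix.  Restricting P to the
    non-target states (matrix Q) and solving (I - Q) τ = 1 gives the
    expected number of steps to hit the target set from each state."""
    P = np.asarray(P, dtype=float)
    free = [i for i in range(P.shape[0]) if i not in set(target)]
    Q = P[np.ix_(free, free)]
    tau = np.linalg.solve(np.eye(len(free)) - Q, np.ones(len(free)))
    out = np.zeros(P.shape[0])
    out[free] = tau
    return out

# Toy three-state chain with absorbing target state 2:
P = [[0.5, 0.4, 0.1],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]
print(mean_first_passage_times(P, target={2}))
```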
Free energy landscape from path-sampling: application to the structural transition in LJ38
NASA Astrophysics Data System (ADS)
Adjanor, G.; Athènes, M.; Calvo, F.
2006-09-01
We introduce a path-sampling scheme that allows equilibrium state-ensemble averages to be computed by means of a biased distribution of non-equilibrium paths. This non-equilibrium method is applied to the case of the 38-atom Lennard-Jones atomic cluster, which has a double-funnel energy landscape. We calculate the free energy profile along the Q4 bond orientational order parameter. At high or moderate temperature the results obtained using the non-equilibrium approach are consistent with those obtained using conventional equilibrium methods, including parallel tempering and Wang-Landau Monte Carlo simulations. At lower temperatures, the non-equilibrium approach becomes more efficient in exploring the relevant inherent structures. In particular, the free energy agrees with the predictions of the harmonic superposition approximation.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction for positive samples and the poor overall classification caused by unbalanced sample data of MicroRNA (miRNA) targets, we propose a support vector machine (SVM)-integration of under-sampling and weight (SVM-IUSM) algorithm in this paper, an under-sampling algorithm based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the miRNA target integrated classifier is achieved by combining multiple weak classifiers through a voting mechanism. Experiments revealed that SVM-IUSM, compared with other algorithms on unbalanced dataset collections, could not only improve the accuracy for positive targets and the overall classification effect, but also enhance the generalization ability of the miRNA target classifier.
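A simplified sketch of the clustering-based under-sampling idea using scikit-learn: one SVM per round is trained on all positives plus cluster representatives of the negatives, and the members are combined by voting. The AdaBoost-style weight adaptation and the robust weight-smoothing step of SVM-IUSM are not reproduced here, and all parameter choices are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_undersample_ensemble(X, y, n_rounds=10, seed=0):
    """Ensemble of SVMs trained on cluster-based under-samples of the
    negative (majority) class; assumes more negatives than positives."""
    rng = np.random.default_rng(seed)
    pos, neg = X[y == 1], X[y == 0]
    models = []
    for _ in range(n_rounds):
        km = KMeans(n_clusters=len(pos), n_init=4,
                    random_state=int(rng.integers(1 << 31))).fit(neg)
        Xr = np.vstack([pos, km.cluster_centers_])   # balanced training set
        yr = np.r_[np.ones(len(pos)), np.zeros(len(pos))]
        models.append(SVC(kernel="rbf").fit(Xr, yr))
    return models

def vote(models, X):
    """Unweighted majority vote over the ensemble members."""
    return (np.mean([m.predict(X) for m in models], axis=0) >= 0.5).astype(int)
```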
An Optimization Principle for Deriving Nonequilibrium Statistical Models of Hamiltonian Dynamics
NASA Astrophysics Data System (ADS)
Turkington, Bruce
2013-08-01
A general method for deriving closed reduced models of Hamiltonian dynamical systems is developed using techniques from optimization and statistical estimation. Given a vector of resolved variables, selected to describe the macroscopic state of the system, a family of quasi-equilibrium probability densities on phase space corresponding to the resolved variables is employed as a statistical model, and the evolution of the mean resolved vector is estimated by optimizing over paths of these densities. Specifically, a cost function is constructed to quantify the lack-of-fit to the microscopic dynamics of any feasible path of densities from the statistical model; it is an ensemble-averaged, weighted, squared-norm of the residual that results from submitting the path of densities to the Liouville equation. The path that minimizes the time integral of the cost function determines the best-fit evolution of the mean resolved vector. The closed reduced equations satisfied by the optimal path are derived by Hamilton-Jacobi theory. When expressed in terms of the macroscopic variables, these equations have the generic structure of governing equations for nonequilibrium thermodynamics. In particular, the value function for the optimization principle coincides with the dissipation potential that defines the relation between thermodynamic forces and fluxes. The adjustable closure parameters in the best-fit reduced equations depend explicitly on the arbitrary weights that enter into the lack-of-fit cost function. Two particular model reductions are outlined to illustrate the general method. In each example the set of weights in the optimization principle contracts into a single effective closure parameter.
The Weighted-Average Lagged Ensemble.
DelSole, T; Trenary, L; Tippett, M K
2017-11-01
A lagged ensemble is an ensemble of forecasts from the same model initialized at different times but verifying at the same time. The skill of a lagged ensemble mean can be improved by assigning weights to different forecasts in such a way as to maximize skill. If the forecasts are bias corrected, then an unbiased weighted lagged ensemble requires the weights to sum to one. Such a scheme is called a weighted-average lagged ensemble. In the limit of uncorrelated errors, the optimal weights are positive and decay monotonically with lead time, so that the least skillful forecasts have the least weight. In more realistic applications, the optimal weights do not always behave this way. This paper presents a series of analytic examples designed to illuminate conditions under which the weights of an optimal weighted-average lagged ensemble become negative or depend nonmonotonically on lead time. It is shown that negative weights are most likely to occur when the errors grow rapidly and are highly correlated across lead time. The weights are most likely to behave nonmonotonically when the mean square error is approximately constant over the range of forecasts included in the lagged ensemble. An extreme example of the latter behavior is presented in which the optimal weights vanish everywhere except at the shortest and longest lead times.
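The optimization behind a weighted-average lagged ensemble is a small constrained least-squares problem; a sketch, assuming an archive of bias-corrected errors for each lag is available:

```python
import numpy as np

def lagged_ensemble_weights(errors):
    """Sum-to-one weights minimizing the mean-square error of the weighted
    lagged-ensemble mean.

    errors : (n_cases, n_lags) array; column k holds the bias-corrected
             errors of the lag-k forecast.
    Minimizing E[(Σ_k w_k e_k)^2] subject to Σ_k w_k = 1 gives w ∝ C⁻¹1,
    with C the error covariance across lags.  Negative entries can appear
    when errors are strongly correlated across lead time, which is exactly
    the behaviour analysed in the paper."""
    C = np.cov(errors, rowvar=False)
    w = np.linalg.solve(C, np.ones(C.shape[0]))
    return w / w.sum()
```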
Calculating ensemble averaged descriptions of protein rigidity without sampling.
González, Luis C; Wang, Hui; Livesay, Dennis R; Jacobs, Donald J
2012-01-01
Previous works have demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer body-bar Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations to accurately account for fluctuations in network topology. We have developed a mean field Virtual Pebble Game (VPG) that represents the ensemble of networks by a single effective network. That is, all possible number of distance constraints (or bars) that can form between a pair of rigid bodies is replaced by the average number. The resulting effective network is viewed as having weighted edges, where the weight of an edge quantifies its capacity to absorb degrees of freedom. The VPG is interpreted as a flow problem on this effective network, which eliminates the need to sample. Across a nonredundant dataset of 272 protein structures, we apply the VPG to proteins for the first time. Our results show numerically and visually that the rigidity characterizations of the VPG accurately reflect the ensemble averaged [Formula: see text] properties. This result positions the VPG as an efficient alternative to understand the mechanical role that chemical interactions play in maintaining protein stability.
Graph transformation method for calculating waiting times in Markov chains.
Trygubenko, Semen A; Wales, David J
2006-06-21
We describe an exact approach for calculating transition probabilities and waiting times in finite-state discrete-time Markov processes. All the states and the rules for transitions between them must be known in advance. We can then calculate averages over a given ensemble of paths for both additive and multiplicative properties in a nonstochastic and noniterative fashion. In particular, we can calculate the mean first-passage time between arbitrary groups of stationary points for discrete path sampling databases, and hence extract phenomenological rate constants. We present a number of examples to demonstrate the efficiency and robustness of this approach.
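A compact dense-matrix sketch of the graph-transformation bookkeeping: states are removed one at a time while branching probabilities and mean waiting times are renormalized, so that first-passage statistics between the retained states are preserved. Variable names are ours, and the eliminated intermediate states are assumed non-absorbing.

```python
import numpy as np

def graph_transform(P, tau, keep):
    """Remove all states not in `keep` from a discrete-time chain.

    P[i, j] : jump probability i -> j (rows sum to 1)
    tau[i]  : mean waiting time in state i
    Eliminating state x reroutes its flux and waiting time:
        P'[i, j] = P[i, j] + P[i, x] P[x, j] / (1 - P[x, x])
        τ'[i]    = τ[i]    + P[i, x] τ[x]    / (1 - P[x, x])
    Dense-matrix version for clarity, not efficiency."""
    P, tau = np.array(P, dtype=float), np.array(tau, dtype=float)
    alive = set(range(P.shape[0]))
    for x in [s for s in range(P.shape[0]) if s not in set(keep)]:
        alive.discard(x)
        escape = 1.0 - P[x, x]                 # probability of leaving x
        for i in alive:
            if P[i, x] == 0.0:
                continue
            tau[i] += P[i, x] * tau[x] / escape
            for j in alive:
                P[i, j] += P[i, x] * P[x, j] / escape
            P[i, x] = 0.0
    return P, tau
```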
An information-theoretical perspective on weighted ensemble forecasts
NASA Astrophysics Data System (ADS)
Weijs, Steven V.; van de Giesen, Nick
2013-08-01
This paper presents an information-theoretical method for weighting ensemble forecasts with new information. Weighted ensemble forecasts can be used to adjust the distribution that an existing ensemble of time series represents, without modifying the values in the ensemble itself. The weighting can, for example, add new seasonal forecast information in an existing ensemble of historically measured time series that represents climatic uncertainty. A recent article in this journal compared several methods to determine the weights for the ensemble members and introduced the pdf-ratio method. In this article, a new method, the minimum relative entropy update (MRE-update), is presented. Based on the principle of minimum discrimination information, an extension of the principle of maximum entropy (POME), the method ensures that no more information is added to the ensemble than is present in the forecast. This is achieved by minimizing relative entropy, with the forecast information imposed as constraints. From this same perspective, an information-theoretical view on the various weighting methods is presented. The MRE-update is compared with the existing methods and the parallels with the pdf-ratio method are analysed. The paper provides a new, information-theoretical justification for one version of the pdf-ratio method that turns out to be equivalent to the MRE-update. All other methods result in sets of ensemble weights that, seen from the information-theoretical perspective, add either too little or too much (i.e. fictitious) information to the ensemble.
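For a single mean constraint, the MRE-update reduces to an exponential tilt of the original (uniform) weights, solvable with one-dimensional root finding; a sketch under that simplification, with an invented gamma-distributed "climatological" ensemble:

```python
import numpy as np
from scipy.optimize import brentq

def mre_update(ensemble, forecast_mean):
    """Minimum-relative-entropy reweighting of a uniform ensemble so that the
    weighted mean matches a scalar forecast mean.  The minimizer of
    Σ_i w_i ln(N w_i) under the mean constraint is w_i ∝ exp(λ x_i);
    we solve for λ by bisection.  Sketch for one constraint only — the
    paper handles general forecast information."""
    x = np.asarray(ensemble, dtype=float)

    def gap(lam):
        w = np.exp(lam * (x - x.max()))
        w /= w.sum()
        return (w * x).sum() - forecast_mean

    lam = brentq(gap, -50.0, 50.0)     # assumes the target mean is attainable
    w = np.exp(lam * (x - x.max()))
    return w / w.sum()

# Shift a 1000-member climatological ensemble toward a wetter forecast:
ens = np.random.default_rng(3).gamma(2.0, 1.0, size=1000)
w = mre_update(ens, forecast_mean=2.5)
print(w @ ens)                         # ≈ 2.5
```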
Saglam, Ali S; Chong, Lillian T
2016-01-14
An essential baseline for determining the extent to which electrostatic interactions enhance the kinetics of protein-protein association is the "basal" kon, which is the rate constant for association in the absence of electrostatic interactions. However, since such association events are beyond the milliseconds time scale, it has not been practical to compute the basal kon by directly simulating the association with flexible models. Here, we computed the basal kon for barnase and barstar, two of the most rapidly associating proteins, using highly efficient, flexible molecular simulations. These simulations involved (a) pseudoatomic protein models that reproduce the molecular shapes, electrostatic, and diffusion properties of all-atom models, and (b) application of the weighted ensemble path sampling strategy, which enhanced the efficiency of generating association events by >130-fold. We also examined the extent to which the computed basal kon is affected by inclusion of intermolecular hydrodynamic interactions in the simulations.
Viney, N.R.; Bormann, H.; Breuer, L.; Bronstert, A.; Croke, B.F.W.; Frede, H.; Graff, T.; Hubrechts, L.; Huisman, J.A.; Jakeman, A.J.; Kite, G.W.; Lanini, J.; Leavesley, G.; Lettenmaier, D.P.; Lindstrom, G.; Seibert, J.; Sivapalan, M.; Willems, P.
2009-01-01
This paper reports on a project to compare predictions from a range of catchment models applied to a mesoscale river basin in central Germany and to assess various ensemble predictions of catchment streamflow. The models encompass a large range in inherent complexity and input requirements. In approximate order of decreasing complexity, they are DHSVM, MIKE-SHE, TOPLATS, WASIM-ETH, SWAT, PRMS, SLURP, HBV, LASCAM and IHACRES. The models are calibrated twice using different sets of input data. The two predictions from each model are then combined by simple averaging to produce a single-model ensemble. The 10 resulting single-model ensembles are combined in various ways to produce multi-model ensemble predictions. Both the single-model ensembles and the multi-model ensembles are shown to give predictions that are generally superior to those of their respective constituent models, both during a 7-year calibration period and a 9-year validation period. This occurs despite a considerable disparity in performance of the individual models. Even the weakest of models is shown to contribute useful information to the ensembles they are part of. The best model combination methods are a trimmed mean (constructed using the central four or six predictions each day) and a weighted mean ensemble (with weights calculated from calibration performance) that places relatively large weights on the better performing models. Conditional ensembles, in which separate model weights are used in different system states (e.g. summer and winter, high and low flows) generally yield little improvement over the weighted mean ensemble. However, a conditional ensemble that discriminates between rising and receding flows shows moderate improvement. An analysis of ensemble predictions shows that the best ensembles are not necessarily those containing the best individual models. Conversely, it appears that some models that predict well individually do not necessarily combine well with other models in multi-model ensembles. The reasons behind these observations may relate to the effects of the weighting schemes, non-stationarity of the climate series and possible cross-correlations between models. Crown Copyright © 2008.
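The two best-performing combination rules reported here are easy to state precisely; a sketch with invented array shapes (daily flows in rows, models in columns):

```python
import numpy as np

def trimmed_mean_ensemble(Q, k=2):
    """Drop the k largest and k smallest model predictions each day and
    average the remainder (the 'central four or six' rule for 10 models)."""
    Qs = np.sort(Q, axis=1)
    return Qs[:, k:-k].mean(axis=1)

def weighted_mean_ensemble(Q, skill):
    """Weighted mean with weights derived from calibration performance
    (e.g. Nash-Sutcliffe efficiencies), clipped at zero and normalized so
    that the better-performing models carry relatively large weights."""
    w = np.clip(np.asarray(skill, dtype=float), 0.0, None)
    return Q @ (w / w.sum())
```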
Ensemble Weight Enumerators for Protograph LDPC Codes
NASA Technical Reports Server (NTRS)
Divsalar, Dariush
2006-01-01
Recently LDPC codes with projected graph, or protograph structures have been proposed. In this paper, finite length ensemble weight enumerators for LDPC codes with protograph structures are obtained. Asymptotic results are derived as the block size goes to infinity. In particular we are interested in obtaining ensemble average weight enumerators for protograph LDPC codes which have minimum distance that grows linearly with block size. As with irregular ensembles, linear minimum distance property is sensitive to the proportion of degree-2 variable nodes. In this paper the derived results on ensemble weight enumerators show that linear minimum distance condition on degree distribution of unstructured irregular LDPC codes is a sufficient but not a necessary condition for protograph LDPC codes.
Cendagorta, Joseph R; Bačić, Zlatko; Tuckerman, Mark E
2018-03-14
We introduce a scheme for approximating quantum time correlation functions numerically within the Feynman path integral formulation. Starting with the symmetrized version of the correlation function expressed as a discretized path integral, we introduce a change of integration variables often used in the derivation of trajectory-based semiclassical methods. In particular, we transform to sum and difference variables between forward and backward complex-time propagation paths. Once the transformation is performed, the potential energy is expanded in powers of the difference variables, which allows us to perform the integrals over these variables analytically. The manner in which this procedure is carried out results in an open-chain path integral (in the remaining sum variables) with a modified potential that is evaluated using imaginary-time path-integral sampling rather than requiring the generation of a large ensemble of trajectories. Consequently, any number of path integral sampling schemes can be employed to compute the remaining path integral, including Monte Carlo, path-integral molecular dynamics, or enhanced path-integral molecular dynamics. We believe that this approach constitutes a different perspective in semiclassical-type approximations to quantum time correlation functions. Importantly, we argue that our approximation can be systematically improved within a cumulant expansion formalism. We test this approximation on a set of one-dimensional problems that are commonly used to benchmark approximate quantum dynamical schemes. We show that the method is at least as accurate as the popular ring-polymer molecular dynamics technique and linearized semiclassical initial value representation for correlation functions of linear operators in most of these examples and improves the accuracy of correlation functions of nonlinear operators.
Accurate determination of imaging modality using an ensemble of text- and image-based classifiers.
Kahn, Charles E; Kalpathy-Cramer, Jayashree; Lam, Cesar A; Eldredge, Christina E
2012-02-01
Imaging modality can aid retrieval of medical images for clinical practice, research, and education. We evaluated whether an ensemble classifier could outperform its constituent individual classifiers in determining the modality of figures from radiology journals. Seventeen automated classifiers analyzed 77,495 images from two radiology journals. Each classifier assigned one of eight imaging modalities (computed tomography, graphic, magnetic resonance imaging, nuclear medicine, positron emission tomography, photograph, ultrasound, or radiograph) to each image based on visual and/or textual information. Three physicians determined the modality of 5,000 randomly selected images as a reference standard. A "Simple Vote" ensemble classifier assigned each image to the modality that received the greatest number of individual classifiers' votes. A "Weighted Vote" classifier weighted each individual classifier's vote based on performance over a training set. For each image, this classifier's output was the imaging modality that received the greatest weighted vote score. We measured precision, recall, and F score (the harmonic mean of precision and recall) for each classifier. Individual classifiers' F scores ranged from 0.184 to 0.892. The simple vote and weighted vote classifiers correctly assigned 4,565 images (F score, 0.913; 95% confidence interval, 0.905-0.921) and 4,672 images (F score, 0.934; 95% confidence interval, 0.927-0.941), respectively. The weighted vote classifier performed significantly better than all individual classifiers. An ensemble classifier correctly determined the imaging modality of 93% of figures in our sample. The imaging modality of figures published in radiology journals can be determined with high accuracy, which will improve systems for image retrieval.
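The two ensemble rules are straightforward to sketch; class labels are assumed to be integers 0-7 for the eight modalities, and the per-classifier weights stand in for whatever performance scores were learned on the training set:

```python
import numpy as np

def simple_vote(votes):
    """votes: (n_classifiers, n_images) integer label predictions.
    Returns the plurality label for each image."""
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def weighted_vote(votes, weights, n_classes=8):
    """Each classifier's vote counts with a training-set-derived weight;
    the modality with the highest weighted-vote score wins."""
    scores = np.zeros((n_classes, votes.shape[1]))
    cols = np.arange(votes.shape[1])
    for v, w in zip(votes, weights):
        scores[v, cols] += w
    return scores.argmax(axis=0)
```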
NASA Astrophysics Data System (ADS)
Orellana, Laura; Yoluk, Ozge; Carrillo, Oliver; Orozco, Modesto; Lindahl, Erik
2016-08-01
Protein conformational changes are at the heart of cell functions, from signalling to ion transport. However, the transient nature of the intermediates along transition pathways hampers their experimental detection, making the underlying mechanisms elusive. Here we retrieve dynamic information on the actual transition routes from principal component analysis (PCA) of structurally-rich ensembles and, in combination with coarse-grained simulations, explore the conformational landscapes of five well-studied proteins. Modelling them as elastic networks in a hybrid elastic-network Brownian dynamics simulation (eBDIMS), we generate trajectories connecting stable end-states that spontaneously sample the crystallographic motions, predicting the structures of known intermediates along the paths. We also show that the explored non-linear routes can delimit the lowest energy passages between end-states sampled by atomistic molecular dynamics. The integrative methodology presented here provides a powerful framework to extract and expand dynamic pathway information from the Protein Data Bank, as well as to validate sampling methods in general.
NASA Astrophysics Data System (ADS)
Fernández, J.; Primo, C.; Cofiño, A. S.; Gutiérrez, J. M.; Rodríguez, M. A.
2009-08-01
In a recent paper, Gutiérrez et al. (Nonlinear Process Geophys 15(1):109-114, 2008) introduced a new characterization of spatiotemporal error growth—the so-called mean-variance logarithmic (MVL) diagram—and applied it to study ensemble prediction systems (EPS); in particular, they analyzed single-model ensembles obtained by perturbing the initial conditions. In the present work, the MVL diagram is applied to multi-model ensembles, analyzing also the effect of model formulation differences. To this aim, the MVL diagram is systematically applied to the multi-model ensemble produced in the EU-funded DEMETER project. It is shown that the shared building blocks (atmospheric and ocean components) impose similar dynamics among different models and, thus, contribute to poorly sampling the model formulation uncertainty. This dynamical similarity should be taken into account, at least as a pre-screening process, before applying any objective weighting method.
On Certain Wronskians of Multiple Orthogonal Polynomials
NASA Astrophysics Data System (ADS)
Zhang, Lun; Filipuk, Galina
2014-11-01
We consider determinants of Wronskian type whose entries are multiple orthogonal polynomials associated with a path connecting two multi-indices. By assuming that the weight functions form an algebraic Chebyshev (AT) system, we show that the polynomials represented by the Wronskians keep a constant sign in some cases, while in some other cases oscillatory behavior appears, which generalizes classical results for orthogonal polynomials due to Karlin and Szegő. There are two applications of our results. The first application arises from the observation that the m-th moment of the average characteristic polynomials for multiple orthogonal polynomial ensembles can be expressed as a Wronskian of the type II multiple orthogonal polynomials. Hence, it is straightforward to obtain the distinct behavior of the moments for odd and even m in a special multiple orthogonal ensemble - the AT ensemble. As the second application, we derive some Turán type inequalities for multiple Hermite and multiple Laguerre polynomials (of two kinds). Finally, we study numerically the geometric configuration of zeros for the Wronskians of these multiple orthogonal polynomials. We observe that the zeros have regular configurations in the complex plane, which might be of independent interest.
Multi-model ensemble hydrologic prediction using Bayesian model averaging
NASA Astrophysics Data System (ADS)
Duan, Qingyun; Ajami, Newsha K.; Gao, Xiaogang; Sorooshian, Soroosh
2007-05-01
Multi-model ensemble strategy is a means to exploit the diversity of skillful predictions from different models. This paper studies the use of a Bayesian model averaging (BMA) scheme to develop more skillful and reliable probabilistic hydrologic predictions from multiple competing predictions made by several hydrologic models. BMA is a statistical procedure that infers consensus predictions by weighting individual predictions based on their probabilistic likelihood measures, with the better performing predictions receiving higher weights than the worse performing ones. Furthermore, BMA provides a more reliable description of the total predictive uncertainty than the original ensemble, leading to a sharper and better calibrated probability density function (PDF) for the probabilistic predictions. In this study, a nine-member ensemble of hydrologic predictions was used to test and evaluate the BMA scheme. This ensemble was generated by calibrating three different hydrologic models using three distinct objective functions. These objective functions were chosen in a way that forces the models to capture certain aspects of the hydrograph well (e.g., peaks, mid-flows and low flows). Two sets of numerical experiments were carried out on three test basins in the US to explore the best way of using the BMA scheme. In the first set, a single set of BMA weights was computed to obtain BMA predictions, while the second set employed multiple sets of weights, with distinct sets corresponding to different flow intervals. In both sets, the streamflow values were transformed using Box-Cox transformation to ensure that the probability distribution of the prediction errors is approximately Gaussian. A split sample approach was used to obtain and validate the BMA predictions. The test results showed that the BMA scheme has the advantage of generating more skillful and equally reliable probabilistic predictions than the original ensemble. The performance of the expected BMA predictions in terms of daily root mean square error (DRMS) and daily absolute mean error (DABS) is generally superior to that of the best individual predictions. Furthermore, the BMA predictions employing multiple sets of weights are generally better than those using a single set of weights.
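The BMA weights and spread are typically fit by expectation-maximization; a minimal sketch for the Gaussian case with a shared variance (the study's Box-Cox transformation is assumed to have been applied already, and per-member variances are omitted for brevity):

```python
import numpy as np
from scipy.stats import norm

def bma_em(F, y, n_iter=200):
    """EM for Gaussian Bayesian model averaging:
        p(y | f_1..f_K) = Σ_k w_k N(y; f_k, s²)
    F : (n_times, K) member predictions, y : (n_times,) observations.
    Returns the consensus weights w and the common variance s²."""
    n, K = F.shape
    w, s2 = np.full(K, 1.0 / K), float(np.var(y))
    for _ in range(n_iter):
        dens = w * norm.pdf(y[:, None], loc=F, scale=np.sqrt(s2))  # (n, K)
        z = dens / dens.sum(axis=1, keepdims=True)                 # E-step
        w = z.mean(axis=0)                                         # M-step
        s2 = float(np.sum(z * (y[:, None] - F) ** 2) / n)
    return w, s2
```

Members with higher likelihood over the training split receive larger weights, and the fitted mixture yields the sharper, better-calibrated predictive PDF described above.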
NASA Technical Reports Server (NTRS)
Mittman, David S
2011-01-01
Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.
Multi-objective optimization for generating a weighted multi-model ensemble
NASA Astrophysics Data System (ADS)
Lee, H.
2017-12-01
Many studies have demonstrated that multi-model ensembles generally show better skill than each ensemble member. When generating weighted multi-model ensembles, the first step is measuring the performance of individual model simulations using observations. There is a consensus on the assignment of weighting factors based on a single evaluation metric. When considering only one evaluation metric, the weighting factor for each model is proportional to a performance score or inversely proportional to an error for the model. While this conventional approach can provide appropriate combinations of multiple models, the approach confronts a big challenge when there are multiple metrics under consideration. When considering multiple evaluation metrics, it is obvious that a simple averaging of multiple performance scores or model ranks does not address the trade-off problem between conflicting metrics. So far, there seems to be no best method to generate weighted multi-model ensembles based on multiple performance metrics. The current study applies the multi-objective optimization, a mathematical process that provides a set of optimal trade-off solutions based on a range of evaluation metrics, to combining multiple performance metrics for the global climate models and their dynamically downscaled regional climate simulations over North America and generating a weighted multi-model ensemble. NASA satellite data and the Regional Climate Model Evaluation System (RCMES) software toolkit are used for assessment of the climate simulations. Overall, the performance of each model differs markedly with strong seasonal dependence. Because of the considerable variability across the climate simulations, it is important to evaluate models systematically and make future projections by assigning optimized weighting factors to the models with relatively good performance. Our results indicate that the optimally weighted multi-model ensemble always shows better performance than an arithmetic ensemble mean and may provide reliable future projections.
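To make the trade-off problem concrete, one minimal realization of a multi-objective search over ensemble weights is to sample candidate weight vectors and retain the Pareto (non-dominated) set. This sketch assumes member time series and a reference series as inputs and uses random search with two stand-in metrics; the study itself uses proper multi-objective optimizers and RCMES-derived metrics, so everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_front(scores):
    """Indices of non-dominated rows; all columns are metrics to minimize."""
    keep = []
    for i in range(scores.shape[0]):
        dominated = np.any(np.all(scores <= scores[i], axis=1) &
                           np.any(scores < scores[i], axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep)

def optimal_weightings(members, ref, n_candidates=2000):
    """members: (K, T) model series; ref: (T,) observed reference series."""
    K = members.shape[0]
    W = rng.dirichlet(np.ones(K), size=n_candidates)  # candidate weight vectors
    ens = W @ members                                 # weighted ensemble means
    rmse = np.sqrt(np.mean((ens - ref) ** 2, axis=1))
    bias = np.abs(np.mean(ens - ref, axis=1))
    scores = np.column_stack([rmse, bias])            # two conflicting metrics
    idx = pareto_front(scores)
    return W[idx], scores[idx]
```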
On the use of transition matrix methods with extended ensembles.
Escobedo, Fernando A; Abreu, Charlles R A
2006-03-14
Different extended ensemble schemes for non-Boltzmann sampling (NBS) of a selected reaction coordinate lambda were formulated so that they employ (i) "variable" sampling window schemes (that include the "successive umbrella sampling" method) to comprehensively explore the lambda domain and (ii) transition matrix methods to iteratively obtain the underlying free-energy eta landscape (or "importance" weights) associated with lambda. The connection between "acceptance ratio" and transition matrix methods was first established to form the basis of the approach for estimating eta(lambda). The validity and performance of the different NBS schemes were then assessed using as lambda coordinate the configurational energy of the Lennard-Jones fluid. For the cases studied, it was found that the convergence rate in the estimation of eta is little affected by the use of data from high-order transitions, while it is noticeably improved by the use of a broader window of sampling in the variable window methods. Finally, it is shown how an "elastic" window of sampling can be used to effectively enact (nonuniform) preferential sampling over the lambda domain, and how to stitch the weights from separate one-dimensional NBS runs to produce an eta surface over a two-dimensional domain.
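The transition-matrix step for recovering eta(lambda) can be illustrated in its simplest, adjacent-bin form. A minimal sketch assuming a matrix of observed transition counts between lambda bins; the paper's schemes additionally exploit the acceptance-ratio connection, higher-order transitions, and iterative refinement.

```python
import numpy as np

def eta_from_transitions(C):
    """Importance weights eta(lambda) from a transition count matrix.

    C[i, j]: transitions recorded from lambda bin i to bin j. Detailed
    balance between adjacent bins gives
        eta[i+1] = eta[i] - ln( p(i -> i+1) / p(i+1 -> i) ),
    accumulated along the lambda axis. Assumes every adjacent transition
    pair has nonzero counts.
    """
    p = C / np.clip(C.sum(axis=1, keepdims=True), 1, None)  # row-stochastic
    eta = np.zeros(C.shape[0])
    for i in range(C.shape[0] - 1):
        eta[i + 1] = eta[i] - np.log(p[i, i + 1] / p[i + 1, i])
    return eta
```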
29 CFR Appendix E to Subpart L of... - Test Methods for Protective Clothing
Code of Federal Regulations, 2010 CFR
2010-07-01
... test ensemble consisting of the sample unit, the two prepared blocks, a piece of leather outsole 10 to... perpendicular to the 1-inch (2.5 cm) edge. B. Apparatus. (i) Six-ounce (0.17 kg) weight tension clamps shall be used so designed that the six ounces (0.17 kg) of weight are distributed evenly across the complete...
NASA Astrophysics Data System (ADS)
Edwards, James P.; Gerber, Urs; Schubert, Christian; Trejo, Maria Anabel; Weber, Axel
2018-04-01
We introduce two integral transforms of the quantum mechanical transition kernel that represent physical information about the path integral. These transforms can be interpreted as probability distributions on particle trajectories measuring respectively the relative contribution to the path integral from paths crossing a given spatial point (the hit function) and the likelihood of values of the line integral of the potential along a path in the ensemble (the path-averaged potential).
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
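The WPE feature at the heart of this pipeline is straightforward to compute. A minimal sketch using variance-based window weights, one common WPE choice; the embedding dimension, delay, and the upstream EEMD decomposition are schematic here.

```python
import numpy as np
from math import factorial

def weighted_permutation_entropy(x, m=4, tau=1, normalize=True):
    """Weighted permutation entropy of a 1-D signal (e.g., an IMF from EEMD).

    Each embedded window contributes its ordinal pattern, weighted by the
    window variance, so high-amplitude fault signatures count more.
    """
    x = np.asarray(x, dtype=float)
    counts = {}
    for i in range(len(x) - (m - 1) * tau):
        w = x[i:i + m * tau:tau]
        key = tuple(np.argsort(w))                 # ordinal pattern of the window
        counts[key] = counts.get(key, 0.0) + np.var(w)
    p = np.array(list(counts.values()))
    p = p[p > 0] / p.sum()
    h = -np.sum(p * np.log2(p))
    return h / np.log2(factorial(m)) if normalize else h
```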
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore received wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm obtains similar results as its model size increases asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed by combining our fast CSC algorithm with random-projection dimensionality reduction in the process of spectral ensemble clustering. We demonstrate, through theoretical analysis and empirical results, that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved for the consensus clustering stage, also holds for weighted k-means clustering, giving a theoretical guarantee for this special kind of k-means clustering in which each point carries its own weight. PMID:29312447
Molloy, Kevin; Shehu, Amarda
2013-01-01
Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories at great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers. PMID:24565158
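A toy version of the tree-growth idea fits in a few lines: grow a tree from the start state, bias expansions toward the goal, and accept energetically uphill steps with a temperature-controlled probability. This sketch runs on a two-basin 2D surface standing in for conformational space; the fragment replacement moves, projection layer, and real energetics of the paper are all elided.

```python
import numpy as np

rng = np.random.default_rng(7)

def energy(p):
    """Toy double-well 'conformational' energy (stand-in landscape)."""
    x, y = p
    return (x**2 - 1)**2 + 0.5 * y**2

def grow_tree(start, goal, n_iter=5000, step=0.05, kT=0.5, goal_bias=0.2):
    """Goal-biased, temperature-gated tree growth (RRT-like)."""
    nodes, parents = [np.asarray(start, float)], [-1]
    goal = np.asarray(goal, float)
    for _ in range(n_iter):
        target = goal if rng.random() < goal_bias else rng.uniform(-2, 2, 2)
        i = int(np.argmin([np.linalg.norm(n - target) for n in nodes]))
        d = target - nodes[i]
        cand = nodes[i] + step * d / (np.linalg.norm(d) + 1e-12)
        dE = energy(cand) - energy(nodes[i])
        # Metropolis-style gate: a higher kT lets rare barrier-crossing
        # expansions through, mimicking a reactive temperature scheme.
        if dE <= 0 or rng.random() < np.exp(-dE / kT):
            nodes.append(cand)
            parents.append(i)
            if np.linalg.norm(cand - goal) < step:
                break
    return nodes, parents

nodes, parents = grow_tree(start=(-1.0, 0.0), goal=(1.0, 0.0))
```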
Inner Radiation Belt Dynamics and Climatology
NASA Astrophysics Data System (ADS)
Guild, T. B.; O'Brien, P. P.; Looper, M. D.
2012-12-01
We present preliminary results of inner belt proton data assimilation using an augmented version of the Selesnick et al. Inner Zone Model (SIZM). By varying modeled physics parameters and solar particle injection parameters to generate many ensembles of the inner belt, then optimizing the ensemble weights according to inner belt observations from SAMPEX/PET at LEO and HEO/DOS at high altitude, we obtain the best-fit state of the inner belt. We need to fully sample the range of solar proton injection sources among the ensemble members to ensure reasonable agreement between the model ensembles and observations. Once this is accomplished, we find the method is fairly robust. We will demonstrate the data assimilation by presenting an extended interval of solar proton injections and losses, illustrating how these short-term dynamics dominate long-term inner belt climatology.
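Schematically, the "optimize the ensemble weights against observations" step can be as simple as a constrained least-squares fit. A minimal sketch with hypothetical inputs; the actual SIZM-based assimilation is considerably richer than this.

```python
import numpy as np
from scipy.optimize import nnls

def fit_ensemble_weights(members, obs):
    """Non-negative least-squares weights for ensemble members.

    members: (M, N) array of M model ensemble members evaluated at N
             observation points (e.g., fluxes at the satellite locations)
    obs:     (N,) observed values
    """
    w, _ = nnls(members.T, obs)  # w >= 0 minimizing ||members.T @ w - obs||
    return w / w.sum() if w.sum() > 0 else w

# Best-fit state as the weighted combination of the ensemble members:
# best_state = fit_ensemble_weights(members, obs) @ members
```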
Online probabilistic learning with an ensemble of forecasts
NASA Astrophysics Data System (ADS)
Thorey, Jean; Mallet, Vivien; Chaussin, Christophe
2016-04-01
Our objective is to produce a calibrated weighted ensemble to forecast a univariate time series. In addition to a meteorological ensemble of forecasts, we rely on observations or analyses of the target variable. The celebrated Continuous Ranked Probability Score (CRPS) is used to evaluate the probabilistic forecasts. However, applying the CRPS to weighted empirical distribution functions (derived from the weighted ensemble) may introduce a bias, so that minimizing the CRPS does not produce the optimal weights. Thus we propose an unbiased version of the CRPS which relies on clusters of members and is strictly proper. We adapt online learning methods for the minimization of the CRPS. These methods generate the weights associated with the members in the forecasted empirical distribution function. The weights are updated before each forecast step using only past observations and forecasts. Our learning algorithms provide the theoretical guarantee that, in the long run, the CRPS of the weighted forecasts is at least as good as the CRPS of any weighted ensemble with weights constant in time. In particular, the performance of our forecast is better than that of any subset ensemble with uniform weights. A noteworthy advantage of our algorithm is that it does not require any assumption on the distributions of the observations and forecasts, both for the application and for the theoretical guarantee to hold. As an application example on meteorological forecasts for photovoltaic production integration, we show that our algorithm generates a calibrated probabilistic forecast, with significant performance improvements on probabilistic diagnostic tools (the CRPS, the reliability diagram and the rank histogram).
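Two of the ingredients, scoring a weighted ensemble with the CRPS and updating the weights online, can be sketched directly. This uses the standard kernel form of the CRPS and a plain exponentiated-gradient step; the paper's actual contribution is an unbiased, cluster-based CRPS variant with regret guarantees, which this toy omits.

```python
import numpy as np

def crps_weighted(x, w, y):
    """CRPS of a weighted ensemble x with weights w against observation y.

    Kernel form: sum_i w_i |x_i - y| - 0.5 * sum_ij w_i w_j |x_i - x_j|.
    (This is the biased version that the paper corrects.)
    """
    return w @ np.abs(x - y) - 0.5 * w @ np.abs(x[:, None] - x[None, :]) @ w

def eg_update(w, x, y, eta=0.1):
    """One exponentiated-gradient step on the member weights."""
    grad = np.abs(x - y) - np.abs(x[:, None] - x[None, :]) @ w  # dCRPS/dw
    w = w * np.exp(-eta * grad)
    return w / w.sum()
```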
Generalized ensemble method applied to study systems with strong first order transitions
NASA Astrophysics Data System (ADS)
Małolepsza, E.; Kim, J.; Keyes, T.
2015-09-01
At strong first-order phase transitions, the entropy versus energy or, at constant pressure, enthalpy, exhibits convex behavior, and the statistical temperature curve correspondingly exhibits an S-loop or back-bending. In the canonical and isothermal-isobaric ensembles, with temperature as the control variable, the probability density functions become bimodal with peaks localized outside of the S-loop region. Inside, states are unstable, and as a result simulation of equilibrium phase coexistence becomes impossible. To overcome this problem, a method was proposed by Kim, Keyes and Straub [1], where optimally designed generalized ensemble sampling was combined with replica exchange, and denoted generalized replica exchange method (gREM). This new technique uses parametrized effective sampling weights that lead to a unimodal energy distribution, transforming unstable states into stable ones. In the present study, the gREM, originally developed as a Monte Carlo algorithm, was implemented to work with molecular dynamics in an isobaric ensemble and coded into LAMMPS, a highly optimized open source molecular simulation package. The method is illustrated in a study of the very strong solid/liquid transition in water.
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite the significant success, one limitation to most of the existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
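The entropy-based cluster reliability and the locally weighted co-association matrix can be sketched from the base label vectors alone. The index form exp(-H/(theta*M)) mirrors the ensemble-driven cluster index described above, but the parameter names are illustrative and the final consensus step (e.g., clustering the matrix) is left out.

```python
import numpy as np

def cluster_entropy(members, labelings):
    """Entropy of one cluster (sample indices) w.r.t. all base clusterings."""
    h = 0.0
    for lab in labelings:
        _, counts = np.unique(lab[members], return_counts=True)
        p = counts / counts.sum()
        h -= np.sum(p * np.log2(p))
    return h

def locally_weighted_coassociation(labelings, theta=0.4):
    """Co-association matrix with per-cluster reliability weights.

    labelings: list of M integer label arrays, one per base clustering.
    Low-entropy (reliable) clusters contribute more to the consensus.
    """
    M, n = len(labelings), len(labelings[0])
    A = np.zeros((n, n))
    for lab in labelings:
        for c in np.unique(lab):
            members = np.where(lab == c)[0]
            eci = np.exp(-cluster_entropy(members, labelings) / (theta * M))
            A[np.ix_(members, members)] += eci
    return A / M
```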
Improved transition path sampling methods for simulation of rare events
NASA Astrophysics Data System (ADS)
Chopra, Manan; Malshe, Rohit; Reddy, Allam S.; de Pablo, J. J.
2008-04-01
The free energy surfaces of a wide variety of systems encountered in physics, chemistry, and biology are characterized by the existence of deep minima separated by numerous barriers. One of the central aims of recent research in computational chemistry and physics has been to determine how transitions occur between deep local minima on rugged free energy landscapes, and transition path sampling (TPS) Monte-Carlo methods have emerged as an effective means for numerical investigation of such transitions. Many of the shortcomings of TPS-like approaches generally stem from their high computational demands. Two new algorithms are presented in this work that improve the efficiency of TPS simulations. The first algorithm uses biased shooting moves to render the sampling of reactive trajectories more efficient. The second algorithm is shown to substantially improve the accuracy of the transition state ensemble by introducing a subset of local transition path simulations in the transition state. The system considered in this work consists of a two-dimensional rough energy surface that is representative of numerous systems encountered in applications. When taken together, these algorithms provide gains in efficiency of over two orders of magnitude when compared to traditional TPS simulations.
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness.
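Of the combination methods compared, stacked generalization is easy to reproduce with scikit-learn. A minimal sketch on toy sentences with plain TF-IDF features; the systems described above use much richer lexical, syntactic, and semantic features and kernel classifiers, so data and model choices here are illustrative only.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data: social media sentences labeled ADE (1) / non-ADE (0).
texts = ["this drug gave me severe headaches", "no side effects so far",
         "felt dizzy an hour after the dose", "switched pharmacies last week",
         "rash appeared after starting the medication", "refill arrived on time"]
labels = [1, 0, 1, 0, 1, 0]

base = [("svm", make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())),
        ("nb", make_pipeline(TfidfVectorizer(), MultinomialNB()))]

# Stacked generalization: a logistic meta-learner combines base-level outputs.
clf = StackingClassifier(estimators=base,
                         final_estimator=LogisticRegression(), cv=3)
clf.fit(texts, labels)
print(clf.predict(["mild nausea after the second dose"]))
```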
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
An ensemble of classifiers, in general, aims to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of ensembles of classifiers generate self-rated confidence scores which estimate the reliability of each of their predictions and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. In existing works, although ensembles of classifiers have been widely used in computational intelligence, the effect of all factors of unreliability on the confidence of classification is largely overlooked. With relevance to NDE, classification results are affected by inherent ambiguity of classification, non-discriminative features, inadequate training samples and noise due to measurement. In this paper, we extend the existing ensemble classification by maximizing confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in classification performance of defect and non-defect indications.
Rate Constant and Reaction Coordinate of Trp-Cage Folding in Explicit Water
Juraszek, Jarek; Bolhuis, Peter G.
2008-01-01
We report rate constant calculations and a reaction coordinate analysis of the rate-limiting folding and unfolding process of the Trp-cage mini-protein in explicit solvent using transition interface sampling. Previous transition path sampling simulations revealed that in this (un)folding process the protein maintains its compact configuration, while a (de)increase of secondary structure is observed. The calculated folding rate agrees reasonably with experiment, while the unfolding rate is 10 times higher. We discuss possible origins for this mismatch. We recomputed the rates with the forward flux sampling method, and found a discrepancy of four orders of magnitude, probably caused by the method's higher sensitivity to the choice of order parameter with respect to transition interface sampling. Finally, we used the previously computed transition path-sampling ensemble to screen combinations of many order parameters for the best model of the reaction coordinate by employing likelihood maximization. We found that a combination of the root mean-square deviation of the helix and of the entire protein was, of the set of tried order parameters, the one that best describes the reaction coordinate. PMID:18676648
Constructing better classifier ensemble based on weighted accuracy and diversity measure.
Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S
2014-01-01
We present weighted accuracy and diversity (WAD), a novel measure for evaluating the quality of a classifier ensemble and assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, each member of a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, using two heuristic search algorithms, a genetic algorithm and a forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases. PMID:24672402
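The combination rule itself is a weighted harmonic mean of the two quantities. A minimal sketch with a simple pairwise-disagreement stand-in for diversity; the paper's precise accuracy and diversity definitions differ in detail.

```python
import numpy as np

def pairwise_diversity(preds):
    """Mean pairwise disagreement among member prediction vectors (L, n)."""
    L = len(preds)
    d = [np.mean(preds[i] != preds[j]) for i in range(L) for j in range(i + 1, L)]
    return float(np.mean(d))

def wad(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted harmonic mean of ensemble accuracy and diversity."""
    return (w_acc + w_div) / (w_acc / accuracy + w_div / diversity)

# Ensemble selection would score each candidate sub-ensemble with wad(...)
# and keep the argmax, e.g. inside a genetic or hill-climbing search loop.
```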
Creating "Intelligent" Ensemble Averages Using a Process-Based Framework
NASA Astrophysics Data System (ADS)
Baker, Noel; Taylor, Patrick
2014-05-01
The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers from around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is used to add value to individual model projections and construct a consensus projection. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, individual models reproduce certain climate processes better than other models. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean state metrics. What metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequal weighting multi-model ensembles. The intention is to produce improved ("intelligent") unequal-weight ensemble averages. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables—e.g., outgoing longwave radiation and surface temperature. Several climate process metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and Earth's Radiant Energy System (CERES) instrument in combination with surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing the equal-weighted ensemble average and an ensemble weighted using the process-based metric. Additionally, this study investigates the dependence of the metric weighting scheme on the climate state using a combination of model simulations including a non-forced preindustrial control experiment, historical simulations, and several radiative forcing Representative Concentration Pathway (RCP) scenarios. Ultimately, the goal of the framework is to inform better methods for ensemble averaging models and create better climate predictions.
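To make the idea of a process-based metric concrete, here is a minimal sketch that weights models by how well they reproduce the regression slope between outgoing longwave radiation and surface temperature against an observation-based reference. The slope metric and the inverse-error weighting are illustrative stand-ins for the metrics actually tested.

```python
import numpy as np

def process_metric_weights(model_olr, model_ts, obs_olr, obs_ts):
    """Weight models by the error in their OLR-vs-Ts process relation.

    model_olr/model_ts: (K, T) per-model time series; obs_*: (T,)
    CERES-based references. The metric is the absolute error in the
    regression slope of OLR on surface temperature.
    """
    obs_slope = np.polyfit(obs_ts, obs_olr, 1)[0]
    err = np.array([abs(np.polyfit(ts, olr, 1)[0] - obs_slope)
                    for olr, ts in zip(model_olr, model_ts)])
    w = 1.0 / (err + 1e-12)  # inverse-error weighting
    return w / w.sum()

def weighted_ensemble_mean(fields, w):
    """fields: (K, ...) model projections; returns the weighted average."""
    return np.tensordot(w, fields, axes=1)
```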
Long-time Dynamics of Stochastic Wave Breaking
NASA Astrophysics Data System (ADS)
Restrepo, J. M.; Ramirez, J. M.; Deike, L.; Melville, K.
2017-12-01
A stochastic parametrization is proposed for the dynamics of wave breaking of progressive water waves. The model is shown to agree with transport estimates, derived from the Lagrangian path of fluid parcels. These trajectories are obtained numerically and are shown to agree well with theory in the non-breaking regime. Of special interest is the impact of wave breaking on transport, momentum exchanges and energy dissipation, as well as dispersion of trajectories. The proposed model, ensemble averaged to larger time scales, is compared to ensemble averages of the numerically generated parcel dynamics, and is then used to capture energy dissipation and path dispersion.
Zheng, Weihua; Gallicchio, Emilio; Deng, Nanjie; Andrec, Michael; Levy, Ronald M.
2011-01-01
We present a new approach to study a multitude of folding pathways and different folding mechanisms for the 20-residue mini-protein Trp-Cage using the combined power of replica exchange molecular dynamics (REMD) simulations for conformational sampling, Transition Path Theory (TPT) for constructing folding pathways and stochastic simulations for sampling the pathways in a high dimensional structure space. REMD simulations of Trp-Cage with 16 replicas at temperatures between 270K and 566K are carried out with an all-atom force field (OPLSAA) and an implicit solvent model (AGBNP). The conformations sampled from all temperatures are collected. They form a discretized state space that can be used to model the folding process. The equilibrium population for each state at a target temperature can be calculated using the Weighted-Histogram-Analysis Method (WHAM). By connecting states with similar structures and creating edges satisfying detailed balance conditions, we construct a kinetic network that preserves the equilibrium population distribution of the state space. After defining the folded and unfolded macrostates, committor probabilities (Pfold) are calculated by solving a set of linear equations for each node in the network and pathways are extracted together with their fluxes using the TPT algorithm. By clustering the pathways into folding “tubes”, a more physically meaningful picture of the diversity of folding routes emerges. Stochastic simulations are carried out on the network and a procedure is developed to project sampled trajectories onto the folding tubes. The fluxes through the folding tubes calculated from the stochastic trajectories are in good agreement with the corresponding values obtained from the TPT analysis. The temperature dependence of the ensemble of Trp-Cage folding pathways is investigated. Above the folding temperature, a large number of diverse folding pathways with comparable fluxes flood the energy landscape. At low temperature, however, the folding transition is dominated by only a few localized pathways. PMID:21254767
Quantum Gibbs ensemble Monte Carlo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fantoni, Riccardo; Moroni, Saverio
We present a path integral Monte Carlo method which is the full quantum analogue of the Gibbs ensemble Monte Carlo method of Panagiotopoulos to study the gas-liquid coexistence line of a classical fluid. Unlike previous extensions of Gibbs ensemble Monte Carlo to include quantum effects, our scheme is viable even for systems with strong quantum delocalization in the degenerate regime of temperature. This is demonstrated by an illustrative application to the gas-superfluid transition of ⁴He in two dimensions.
Nuclear Ensemble Approach with Importance Sampling.
Kossoski, Fábris; Barbatti, Mario
2018-06-12
We show that the importance sampling technique can effectively augment the range of problems where the nuclear ensemble approach can be applied. A sampling probability distribution function initially determines the collection of initial conditions for which calculations are performed, as usual. Then, results for a distinct target distribution are computed by introducing compensating importance sampling weights for each sampled point. This mapping between the two probability distributions can be performed whenever they are both explicitly constructed. Perhaps most notably, this procedure allows for the computation of temperature dependent observables. As a test case, we investigated the UV absorption spectra of phenol, which has been shown to have a marked temperature dependence. Application of the proposed technique to a range that covers 500 K provides results that converge to those obtained with conventional sampling. We further show that an overall improved rate of convergence is obtained when sampling is performed at intermediate temperatures. The comparison between calculated and the available measured cross sections is very satisfactory, as the main features of the spectra are correctly reproduced. As a second test case, one of Tully's classical models was revisited, and we show that the computation of dynamical observables also profits from the importance sampling technique. In summary, the strategy developed here can be employed to assess the role of temperature for any property calculated within the nuclear ensemble method, with the same computational cost as doing so for a single temperature.
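The reweighting at the core of this technique is ordinary self-normalized importance sampling over the same set of sampled initial conditions. A minimal sketch with a Gaussian toy standing in for the nuclear ensemble distributions; the temperature-dependent widths are illustrative only.

```python
import numpy as np

def importance_weights(x, log_p_target, log_p_sample):
    """Self-normalized importance weights for ensemble reweighting.

    x: points drawn from the sampling distribution; log_p_*: callables
    returning log-densities (up to additive constants). Observables at
    the target distribution are weighted averages over the same points.
    """
    logw = log_p_target(x) - log_p_sample(x)
    logw -= logw.max()               # stabilize the exponentials
    w = np.exp(logw)
    return w / w.sum()

# Toy example: Gaussian sampling at "temperature" 1/beta_s, target 1/beta_t.
beta_s, beta_t = 1.0, 2.0
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0 / np.sqrt(beta_s), size=10000)
w = importance_weights(x,
                       lambda q: -0.5 * beta_t * q**2,
                       lambda q: -0.5 * beta_s * q**2)
var_target = np.sum(w * x**2)        # should approach 1/beta_t = 0.5
```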
Yang, Shan; Al-Hashimi, Hashim M.
2016-01-01
A growing number of studies employ time-averaged experimental data to determine dynamic ensembles of biomolecules. While it is well known that different ensembles can satisfy experimental data to within error, the extent and nature of these degeneracies, and their impact on the accuracy of the ensemble determination remains poorly understood. Here, we use simulations and a recently introduced metric for assessing ensemble similarity to explore degeneracies in determining ensembles using NMR residual dipolar couplings (RDCs) with specific application to A-form helices in RNA. Various target ensembles were constructed representing different domain-domain orientational distributions that are confined to a topologically restricted (<10%) conformational space. Five independent sets of ensemble averaged RDCs were then computed for each target ensemble and a ‘sample and select’ scheme used to identify degenerate ensembles that satisfy RDCs to within experimental uncertainty. We find that ensembles with different ensemble sizes and that can differ significantly from the target ensemble (by as much as ΣΩ ~ 0.4 where ΣΩ varies between 0 and 1 for maximum and minimum ensemble similarity, respectively) can satisfy the ensemble averaged RDCs. These deviations increase with the number of unique conformers and breadth of the target distribution, and result in significant uncertainty in determining conformational entropy (as large as 5 kcal/mol at T = 298 K). Nevertheless, the RDC-degenerate ensembles are biased towards populated regions of the target ensemble, and capture other essential features of the distribution, including the shape. Our results identify ensemble size as a major source of uncertainty in determining ensembles and suggest that NMR interactions such as RDCs and spin relaxation, on their own, do not carry the necessary information needed to determine conformational entropy at a useful level of precision. The framework introduced here provides a general approach for exploring degeneracies in ensemble determination for different types of experimental data. PMID:26131693
NASA Astrophysics Data System (ADS)
dos Santos, A. F.; Freitas, S. R.; de Mattos, J. G. Z.; de Campos Velho, H. F.; Gan, M. A.; da Luz, E. F. P.; Grell, G. A.
2013-09-01
In this paper we consider an optimization problem applying the metaheuristic Firefly algorithm (FY) to weight an ensemble of rainfall forecasts from daily precipitation simulations with the Brazilian developments on the Regional Atmospheric Modeling System (BRAMS) over South America during January 2006. The method is addressed as a parameter estimation problem to weight the ensemble of precipitation forecasts carried out using different options of the convective parameterization scheme. Ensemble simulations were performed using different choices of closures, representing different formulations of dynamic control (the modulation of convection by the environment) in a deep convection scheme. The optimization problem is solved as an inverse problem of parameter estimation. The application and validation of the methodology is carried out using daily precipitation fields, defined over South America and obtained by merging remote sensing estimations with rain gauge observations. The quadratic difference between the model and observed data was used as the objective function to determine the best combination of the ensemble members to reproduce the observations. To reduce the model rainfall biases, the set of weights determined by the algorithm is used to weight members of an ensemble of model simulations in order to compute a new precipitation field that represents the observed precipitation as closely as possible. The validation of the methodology is carried out using classical statistical scores. The algorithm has produced the best combination of the weights, resulting in a new precipitation field closest to the observations.
NASA Astrophysics Data System (ADS)
Fernández, J.; Frías, M. D.; Cabos, W. D.; Cofiño, A. S.; Domínguez, M.; Fita, L.; Gaertner, M. A.; García-Díez, M.; Gutiérrez, J. M.; Jiménez-Guerrero, P.; Liguori, G.; Montávez, J. P.; Romera, R.; Sánchez, E.
2018-03-01
We present an unprecedented ensemble of 196 future climate projections arising from different global and regional model intercomparison projects (MIPs): CMIP3, CMIP5, ENSEMBLES, ESCENA, EURO- and Med-CORDEX. This multi-MIP ensemble includes all regional climate model (RCM) projections publicly available to date, along with their driving global climate models (GCMs). We illustrate consistent and conflicting messages using continental Spain and the Balearic Islands as target region. The study considers near future (2021-2050) changes and their dependence on several uncertainty sources sampled in the multi-MIP ensemble: GCM, future scenario, internal variability, RCM, and spatial resolution. This initial work focuses on mean seasonal precipitation and temperature changes. The results show that the potential GCM-RCM combinations have been explored very unevenly, with favoured GCMs and large ensembles of a few RCMs that do not respond to any ensemble design. Therefore, the grand-ensemble is weighted towards a few models. This study highlights the difficulty of selecting a balanced, credible sub-ensemble by illustrating several conflicting responses between the RCMs and their driving GCMs, and among different RCMs. Sub-ensembles from different initiatives are dominated by different uncertainty sources, with the driving GCM being the main contributor to uncertainty in the grand-ensemble. For this analysis of the near future changes, the emission scenario does not contribute strong uncertainty. Despite the extra computational effort, for mean seasonal changes, the increase in resolution does not lead to important changes.
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.
Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun
2014-04-01
Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in the literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (i.e. a single classifier or an ensemble) weighted by its discriminative power and stability. The uneven class distribution, on the other hand, is handled by the sub-classifier built on a specific feature group, with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
Measuring excess free energies of self-assembled membrane structures.
Norizoe, Yuki; Daoulas, Kostas Ch; Müller, Marcus
2010-01-01
Using computer simulation of a solvent-free, coarse-grained model for amphiphilic membranes, we study the excess free energy of hourglass-shaped connections (i.e., stalks) between two apposed bilayer membranes. In order to calculate the free energy by simulation in the canonical ensemble, we reversibly transfer two apposed bilayers into a configuration with a stalk in three steps. First, we gradually replace the intermolecular interactions by an external, ordering field. The latter is chosen such that the structure of the non-interacting system in this field closely resembles the structure of the original, interacting system in the absence of the external field. The absence of structural changes along this path suggests that it is reversible; a fact which is confirmed by expanded-ensemble simulations. Second, the external, ordering field is changed so as to transform the non-interacting system from the apposed bilayer structure to two bilayers connected by a stalk. The final external field is chosen such that the structure of the non-interacting system resembles the structure of the stalk in the interacting system without a field. On the third branch of the transformation path, we reversibly replace the external, ordering field by non-bonded interactions. Using expanded-ensemble techniques, the free energy change along this reversible path can be obtained with an accuracy of 10^-3 kBT per molecule in the nVT ensemble. Calculating the chemical potential, we obtain the free energy of a stalk in the grandcanonical ensemble, and employing semi-grandcanonical techniques, we calculate the change of the excess free energy upon altering the molecular architecture. This computational strategy can be applied to compute the free energy of self-assembled phases in lipid and copolymer systems, and the excess free energy of defects or interfaces.
NASA Astrophysics Data System (ADS)
Soltanzadeh, I.; Azadi, M.; Vakili, G. A.
2011-07-01
Using Bayesian Model Averaging (BMA), an attempt was made to obtain calibrated probabilistic numerical forecasts of 2-m temperature over Iran. The ensemble employs three limited area models (WRF, MM5 and HRM), with WRF used with five different configurations. Initial and boundary conditions for MM5 and WRF are obtained from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS), and for HRM the initial and boundary conditions come from the analysis of the Global Model Europe (GME) of the German Weather Service. The resulting ensemble of seven members was run for a period of 6 months (from December 2008 to May 2009) over Iran. The 48-h raw ensemble outputs were calibrated using the BMA technique for 120 days, using a 40-day training sample of forecasts and the corresponding verification data. The calibrated probabilistic forecasts were assessed using rank histograms and attribute diagrams. Results showed that application of BMA improved the reliability of the raw ensemble. Using the weighted ensemble mean as a deterministic forecast, it was found that the deterministic-style BMA forecasts usually performed better than the best member's deterministic forecast.
Parameter Uncertainty on AGCM-simulated Tropical Cyclones
NASA Astrophysics Data System (ADS)
He, F.
2015-12-01
This work studies parameter uncertainty in tropical cyclone (TC) simulations in Atmospheric General Circulation Models (AGCMs) using the Reed-Jablonowski TC test case, illustrated here with the Community Atmosphere Model (CAM). It examines the impact of 24 parameters across the physical parameterization schemes that represent the convection, turbulence, precipitation and cloud processes in AGCMs. The one-at-a-time (OAT) sensitivity analysis method first quantifies their relative importance for TC simulations and identifies the key parameters for six different TC characteristics: intensity, precipitation, longwave cloud radiative forcing (LWCF), shortwave cloud radiative forcing (SWCF), cloud liquid water path (LWP) and ice water path (IWP). Then, 8 physical parameters are chosen and perturbed using the Latin-Hypercube Sampling (LHS) method. The comparison between the OAT ensemble run and the LHS ensemble run shows that the simulated TC intensity is mainly affected by the parcel fractional mass entrainment rate in the Zhang-McFarlane (ZM) deep convection scheme. The nonlinear interactive effect among different physical parameters is negligible for simulated TC intensity. In contrast, this nonlinear interactive effect plays a significant role in the other simulated tropical cyclone characteristics (precipitation, LWCF, SWCF, LWP and IWP) and greatly enlarges their simulated uncertainties. The statistical emulator Extended Multivariate Adaptive Regression Splines (EMARS) is applied to characterize the response functions for the nonlinear effects. Last, we find that the intensity uncertainty caused by physical parameters is comparable in degree to the uncertainty caused by model structure (e.g. grid) and initial conditions (e.g. sea surface temperature, atmospheric moisture). These findings suggest the importance of using the perturbed physics ensemble (PPE) method to revisit tropical cyclone prediction under climate change scenarios.
Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A
2015-01-15
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. PMID:25316152
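A cross-sectional sketch of the idea, stabilized weights with the treatment model fit by stacked generalization over a small library of learners, might look as follows in scikit-learn. The paper's setting is longitudinal with time-varying treatment, and the library composition here is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def stabilized_weights(X, treated, library=None):
    """Stabilized inverse probability weights via an ensemble learner.

    sw_i = P(A = a_i) / P(A = a_i | X_i), with the denominator estimated
    by a stacked ensemble. X: (n, p) covariates; treated: (n,) 0/1 array.
    """
    if library is None:
        library = [("lr", LogisticRegression(max_iter=1000)),
                   ("rf", RandomForestClassifier(n_estimators=200)),
                   ("nb", GaussianNB())]
    model = StackingClassifier(estimators=library,
                               final_estimator=LogisticRegression(), cv=5)
    model.fit(X, treated)
    p = model.predict_proba(X)[:, 1]   # P(A = 1 | X)
    marginal = treated.mean()          # P(A = 1)
    return np.where(treated == 1, marginal / p, (1 - marginal) / (1 - p))
```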
Nonequilibrium umbrella sampling in spaces of many order parameters
NASA Astrophysics Data System (ADS)
Dickson, Alex; Warmflash, Aryeh; Dinner, Aaron R.
2009-02-01
We recently introduced an umbrella sampling method for obtaining nonequilibrium steady-state probability distributions projected onto an arbitrary number of coordinates that characterize a system (order parameters) [A. Warmflash, P. Bhimalapuram, and A. R. Dinner, J. Chem. Phys. 127, 154112 (2007)]. Here, we show how our algorithm can be combined with the image update procedure from the finite-temperature string method for reversible processes [E. Vanden-Eijnden and M. Venturoli, "Revisiting the finite temperature string method for calculation of reaction tubes and free energies," J. Chem. Phys. (in press)] to enable restricted sampling of a nonequilibrium steady state in the vicinity of a path in a many-dimensional space of order parameters. For the study of transitions between stable states, the adapted algorithm results in improved scaling with the number of order parameters and the ability to progressively refine the regions of enforced sampling. We demonstrate the algorithm by applying it to a two-dimensional model of driven Brownian motion and a coarse-grained (Ising) model for nucleation under shear. It is found that the choice of order parameters can significantly affect the convergence of the simulation; local magnetization variables other than those used previously for sampling transition paths in Ising systems are needed to ensure that the reactive flux is primarily contained within a tube in the space of order parameters. The relation of this method to other algorithms that sample the statistics of path ensembles is discussed.
Creating "Intelligent" Climate Model Ensemble Averages Using a Process-Based Framework
NASA Astrophysics Data System (ADS)
Baker, N. C.; Taylor, P. C.
2014-12-01
The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers from around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is often used to add value to model projections: consensus projections have been shown to consistently outperform individual models. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, certain models reproduce climate processes better than other models. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean state metrics. What metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequally weighting multi-model ensembles. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables—e.g. outgoing longwave radiation and surface temperature. Metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and the Earth's Radiant Energy System (CERES) instrument and surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing weighted and unweighted model ensembles. For example, one tested metric weights the ensemble by how well models reproduce the time-series probability distribution of the cloud forcing component of reflected shortwave radiation. The weighted ensemble for this metric indicates lower simulated precipitation (up to 0.7 mm/day) in tropical regions than the unweighted ensemble: since CMIP5 models have been shown to overproduce precipitation, this result could indicate that the metric is effective in identifying models that simulate more realistic precipitation. Ultimately, the goal of the framework is to identify performance metrics that advise better methods for ensemble averaging and thereby create better climate predictions.
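A minimal sketch of the unequal-weighting step, assuming each model already has a scalar metric value (e.g. a distance between its simulated distribution and the CERES-based observations); the exponential skill-to-weight map is one plausible choice, not the study's prescription:

    import numpy as np

    def weighted_ensemble_mean(projections, errors, kappa=1.0):
        """projections: (n_models, nlat, nlon); errors: (n_models,) metric values."""
        w = np.exp(-kappa * np.asarray(errors))   # smaller error -> larger weight
        w /= w.sum()
        return np.tensordot(w, projections, axes=1)

    proj = np.random.rand(50, 72, 144)   # stand-in fields for ~50 CMIP5 models
    err = np.random.rand(50)             # stand-in metric values
    consensus = weighted_ensemble_mean(proj, err)

Setting kappa = 0 recovers the equal-weighted mean, so weighted and unweighted ensembles can be compared within the same code path.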
Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways
Seyler, Sean L.; Kumar, Avishek; Thorpe, M. F.; Beckstein, Oliver
2015-01-01
Diverse classes of proteins function through large-scale conformational changes and various sophisticated computational algorithms have been proposed to enhance sampling of these macromolecular transition paths. Because such paths are curves in a high-dimensional space, it has been difficult to quantitatively compare multiple paths, a necessary prerequisite to, for instance, assess the quality of different algorithms. We introduce a method named Path Similarity Analysis (PSA) that enables us to quantify the similarity between two arbitrary paths and extract the atomic-scale determinants responsible for their differences. PSA utilizes the full information available in 3N-dimensional configuration space trajectories by employing the Hausdorff or Fréchet metrics (adopted from computational geometry) to quantify the degree of similarity between piecewise-linear curves. It thus completely avoids relying on projections into low-dimensional spaces, as used in traditional approaches. To elucidate the principles of PSA, we quantified the effect of path roughness induced by thermal fluctuations using a toy model system. Using, as an example, the closed-to-open transitions of the enzyme adenylate kinase (AdK) in its substrate-free form, we compared a range of protein transition path-generating algorithms. Molecular dynamics-based dynamic importance sampling (DIMS MD), targeted MD (TMD), and the purely geometric FRODA (Framework Rigidity Optimized Dynamics Algorithm) were tested along with seven other methods publicly available on servers, including several based on the popular elastic network model (ENM). PSA with clustering revealed that paths produced by a given method are more similar to each other than to those from another method and, for instance, that the ENM-based methods produced relatively similar paths. PSA applied to ensembles of DIMS MD and FRODA trajectories of the conformational transition of diphtheria toxin, a particularly challenging example, showed that the geometry-based FRODA occasionally sampled the pathway space of force field-based DIMS MD. For the AdK transition, the new concept of a Hausdorff-pair map enabled us to extract the molecular structural determinants responsible for differences in pathways, namely a set of conserved salt bridges whose charge-charge interactions are fully modelled in DIMS MD but not in FRODA. PSA has the potential to enhance our understanding of transition path sampling methods, validate them, and provide a new approach to analyzing conformational transitions. PMID:26488417
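The core distance computation is compact; below is a minimal sketch of the symmetric Hausdorff distance between two trajectories treated as point sets of 3N-dimensional frames. PSA itself (superposition, clustering, the Hausdorff-pair map) is distributed with the MDAnalysis package:

    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    def hausdorff(path_a, path_b):
        """path_*: (n_frames, n_atoms, 3) arrays of already-superimposed coordinates."""
        a = path_a.reshape(len(path_a), -1)   # each frame becomes a 3N-vector
        b = path_b.reshape(len(path_b), -1)
        return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])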
Effects of a mutation on the folding mechanism of a beta-hairpin.
Juraszek, Jarek; Bolhuis, Peter G
2009-12-17
The folding mechanism of a protein is determined by its primary sequence. Yet, how the mechanism is changed by a mutation is still poorly understood, even for basic secondary structures such as beta-hairpins. We perform an extensive simulation study of the effects of mutating the GB1 beta-hairpin into Trpzip4 (Y5W, F12W, V14W) on the folding mechanism. While Trpzip4 has a much more stable native state due to very strong hydrophobic interactions of the side chains, its folding rate does not differ significantly from the wild type beta-hairpin. We sample the free-energy landscapes of both hairpins with Replica Exchange Molecular Dynamics (REMD) and identify the four (meta)stable states (U, H, F, and N). Using Transition Path Sampling (TPS), we then harvest ensembles of unbiased pathways between the H and F states and between the F and N states to investigate the unbiased folding mechanisms. In both hairpins, the hydrophobic collapse (U-H) is followed by the middle hydrogen bond formation (H-F), and finally a closing of the strands in a zipper-like fashion (F-N). For the Trpzip4, the path ensembles indicate that the final F-N step is much more difficult than for GB1 and involves partial unfolding, rezipping of hydrogen bonds, and rearrangement of the Trp-14 side chain. For the rate-limiting (H-F) step, the path ensembles show that in GB1 desolvation and strand closure go hand in hand, while in Trpzip4 desolvation is decoupled from strand closure. Nevertheless, likelihood maximization shows that the reaction coordinate for both hairpins remains the interstrand distance. We conclude that the folding mechanism of both hairpins is a combination of hydrophobic collapse and zipping of hydrogen bonds but that the zipper mechanism is more visible in Trpzip4. A major difference between the two hairpins is that in the transition state of the rate-limiting step for Trpzip4 one tryptophan is exposed to the solvent due to steric hindrance, making the folding mechanism more complex and leading to an increased F-N barrier. Thus, our results show in atomistic detail how a mutation leads to a different folding mechanism and results in a more frustrated folding free-energy landscape.
NASA Technical Reports Server (NTRS)
Hizanidis, Kyriakos; Vlahos, L.; Polymilis, C.
1989-01-01
The relativistic motion of an ensemble of electrons in an intense monochromatic electromagnetic wave propagating obliquely in a uniform external magnetic field is studied. The problem is formulated from the viewpoint of Hamiltonian theory and the Fokker-Planck-Kolmogorov approach analyzed by Hizanidis (1989), leading to a one-dimensional diffusive acceleration along paths of constant zeroth-order generalized Hamiltonian. For values of the wave amplitude and the propagation angle inside the analytically predicted stochastic region, the numerical results suggest that the diffusion process proceeds in stages. In the first stage, the electrons are accelerated to relatively high energies by sampling the first few overlapping resonances one by one. During that stage, the ensemble-average square deviations of the variables involved scale quadratically with time; during the second stage, they scale linearly with time. For much longer times, deviation from linear scaling slowly sets in.
Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices
NASA Astrophysics Data System (ADS)
Passemier, Damien; McKay, Matthew R.; Chen, Yang
2015-07-01
Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.
NASA Astrophysics Data System (ADS)
Wang, Yuanbing; Min, Jinzhong; Chen, Yaodeng; Huang, Xiang-Yu; Zeng, Mingjian; Li, Xin
2017-01-01
This study evaluates the performance of three-dimensional variational (3DVar) assimilation and a hybrid data assimilation system using time-lagged ensembles (TLEn-Var) in a heavy-rainfall event. The time-lagged ensembles are constructed by sampling from a moving 3 h time window along a model trajectory, which is economical and easy to implement. The proposed hybrid system introduces flow-dependent error covariance derived from the time-lagged ensemble into the variational cost function without significantly increasing computational cost. Single-observation tests are performed to document the characteristics of the hybrid system, and the sensitivity of precipitation forecasts to the ensemble covariance weight and localization scale is investigated. Additionally, TLEn-Var is evaluated and compared with ETKF (ensemble transform Kalman filter)-based hybrid assimilation within a continuously cycling framework, through which new hybrid analyses are produced every 3 h over 10 days. The 24 h accumulated precipitation, moisture, and wind are compared between 3DVar and the hybrid assimilation using time-lagged ensembles. Results show that model states and precipitation forecast skill are improved by the hybrid assimilation using time-lagged ensembles compared with 3DVar; simulation of precipitable water and of the wind structure is also improved. Cyclonic wind increments are generated near the rainfall center, leading to an improved precipitation forecast. This study indicates that hybrid data assimilation using time-lagged ensembles is a viable alternative or supplement in complex models for weather service agencies that have limited computing resources for maintaining large ensembles.
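A minimal sketch of how time-lagged states can stand in for ensemble perturbations; the shapes and the window of seven snapshots are illustrative, not the paper's configuration:

    import numpy as np

    def lagged_perturbations(states):
        """states: (n_lags, n_state) snapshots from a moving window along one trajectory."""
        mean = states.mean(axis=0)
        return (states - mean) / np.sqrt(len(states) - 1)

    X = lagged_perturbations(np.random.rand(7, 1000))  # 7 hypothetical lagged states
    B_ens = X.T @ X   # flow-dependent sample covariance

In a hybrid cost function this sample covariance is blended with the static 3DVar covariance through the ensemble covariance weight whose sensitivity the abstract describes.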
Selecting a climate model subset to optimise key ensemble properties
NASA Astrophysics Data System (ADS)
Herger, Nadja; Abramowitz, Gab; Knutti, Reto; Angélil, Oliver; Lehmann, Karsten; Sanderson, Benjamin M.
2018-02-01
End users studying impacts and risks caused by human-induced climate change are often presented with large multi-model ensembles of climate projections whose composition and size are arbitrarily determined. An efficient and versatile method that finds a subset which maintains certain key properties from the full ensemble is needed, but very little work has been done in this area. Therefore, users typically make their own somewhat subjective subset choices and commonly use the equally weighted model mean as a best estimate. However, different climate model simulations cannot necessarily be regarded as independent estimates due to the presence of duplicated code and shared development history. Here, we present an efficient and flexible tool that makes better use of the ensemble as a whole by finding a subset with improved mean performance compared to the multi-model mean while at the same time maintaining the spread and addressing the problem of model interdependence. Out-of-sample skill and reliability are demonstrated using model-as-truth experiments. This approach is illustrated with one set of optimisation criteria but we also highlight the flexibility of cost functions, depending on the focus of different users. The technique is useful for a range of applications that, for example, minimise present-day bias to obtain an accurate ensemble mean, reduce dependence in ensemble spread, maximise future spread, ensure good performance of individual models in an ensemble, reduce the ensemble size while maintaining important ensemble characteristics, or optimise several of these at the same time. As in any calibration exercise, the final ensemble is sensitive to the metric, observational product, and pre-processing steps used.
NASA Astrophysics Data System (ADS)
Higgins, S. M. W.; Du, H. L.; Smith, L. A.
2012-04-01
Ensemble forecasting on a lead time of seconds over several years generates a large forecast-outcome archive, which can be used to evaluate and weight "models". Challenges which arise as the archive becomes smaller are investigated: in weather forecasting one typically has only thousands of forecasts; however, forecasts launched 6 hours apart are not independent of each other, nor is it justified to mix seasons with different dynamics. Seasonal forecasts, as from ENSEMBLES and DEMETER, typically have fewer than 64 unique launch dates; decadal forecasts fewer than eight, and long-range climate forecasts arguably none. It is argued that one does not weight "models" so much as entire ensemble prediction systems (EPSs), and that the marginal value of an EPS will depend on the other members in the mix. The impact of using different skill scores is examined in the limits of both very large forecast-outcome archives (thereby evaluating the efficiency of the skill score) and very small forecast-outcome archives (illustrating fundamental limitations due to sampling fluctuations and memory in the physical system being forecast). It is shown that blending with climatology (J. Bröcker and L.A. Smith, Tellus A, 60(4), 663-678, (2008)) tends to increase the robustness of the results; a new kernel dressing methodology (simply ensuring that the expected probability mass tends to lie outside the range of the ensemble) is also illustrated. Fair comparisons using seasonal forecasts from the ENSEMBLES project are used to illustrate the importance of these results with fairly small archives. The robustness of these results across the range of small, moderate, and huge archives is demonstrated using imperfect models of perfectly known nonlinear (chaotic) dynamical systems. The implications these results hold for distinguishing the skill of a forecast from its value to a user of the forecast are discussed.
Weighted projected networks: mapping hypergraphs to networks.
López, Eduardo
2013-05-01
Many natural, technological, and social systems incorporate multiway interactions, yet are characterized and measured on the basis of weighted pairwise interactions. In this article, I propose a family of models in which pairwise interactions originate from multiway interactions, by starting from ensembles of hypergraphs and applying projections that generate ensembles of weighted projected networks. I calculate analytically the statistical properties of weighted projected networks, and suggest ways these could be used beyond theoretical studies. Weighted projected networks typically exhibit weight disorder along links even for very simple generating hypergraph ensembles. Also, as the size of a hypergraph changes, a signature of multiway interaction emerges on the link weights of weighted projected networks that distinguishes them from fundamentally weighted pairwise networks. This signature could be used to search for hidden multiway interactions in weighted network data. I find the percolation threshold and size of the largest component for hypergraphs of arbitrary uniform rank, translate the results into projected networks, and show that the transition is second order. This general approach to network formation has the potential to shed new light on our understanding of weighted networks.
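A minimal sketch of one natural projection convention, in which every hyperedge contributes a unit of weight to each node pair it contains, to make the mapping concrete; the article's results are analytical statements about ensembles of such projections, not about this particular construction:

    from collections import defaultdict
    from itertools import combinations

    def project(hyperedges):
        """Map a hypergraph (iterable of node sets) to weighted links."""
        weights = defaultdict(int)
        for edge in hyperedges:
            for u, v in combinations(sorted(edge), 2):
                weights[(u, v)] += 1          # link weight = number of shared hyperedges
        return dict(weights)

    print(project([{1, 2, 3}, {2, 3, 4}, {1, 2}]))
    # {(1, 2): 2, (1, 3): 1, (2, 3): 2, (2, 4): 1, (3, 4): 1}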
Supermodeling With A Global Atmospheric Model
NASA Astrophysics Data System (ADS)
Wiegerinck, Wim; Burgers, Willem; Selten, Frank
2013-04-01
In weather and climate prediction studies it often turns out that the multi-model ensemble mean prediction has the best prediction skill scores. One possible explanation is that the major part of the model error is random and is averaged out in the ensemble mean. In the standard multi-model ensemble approach, the models are integrated in time independently and the predicted states are combined a posteriori. Recently an alternative ensemble prediction approach has been proposed in which the models exchange information during the simulation and synchronize on a common solution that is closer to the truth than any of the individual model solutions in the standard multi-model ensemble approach, or a weighted average of these. This approach is called the supermodeling approach (SUMO). Its potential has been demonstrated in the context of simple, low-order, chaotic dynamical systems. The information exchange takes the form of linear nudging terms in the dynamical equations that nudge the solution of each model to the solutions of all other models in the ensemble. With a suitable choice of the connection strengths the models synchronize on a common solution that is indeed closer to the true system than any of the individual model solutions without nudging. This approach is called connected SUMO. An alternative approach is to integrate a weighted-average model: weighted SUMO. At each time step all models in the ensemble calculate the tendency, these tendencies are weighted-averaged, and the state is integrated one time step into the future with this weighted-average tendency. It was shown that when connected SUMO synchronizes perfectly, it follows the weighted-average trajectory and both approaches yield the same solution. In this study we pioneer both approaches in the context of a global, quasi-geostrophic, three-level atmosphere model that simulates the extra-tropical circulation in the Northern Hemisphere winter quite realistically.
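A minimal sketch of the weighted-SUMO update, assuming each imperfect model exposes a tendency function; the forward-Euler step is for clarity only:

    import numpy as np

    def weighted_sumo_step(x, tendencies, weights, dt):
        """tendencies: list of callables f_i(x); weights: nonnegative, summing to 1."""
        dxdt = sum(w * f(x) for w, f in zip(weights, tendencies))
        return x + dt * dxdt   # one state advanced with the weighted-average tendency

Connected SUMO would instead integrate every model and add nudging terms proportional to pairwise state differences; in the perfectly synchronized limit the two schemes coincide, as noted above.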
Enhanced reconstruction of weighted networks from strengths and degrees
NASA Astrophysics Data System (ADS)
Mastrandrea, Rossana; Squartini, Tiziano; Fagiolo, Giorgio; Garlaschelli, Diego
2014-04-01
Network topology plays a key role in many phenomena, from the spreading of diseases to that of financial crises. Whenever the whole structure of a network is unknown, one must resort to reconstruction methods that identify the least biased ensemble of networks consistent with the partial information available. A challenging case, frequently encountered due to privacy issues in the analysis of interbank flows and Big Data, is when there is only local (node-specific) aggregate information available. For binary networks, the relevant ensemble is one where the degree (number of links) of each node is constrained to its observed value. However, for weighted networks the problem is much more complicated. While the naïve approach prescribes constraining the strengths (total link weights) of all nodes, recent counter-intuitive results suggest that in weighted networks the degrees are often more informative than the strengths. This implies that the reconstruction of weighted networks would be significantly enhanced by the specification of both strengths and degrees, a computationally hard and bias-prone procedure. Here we solve this problem by introducing an analytical and unbiased maximum-entropy method that works in the shortest possible time and does not require the explicit generation of reconstructed samples. We consider several real-world examples and show that, while the strengths alone give poor results, the additional knowledge of the degrees yields accurately reconstructed networks. Information-theoretic criteria rigorously confirm that the degree sequence, as soon as it is non-trivial, is irreducible to the strength sequence. Our results have strong implications for the analysis of motifs and communities and whenever the reconstructed ensemble is required as a null model to detect higher-order patterns.
Synoptic Factors Affecting Structure Predictability of Hurricane Alex (2016)
NASA Astrophysics Data System (ADS)
Gonzalez-Aleman, J. J.; Evans, J. L.; Kowaleski, A. M.
2016-12-01
On January 7, 2016, a disturbance formed over the western North Atlantic basin. After undergoing tropical transition, the system became the first hurricane of 2016 - and the first North Atlantic hurricane to form in January since 1938. Already an extremely rare hurricane event, Alex then underwent extratropical transition (ET) just north of the Azores Islands. We examine the factors affecting Alex's structural evolution through a new path-clustering technique, in which the 51 ensemble members of the European Centre for Medium-Range Weather Forecasts Ensemble Prediction System (ECMWF-EPS) are grouped based on similarities in the storm's path through the cyclone phase space (CPS). The resulting clusters group the various scenarios of structural development represented in the ensemble forecasts. As a result, it is possible to shed light on the role of the synoptic scale in changing the structure of this hurricane in the midlatitudes through intercomparison of the most "realistic" forecast of the evolution of Alex and the other physically plausible modes of its development.
Entanglement between two spatially separated atomic modes
NASA Astrophysics Data System (ADS)
Lange, Karsten; Peise, Jan; Lücke, Bernd; Kruse, Ilka; Vitagliano, Giuseppe; Apellaniz, Iagoba; Kleinmann, Matthias; Tóth, Géza; Klempt, Carsten
2018-04-01
Modern quantum technologies in the fields of quantum computing, quantum simulation, and quantum metrology require the creation and control of large ensembles of entangled particles. In ultracold ensembles of neutral atoms, nonclassical states have been generated with mutual entanglement among thousands of particles. The entanglement generation relies on the fundamental particle-exchange symmetry in ensembles of identical particles, for which the standard notion of entanglement between clearly definable subsystems is lacking. Here, we present the generation of entanglement between two spatially separated clouds by splitting an ensemble of ultracold identical particles prepared in a twin Fock state. Because the clouds can be addressed individually, our experiments open a path to exploit the available entangled states of indistinguishable particles for quantum information applications.
NASA Astrophysics Data System (ADS)
Sanderson, B. M.
2017-12-01
The CMIP ensembles represent the most comprehensive source of information available to decision-makers for climate adaptation, yet it is clear that there are fundamental limitations in our ability to treat the ensemble as an unbiased sample of possible future climate trajectories. There is considerable evidence that models are not independent, and increasing complexity and resolution combined with computational constraints prevent a thorough exploration of parametric uncertainty or internal variability. Although more data than ever is available for calibration, the optimization of each model is influenced by institutional priorities, historical precedent and available resources. The resulting ensemble thus represents a miscellany of climate simulators which defy traditional statistical interpretation. Models are in some cases interdependent, but are sufficiently complex that the degree of interdependency is conditional on the application. Configurations have been updated using available observations to some degree, but not in a consistent or easily identifiable fashion. This means that the ensemble cannot be viewed as a true posterior distribution updated by available data, but nor can observational data alone be used to assess individual model likelihood. We assess recent literature for combining projections from an imperfect ensemble of climate simulators. Beginning with our published methodology for addressing model interdependency and skill in the weighting scheme for the 4th US National Climate Assessment, we consider strategies for incorporating process-based constraints on future response, perturbed parameter experiments and multi-model output into an integrated framework. We focus on a number of guiding questions: Is the traditional framework of confidence in projections inferred from model agreement leading to biased or misleading conclusions? Can the benefits of upweighting skillful models be reconciled with the increased risk of truth lying outside the weighted ensemble distribution? If CMIP is an ensemble of partially informed best-guesses, can we infer anything about the parent distribution of all possible models of the climate system (and if not, are we implicitly under-representing the risk of a climate catastrophe outside of the envelope of CMIP simulations)?
Bayesian quantitative precipitation forecasts in terms of quantiles
NASA Astrophysics Data System (ADS)
Bentzien, Sabrina; Friederichs, Petra
2014-05-01
Ensemble prediction systems (EPS) for numerical weather prediction on the mesoscale are developed particularly to obtain probabilistic guidance for high-impact weather. An EPS issues not only a deterministic future state of the atmosphere but a sample of possible future states. Ensemble postprocessing then translates such a sample of forecasts into probabilistic measures. This study focuses on probabilistic quantitative precipitation forecasts in terms of quantiles. Quantiles are particularly suitable for describing precipitation at various locations, since no assumption is required on the distribution of precipitation. The focus is on prediction during high-impact events, in the context of the Volkswagen Stiftung-funded project WEX-MOP (Mesoscale Weather Extremes - Theory, Spatial Modeling and Prediction). Quantile forecasts are derived from the raw ensemble and via quantile regression. The neighborhood method and time-lagging are effective tools to inexpensively increase the ensemble spread, which results in more reliable forecasts, especially for extreme precipitation events. Since an EPS provides a large number of potentially informative predictors, variable selection is required in order to obtain a stable statistical model. A Bayesian formulation of quantile regression allows for inference about the selection of predictive covariates through the use of appropriate prior distributions. Moreover, the implementation of an additional process layer for the regression parameters accounts for spatial variation of the parameters. Bayesian quantile regression and its spatially adaptive extension are illustrated for the German-focused mesoscale weather prediction ensemble COSMO-DE-EPS, which has run (pre)operationally since December 2010 at the German Meteorological Service (DWD). Objective out-of-sample verification uses the quantile score (QS), a weighted absolute error between quantile forecasts and observations. The QS is a proper scoring function and can be decomposed into reliability, resolution, and uncertainty parts. A quantile reliability plot gives detailed insight into the predictive performance of the quantile forecasts.
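A minimal sketch of the quantile (pinball) score used for that verification; the numbers in the usage line are made up:

    import numpy as np

    def quantile_score(q_forecast, y_obs, tau):
        """Asymmetrically weighted absolute error; lower is better, proper for quantiles."""
        u = y_obs - q_forecast
        return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

    # e.g. verifying a hypothetical 0.9-quantile precipitation forecast (mm)
    print(quantile_score(np.array([5.0, 2.0]), np.array([7.5, 1.0]), tau=0.9))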
An Optimal Estimation Method to Obtain Surface Layer Turbulent Fluxes from Profile Measurements
NASA Astrophysics Data System (ADS)
Kang, D.
2015-12-01
In the absence of direct turbulence measurements, the turbulence characteristics of the atmospheric surface layer are often derived from measurements of the surface layer mean properties based on Monin-Obukhov Similarity Theory (MOST). This approach requires two levels of the ensemble-mean wind, temperature, and water vapor, from which the fluxes of momentum, sensible heat, and water vapor can be obtained. When only one measurement level is available, the roughness heights and the assumed properties of the corresponding variables at the respective roughness heights are used. In practice, a temporal mean over a large number of samples is used in place of the ensemble mean. However, in many situations the samples are taken from multiple levels. It is thus desirable to derive the boundary layer flux properties using all measurements. In this study, we used an optimal estimation approach to derive surface layer properties based on all available measurements. This approach assumes that the samples are taken from a population whose ensemble-mean profile follows the MOST. An optimized estimate is obtained when the results yield a minimum cost function, defined as a weighted sum of the error variances at each sample altitude. The weights are based on the sample data variance and the altitude of the measurements. This method was applied to measurements in the marine atmospheric surface layer from a small boat using a radiosonde on a tethered balloon, where temperature and relative humidity profiles in the lowest 50 m were made repeatedly in about 30 minutes. We will present the resultant fluxes and the derived MOST mean profiles using different sets of measurements. The advantage of this method over the 'traditional' methods will be illustrated. Some limitations of this optimization method will also be discussed. Its application to quantifying the effects of the marine surface layer environment on radar and communication signal propagation will be shown as well.
Sanchez-Martinez, M; Crehuet, R
2014-12-21
We present a method based on the maximum entropy principle that can re-weight an ensemble of protein structures based on data from residual dipolar couplings (RDCs). The RDCs of intrinsically disordered proteins (IDPs) provide information on the secondary structure elements present in an ensemble; however, even two sets of RDCs are not enough to fully determine the distribution of conformations, and the force field used to generate the structures has a pervasive influence on the refined ensemble. Two physics-based coarse-grained force fields, Profasi and Campari, are able to predict the secondary structure elements present in an IDP, but even after including the RDC data, the re-weighted ensembles differ between the two force fields. The spread of IDP ensembles thus highlights the need for better force fields. We distribute our algorithm as an open-source Python code.
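A minimal sketch of maximum-entropy re-weighting with a soft data constraint, assuming the RDC back-calculation for each structure is already done; the softmax parametrization and the penalty strength lam are implementation choices here, not the paper's exact algorithm:

    import numpy as np
    from scipy.optimize import minimize

    def maxent_weights(calc, target, lam=10.0):
        """calc: (n_structures, n_rdcs) back-calculated RDCs; target: (n_rdcs,) measured RDCs."""
        n = calc.shape[0]

        def objective(g):
            w = np.exp(g - g.max()); w /= w.sum()             # weights on the simplex
            rel_entropy = np.sum(w * np.log(n * w + 1e-300))  # KL divergence from uniform
            misfit = np.sum((w @ calc - target) ** 2)         # soft RDC constraint
            return rel_entropy + lam * misfit

        g = minimize(objective, np.zeros(n), method="L-BFGS-B").x
        w = np.exp(g - g.max())
        return w / w.sum()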
SSAGES: Software Suite for Advanced General Ensemble Simulations.
Sidky, Hythem; Colón, Yamil J; Helfferich, Julian; Sikora, Benjamin J; Bezik, Cody; Chu, Weiwei; Giberti, Federico; Guo, Ashley Z; Jiang, Xikai; Lequieu, Joshua; Li, Jiyuan; Moller, Joshua; Quevillon, Michael J; Rahimi, Mohammad; Ramezani-Dakhel, Hadi; Rathee, Vikramjit S; Reid, Daniel R; Sevgen, Emre; Thapar, Vikram; Webb, Michael A; Whitmer, Jonathan K; de Pablo, Juan J
2018-01-28
Molecular simulation has emerged as an essential tool for modern-day research, but obtaining proper results and making reliable conclusions from simulations requires adequate sampling of the system under consideration. To this end, a variety of methods exist in the literature that can enhance sampling considerably, and increasingly sophisticated, effective algorithms continue to be developed at a rapid pace. Implementation of these techniques, however, can be challenging for experts and non-experts alike. There is a clear need for software that provides rapid, reliable, and easy access to a wide range of advanced sampling methods and that facilitates implementation of new techniques as they emerge. Here we present SSAGES, a publicly available Software Suite for Advanced General Ensemble Simulations designed to interface with multiple widely used molecular dynamics simulations packages. SSAGES allows facile application of a variety of enhanced sampling techniques-including adaptive biasing force, string methods, and forward flux sampling-that extract meaningful free energy and transition path data from all-atom and coarse-grained simulations. A noteworthy feature of SSAGES is a user-friendly framework that facilitates further development and implementation of new methods and collective variables. In this work, the use of SSAGES is illustrated in the context of simple representative applications involving distinct methods and different collective variables that are available in the current release of the suite. The code may be found at: https://github.com/MICCoM/SSAGES-public.
NASA Astrophysics Data System (ADS)
Forrester, Peter J.; Trinh, Allan K.
2018-05-01
The neighbourhood of the largest eigenvalue λ_max in the Gaussian unitary ensemble (GUE) and Laguerre unitary ensemble (LUE) is referred to as the soft edge. It is known that there exists a particular centring and scaling such that the distribution of λ_max tends to a universal form, with an error term bounded by 1/N^(2/3). We take up the problem of computing the exact functional form of the leading error term in a large-N asymptotic expansion for both the GUE and LUE—two versions of the LUE are considered, one with the parameter a fixed and the other with a proportional to N. Both settings in the LUE case allow for an interpretation in terms of the distribution of a particular weighted path length in a model involving exponential variables on a rectangular grid, as the grid size gets large. We give operator-theoretic forms of the corrections, which are corollaries of knowledge of the first two terms in the large-N expansion of the scaled kernel and are readily computed using a method due to Bornemann. We also give expressions in terms of the solutions of particular systems of coupled differential equations, which provide an alternative method of computation. Both characterisations are well suited to a thinned generalisation of the original ensemble, whereby each eigenvalue is deleted independently with probability (1 - ξ). In Sec. V, we investigate using simulation the question of whether, upon an appropriate centring and scaling, a wider class of complex Hermitian random matrix ensembles have their leading correction to the distribution of λ_max proportional to 1/N^(2/3).
Ensemble-type numerical uncertainty information from single model integrations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rauser, Florian, E-mail: florian.rauser@mpimet.mpg.de; Marotzke, Jochem; Korn, Peter
2015-07-01
We suggest an algorithm that quantifies the discretization error of time-dependent physical quantities of interest (goals) for numerical models of geophysical fluid dynamics. The goal discretization error is estimated using a sum of weighted local discretization errors. The key feature of our algorithm is that these local discretization errors are interpreted as realizations of a random process. The random process is determined by the model and the flow state. From a class of local error random processes we select a suitable specific random process by integrating the model over a short time interval at different resolutions. The weights of the influences of the local discretization errors on the goal are modeled as goal sensitivities, which are calculated via automatic differentiation. The integration of the weighted realizations of local error random processes yields a posterior ensemble of goal approximations from a single run of the numerical model. From the posterior ensemble we derive the uncertainty information of the goal discretization error. This algorithm bypasses the requirement of detailed knowledge about the model's discretization to generate numerical error estimates. The algorithm is evaluated for the spherical shallow-water equations. For two standard test cases we successfully estimate the error of regional potential energy, track its evolution, and compare it to standard ensemble techniques. The posterior ensemble shares linear-error-growth properties with ensembles of multiple model integrations when comparably perturbed. The posterior ensemble numerical error estimates are of comparable size to those of a stochastic physics ensemble.
Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation
NASA Astrophysics Data System (ADS)
Vašát, Radim; Kodešová, Radka; Borůvka, Luboš
2017-07-01
A myriad of signal pre-processing strategies and multivariate calibration techniques has been explored in an attempt to improve the spectroscopic prediction of soil organic carbon (SOC) over the last few decades, so coming up with a novel, more powerful, and accurate predictive approach is a challenging task. One route is to combine several individual predictions into a single final one, following ensemble learning theory. Because this approach performs best when combining predictive algorithms of a different nature that are calibrated with structurally different predictor variables, we tested predictors of two kinds: 1) reflectance values (or transforms) at each wavelength, and 2) absorption feature parameters. Accordingly, we applied four different calibration techniques, two per type of predictor: a) partial least squares regression and support vector machines for type 1, and b) multiple linear regression and random forest for type 2. The weights assigned to the individual predictions within the ensemble model (constructed as a weighted average) were determined by an automated procedure that selected the best solution among all possible ones. The approach was tested on soil samples taken from the surface horizon at four sites differing in the prevailing soil units. Employing the ensemble predictive model improved the prediction accuracy of SOC at all four sites: the coefficient of determination in cross-validation (R2cv) increased from 0.849, 0.611, 0.811, and 0.644 (the best individual predictions) to 0.864, 0.650, 0.824, and 0.698 for Sites 1, 2, 3, and 4, respectively. Generally, the ensemble model reduced the maximal deviations between predicted and observed values relative to the individual predictions, so the correlation cloud became thinner, as desired.
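A minimal sketch of the weighted-average combination, assuming cross-validated predictions from the four calibrations are already in hand; the coarse simplex grid stands in for the automated weight-selection procedure:

    import numpy as np
    from itertools import product

    def best_weighted_average(preds, y):
        """preds: (n_models, n_samples) cross-validated predictions; y: observed SOC."""
        best_w, best_r2 = None, -np.inf
        for w in product(np.arange(0, 1.01, 0.1), repeat=len(preds)):
            if sum(w) == 0:
                continue
            w = np.array(w) / sum(w)                     # normalize onto the simplex
            yhat = w @ preds
            r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
            if r2 > best_r2:
                best_w, best_r2 = w, r2
        return best_w, best_r2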
Entropy-based link prediction in weighted networks
NASA Astrophysics Data System (ADS)
Xu, Zhongqi; Pu, Cunlai; Ramiz Sharafat, Rajput; Li, Lunbo; Yang, Jian
2017-01-01
Information entropy has been proved to be an effective tool to quantify the structural importance of complex networks. In previous work (Xu et al., 2016), we measured the contribution of a path in link prediction with information entropy. In this paper, we further quantify the contribution of a path with both path entropy and path weight, and propose a weighted prediction index based on the contributions of paths, namely Weighted Path Entropy (WPE), to improve the prediction accuracy in weighted networks. Empirical experiments on six weighted real-world networks show that WPE achieves higher prediction accuracy than three typical weighted indices.
Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan
2018-05-01
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Yin, Dong-shan; Gao, Yu-ping; Zhao, Shu-hong
2017-07-01
Millisecond pulsars can generate another type of time scale that is totally independent of the atomic time scale, because the physical mechanisms of the pulsar time scale and the atomic time scale are quite different from each other. Usually the pulsar timing observations are not evenly sampled, and the intervals between two data points range from several hours to more than half a month. Furthermore, these data sets are sparse. All of this makes it difficult to generate an ensemble pulsar time scale. Hence, a new algorithm to calculate the ensemble pulsar time scale is proposed. First, cubic spline interpolation is used to densify the data set and make the intervals between data points uniform. Then, the Vondrak filter is employed to smooth the data set and remove high-frequency noise, and finally the weighted average method is adopted to generate the ensemble pulsar time scale. The newly released NANOGrav (North American Nanohertz Observatory for Gravitational Waves) 9-year data set is used to generate the ensemble pulsar time scale. This data set includes 9 years of observations of 37 millisecond pulsars observed by the 100-meter Green Bank Telescope and the 305-meter Arecibo telescope. It is found that the algorithm used in this paper can effectively reduce the influence of noise in the pulsar timing residuals and improve the long-term stability of the ensemble pulsar time scale. Results indicate that the long-term (> 1 yr) stability of the ensemble pulsar time scale is better than 3.4 × 10^-15.
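A minimal sketch of the three-step pipeline; SciPy has no Vondrak filter, so a Savitzky-Golay smoother stands in for it, and the grid and window sizes are placeholders:

    import numpy as np
    from scipy.interpolate import CubicSpline
    from scipy.signal import savgol_filter

    def ensemble_timescale(times, residuals, weights, grid):
        """times/residuals: per-pulsar arrays (uneven sampling); grid: common dense MJDs."""
        smoothed = []
        for t, r in zip(times, residuals):
            dense = CubicSpline(t, r)(grid)                 # step 1: densify and regularize
            smoothed.append(savgol_filter(dense, 101, 3))   # step 2: smooth (Vondrak stand-in)
        return np.average(np.vstack(smoothed), axis=0,
                          weights=np.asarray(weights, float))  # step 3: weighted average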
An Ensemble-Based Smoother with Retrospectively Updated Weights for Highly Nonlinear Systems
NASA Technical Reports Server (NTRS)
Chin, T. M.; Turmon, M. J.; Jewell, J. B.; Ghil, M.
2006-01-01
Monte Carlo computational methods have been introduced into data assimilation for nonlinear systems in order to alleviate the computational burden of updating and propagating the full probability distribution. By propagating an ensemble of representative states, algorithms like the ensemble Kalman filter (EnKF) and the resampled particle filter (RPF) rely on the existing modeling infrastructure to approximate the distribution based on the evolution of this ensemble. This work presents an ensemble-based smoother that is applicable to Monte Carlo filtering schemes like the EnKF and RPF. At the minor cost of retrospectively updating a set of weights for ensemble members, this smoother has demonstrated superior capabilities in state tracking for two highly nonlinear problems: the double-well potential and trivariate Lorenz systems. The algorithm does not require retrospective adaptation of the ensemble members themselves, and it is thus suited to a streaming operational mode. The accuracy of the proposed backward-update scheme in estimating non-Gaussian distributions is evaluated by comparison to the more accurate estimates provided by a Markov chain Monte Carlo algorithm.
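A minimal sketch of the retrospective-reweighting flavor, assuming Gaussian observation errors; this conveys the idea of a backward weight update, not the paper's exact scheme:

    import numpy as np

    def update_weights(weights, predicted_obs, y, obs_var):
        """Multiply each member's weight by its likelihood of the new observation."""
        lik = np.exp(-0.5 * (predicted_obs - y) ** 2 / obs_var)
        w = weights * lik
        return w / w.sum()

Because only the weights attached to past states change, the stored ensemble members themselves never need to be revisited, which is what makes a streaming operational mode possible.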
NASA Astrophysics Data System (ADS)
Lahmiri, Salim; Boukadoum, Mounir
2015-08-01
We present a new ensemble system for stock market return prediction in which the continuous wavelet transform (CWT) is used to analyze return series and backpropagation neural networks (BPNNs) are used to process the CWT-based coefficients, determine the optimal ensemble weights, and provide final forecasts. Particle swarm optimization (PSO) is used to find optimal weights and biases for each BPNN. To capture symmetry/asymmetry in the underlying data, three wavelet functions with different shapes are adopted. The proposed ensemble system was tested on three Asian stock markets: the Hang Seng, KOSPI, and Taiwan stock market data. Three statistical metrics were used to evaluate forecasting accuracy: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute deviation (MAD). Experimental results showed that our proposed ensemble system outperformed the individual CWT-ANN models, each with a different wavelet function. In addition, the proposed ensemble system outperformed the conventional autoregressive moving average process. As a result, the proposed ensemble system is suitable for capturing symmetry/asymmetry in financial data fluctuations for better prediction accuracy.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
The aims were to investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared with single-classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition accuracy; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
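A minimal sketch of the weighted majority vote fusion rule; labels and weights here are illustrative, and in practice the weights would come from each classifier's validation performance:

    from collections import defaultdict

    def weighted_majority_vote(predictions, weights):
        """predictions: one label per classifier; weights: matching reliabilities."""
        score = defaultdict(float)
        for label, w in zip(predictions, weights):
            score[label] += w
        return max(score, key=score.get)

    print(weighted_majority_vote(["walk", "run", "walk", "sit"],
                                 [0.9, 0.7, 0.6, 0.5]))   # -> "walk"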
Weighted Ensemble Simulation: Review of Methodology, Applications, and Software
Zuckerman, Daniel M.; Chong, Lillian T.
2018-01-01
The weighted ensemble (WE) methodology orchestrates quasi-independent parallel simulations run with intermittent communication that can enhance sampling of rare events such as protein conformational changes, folding, and binding. The WE strategy can achieve superlinear scaling—the unbiased estimation of key observables such as rate constants and equilibrium state populations to greater precision than would be possible with ordinary parallel simulation. WE software can be used to control any dynamics engine, such as standard molecular dynamics and cell-modeling packages. This article reviews the theoretical basis of WE and goes on to describe successful applications to a number of complex biological processes—protein conformational transitions, (un)binding, and assembly processes, as well as cell-scale processes in systems biology. We furthermore discuss the challenges that need to be overcome in the next phase of WE methodological development. Overall, the combined advances in WE methodology and software have enabled the simulation of long-timescale processes that would otherwise not be practical on typical computing resources using standard simulation. PMID:28301772
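The replication-and-pruning bookkeeping at the heart of WE is compact enough to sketch. Below is a minimal, hypothetical resampling step for a single bin: heavy trajectories are split and light ones merged so that the target walker count is approached while total probability weight is conserved exactly. Production codes such as WESTPA layer binning, recycling, and rate estimation on top of this:

    import random

    def resample_bin(walkers, target):
        """walkers: list of (state, weight); returns ~target walkers with the same total weight."""
        total = sum(w for _, w in walkers)
        ideal = total / target
        out = []
        for state, w in walkers:
            n = max(1, round(w / ideal))               # split heavy walkers into n copies
            out.extend((state, w / n) for _ in range(n))
        while len(out) > target:                       # merge the two lightest walkers
            out.sort(key=lambda sw: sw[1])
            (s1, w1), (s2, w2) = out[0], out[1]
            keep = s1 if random.random() < w1 / (w1 + w2) else s2
            out = [(keep, w1 + w2)] + out[2:]
        return out

Choosing the merge survivor with probability proportional to its weight is what keeps the scheme statistically unbiased.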
Oreb, Goran; Ruzić, Lana; Matković, Branka; Misigoj-Duraković, Marjeta; Vlasić, Jadranka; Ciliga, Dubravka
2006-06-01
The study investigated differences in morphological, motor, and functional abilities between folk and ballet dancers. The sample comprised 51 female subjects: the Croatian National Ballet (N=30) and the Croatian National Folk Ensemble "LADO" (N=21). Data regarding the menstrual cycle, menarche, number of births, and smoking habits were collected, and morphological, motor, and functional abilities were measured. Significant correlations between the amount of fat tissue and the number of births were found in both groups. Folk dancers were as tall as ballet dancers but weighed more and had a larger body frame (p<0.001). Ballet dancers were more flexible, but there were no differences in absolute maximal oxygen uptake (2.65 vs. 2.35 L/min, p=0.101). Still, as the ballet dancers weighed less, their relative maximal oxygen uptake was significantly higher (37.62 vs. 50.22 mL/kg/min, p<0.001). Also, a high proportion of smokers (45%) was found among both professional ballet and professional folk dancers.
NASA Astrophysics Data System (ADS)
Zhang, Chuan-Biao; Ming, Li; Xin, Zhou
2015-12-01
Ensemble simulations, which use multiple short independent trajectories from dispersive initial conformations rather than a single long trajectory as used in traditional simulations, are expected to sample complex systems such as biomolecules much more efficiently. The re-weighted ensemble dynamics (RED) method is designed to combine these short trajectories to reconstruct the global equilibrium distribution. In RED, a number of conformational functions, called basis functions, are applied to relate these trajectories to each other; a detailed-balance-based linear equation is then built whose solution provides the weights of these trajectories in the equilibrium distribution. Thus, a sufficient and efficient selection of basis functions is critical to the practical application of RED. Here, we review and present a few possible ways to construct basis functions for applying RED in complex molecular systems. In particular, for systems with little a priori knowledge, one can generally use the root mean squared deviation (RMSD) among conformations to split the whole conformational space into a set of cells, and then use the RMSD-based cell functions as basis functions. We demonstrate the application of RED in typical systems, including a two-dimensional toy model, the lattice Potts model, and a short peptide system. The results indicate that RED, with these constructions of basis functions, not only samples complex systems more efficiently but also provides a general way to understand the metastable structure of conformational space. Project supported by the National Natural Science Foundation of China (Grant No. 11175250).
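The RMSD-based cell construction described above is easy to sketch. The following illustrative Python uses leader-style clustering to define cells and indicator basis functions; the cutoff, the clustering rule, and all names are assumptions for illustration, and the detailed-balance linear equation that RED then solves for the trajectory weights is not reproduced here.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two conformations (N x 3 coordinate arrays),
    assumed already optimally aligned."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

def build_cells(conformations, cutoff):
    """Leader-style clustering: a conformation starts a new cell when it
    lies farther than `cutoff` (in RMSD) from every existing reference."""
    refs = []
    for x in conformations:
        if all(rmsd(x, r) > cutoff for r in refs):
            refs.append(x)
    return refs

def basis_functions(x, refs):
    """Indicator basis functions: phi_k(x) = 1 if x falls in cell k
    (its nearest reference), else 0."""
    phi = np.zeros(len(refs))
    phi[int(np.argmin([rmsd(x, r) for r in refs]))] = 1.0
    return phi
```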
Competitive Learning Neural Network Ensemble Weighted by Predicted Performance
ERIC Educational Resources Information Center
Ye, Qiang
2010-01-01
Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…
Arshad, Sannia; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as AdaBoost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes. PMID:25295302
Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as AdaBoost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes.
Mixture EMOS model for calibrating ensemble forecasts of wind speed.
Baran, S; Lerch, S
2016-03-01
Ensemble model output statistics (EMOS) is a statistical tool for post-processing forecast ensembles of weather variables obtained from multiple runs of numerical weather prediction models in order to produce calibrated predictive probability density functions. The EMOS predictive probability density function is given by a parametric distribution with parameters depending on the ensemble forecasts. We propose an EMOS model for calibrating wind speed forecasts based on weighted mixtures of truncated normal (TN) and log-normal (LN) distributions where model parameters and component weights are estimated by optimizing the values of proper scoring rules over a rolling training period. The new model is tested on wind speed forecasts of the 50-member European Centre for Medium-range Weather Forecasts ensemble, the 11-member Aire Limitée Adaptation dynamique Développement International-Hungary Ensemble Prediction System ensemble of the Hungarian Meteorological Service, and the eight-member University of Washington mesoscale ensemble, and its predictive performance is compared with that of various benchmark EMOS models based on single parametric families and combinations thereof. The results indicate improved calibration of probabilistic forecasts and improved accuracy of point forecasts in comparison with the raw ensemble and climatological forecasts. The mixture EMOS model significantly outperforms the TN and LN EMOS methods; moreover, it provides better calibrated forecasts than the TN-LN combination model and offers increased flexibility while avoiding covariate selection problems. © 2016 The Authors. Environmetrics published by John Wiley & Sons Ltd.
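A compressed sketch of the mixture idea follows. It is not the authors' implementation: the paper links the TN/LN parameters to the ensemble members and optimizes the CRPS, whereas this illustration takes those per-case parameters as given and fits only the mixture weight, using the logarithmic score for brevity.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def mixture_pdf(y, w, mu_tn, sig_tn, mu_ln, sig_ln):
    """Weighted mixture density for wind speed y >= 0: a zero-truncated
    normal (TN) component plus a log-normal (LN) component."""
    a = (0.0 - mu_tn) / sig_tn  # lower truncation point in standard units
    tn = stats.truncnorm.pdf(y, a, np.inf, loc=mu_tn, scale=sig_tn)
    ln = stats.lognorm.pdf(y, s=sig_ln, scale=np.exp(mu_ln))
    return w * tn + (1.0 - w) * ln

def fit_weight(y_obs, params):
    """Estimate the TN weight over a training set by optimizing a proper
    score (log score here); `params` is a list of per-case
    (mu_tn, sig_tn, mu_ln, sig_ln) tuples."""
    def neg_log_score(theta):
        w = 1.0 / (1.0 + np.exp(-theta[0]))  # logistic map keeps w in (0, 1)
        dens = [mixture_pdf(y, w, *p) for y, p in zip(y_obs, params)]
        return -np.sum(np.log(np.maximum(dens, 1e-300)))
    res = minimize(neg_log_score, x0=[0.0], method="Nelder-Mead")
    return 1.0 / (1.0 + np.exp(-res.x[0]))
```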
NASA Astrophysics Data System (ADS)
Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen
2018-01-01
Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by selectively fusing valued information from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct a data-driven model of industrial process parameters from mechanical vibration and acoustic frequency spectra, based on the selective fusion of multi-condition samples and multi-source features. A multi-layer SEN (MLSEN) strategy is used to simulate the domain expert's cognitive process. A genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine the outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing a selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.
NASA Astrophysics Data System (ADS)
Pollard, David; Chang, Won; Haran, Murali; Applegate, Patrick; DeConto, Robert
2016-05-01
A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ˜ 20 000 yr. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. The analyses provide sea-level-rise envelopes with well-defined parametric uncertainty bounds, but the simple averaging method only provides robust results with full-factorial parameter sampling in the large ensemble. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree well with the more advanced techniques. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds.
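The "simple averaging weighted by the aggregate score" can be sketched in a few lines. The exponential mapping from misfit score to weight below is an assumption chosen for illustration (any monotone score-to-weight rule could be substituted), as are the function and argument names.

```python
import numpy as np

def weighted_envelope(esl_runs, misfit_scores, quantiles=(0.05, 0.5, 0.95)):
    """Score-weighted quantile envelope for a large ensemble.

    esl_runs:      (n_runs,) equivalent sea-level rise from each model run.
    misfit_scores: (n_runs,) aggregate model-data misfit (lower = better).
    """
    esl = np.asarray(esl_runs, dtype=float)
    w = np.exp(-0.5 * np.asarray(misfit_scores, dtype=float))
    w /= w.sum()                      # normalized run weights
    order = np.argsort(esl)
    cdf = np.cumsum(w[order])         # weighted empirical CDF
    idx = np.minimum(np.searchsorted(cdf, quantiles), len(esl) - 1)
    return dict(zip(quantiles, esl[order][idx]))
```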
Path analysis of risk factors leading to premature birth.
Fields, S J; Livshits, G; Sirotta, L; Merlob, P
1996-01-01
The present study tested whether various sociodemographic, anthropometric, behavioral, and medical/physiological factors act in a direct or indirect manner on the risk of prematurity using path analysis on a sample of Israeli births. The path model shows that medical complications, primarily toxemia, chorioamnionitis, and a previous low birth weight delivery, directly and significantly act on the risk of prematurity, as do low maternal pregnancy weight gain and ethnicity. Other medical complications, including chronic hypertension, preeclampsia, and placental abruption, although significantly correlated with prematurity, act indirectly on prematurity through toxemia. The model further shows that the commonly accepted sociodemographic, anthropometric, and behavioral risk factors act by modifying the development of medical complications that lead to prematurity, as opposed to having a direct effect on premature delivery. © 1996 Wiley-Liss, Inc.
Jung, Wonmo; Bülthoff, Isabelle; Armann, Regine G M
2017-11-01
The brain can only attend to a fraction of all the information that is entering the visual system at any given moment. One way of overcoming the so-called bottleneck of selective attention (e.g., J. M. Wolfe, Võ, Evans, & Greene, 2011) is to make use of redundant visual information and extract summarized statistical information of the whole visual scene. Such ensemble representation occurs for low-level features of textures or simple objects, but it has also been reported for complex high-level properties. While the visual system has, for example, been shown to compute summary representations of facial expression, gender, or identity, it is less clear whether perceptual input from all parts of the visual field contributes equally to the ensemble percept. Here we extend the line of ensemble-representation research into the realm of race and look at the possibility that ensemble perception relies on weighting visual information differently depending on its origin from either the fovea or the visual periphery. We find that observers can judge the mean race of a set of faces, similar to judgments of mean emotion from faces and ensemble representations in low-level domains of visual processing. We also find that while peripheral faces seem to be taken into account for the ensemble percept, far more weight is given to stimuli presented foveally than peripherally. Whether this precision weighting of information stems from differences in the accuracy with which the visual system processes information across the visual field or from statistical inferences about the world needs to be determined by further research.
AWE-WQ: fast-forwarding molecular dynamics using the accelerated weighted ensemble.
Abdul-Wahid, Badi'; Feng, Haoyun; Rajan, Dinesh; Costaouec, Ronan; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2014-10-27
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods has emerged, which can flexibly and efficiently sample conformational space without being trapped and allows calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy.
AWE-WQ: Fast-Forwarding Molecular Dynamics Using the Accelerated Weighted Ensemble
2015-01-01
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods has emerged, which can flexibly and efficiently sample conformational space without being trapped and allows calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy. PMID:25207854
PhytoPath: an integrative resource for plant pathogen genomics.
Pedro, Helder; Maheswari, Uma; Urban, Martin; Irvine, Alistair George; Cuzick, Alayne; McDowall, Mark D; Staines, Daniel M; Kulesha, Eugene; Hammond-Kosack, Kim Elizabeth; Kersey, Paul Julian
2016-01-04
PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungal, protist (oomycete) and bacterial plant pathogens whose genomes have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ensemble average theory of gravity
NASA Astrophysics Data System (ADS)
Khosravi, Nima
2016-12-01
We put forward the idea that all the theoretically consistent models of gravity have contributions to the observed gravity interaction. In this formulation, each model comes with its own Euclidean path-integral weight, where general relativity (GR) automatically has the maximum weight in high-curvature regions. We employ this idea in the framework of Lovelock models and show that in four dimensions the result is a specific form of the f(R,G) model. This specific f(R,G) satisfies the stability conditions and possesses self-accelerating solutions. Our model is consistent with the local tests of gravity since its behavior is the same as in GR for the high-curvature regime. In the low-curvature regime the gravitational force is weaker than in GR, which can be interpreted as the existence of a repulsive fifth force at very large scales. Interestingly, there is an intermediate-curvature regime where the gravitational force is stronger in our model compared to GR. The different behavior of our model in comparison with GR in both low- and intermediate-curvature regimes makes it observationally distinguishable from ΛCDM.
PathVisio-Faceted Search: an exploration tool for multi-dimensional navigation of large pathways
Fried, Jake Y.; Luna, Augustin
2013-01-01
Purpose: The PathVisio-Faceted Search plugin helps users explore and understand complex pathways by overlaying experimental data and data from webservices, such as Ensembl BioMart, onto diagrams drawn using formalized notations in PathVisio. The plugin then provides a filtering mechanism, known as a faceted search, to find and highlight diagram nodes (e.g. genes and proteins) of interest based on imported data. The tool additionally provides a flexible scripting mechanism to handle complex queries. Availability: The PathVisio-Faceted Search plugin is compatible with PathVisio 3.0 and above. PathVisio is compatible with Windows, Mac OS X and Linux. The plugin, documentation, example diagrams and Groovy scripts are available at http://PathVisio.org/wiki/PathVisioFacetedSearchHelp. The plugin is free, open-source and licensed by the Apache 2.0 License. Contact: augustin@mail.nih.gov or jakeyfried@gmail.com PMID:23547033
Creation of the BMA ensemble for SST using a parallel processing technique
NASA Astrophysics Data System (ADS)
Kim, Kwangjin; Lee, Yang Won
2013-10-01
Although they serve the same purpose, satellite products differ in value because of their inescapable uncertainties. The satellite products also span long records and are numerous and voluminous, so efforts to reduce the uncertainty and to handle such enormous data volumes are necessary. In this paper, we create an ensemble Sea Surface Temperature (SST) using MODIS Aqua, MODIS Terra and COMS (Communication Ocean and Meteorological Satellite). We used Bayesian Model Averaging (BMA) as the ensemble method. The principle of BMA is to synthesize the conditional probability density functions (PDFs) using posterior probabilities as weights. The posterior probability is estimated using the EM algorithm, and the BMA PDF is obtained by weighted averaging. As a result, the ensemble SST showed the lowest RMSE and MAE, which proves the applicability of BMA for satellite data ensembles. As future work, parallel processing techniques using the Hadoop framework will be adopted for more efficient computation of very big satellite data.
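The EM estimation of BMA weights admits a compact sketch. The version below assumes Gaussian component PDFs centered on each (bias-corrected) satellite product with a single shared spread; the component families, the fixed iteration count in place of a convergence test, and all names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

def bma_em(members, obs, n_iter=200):
    """EM for BMA weights and a common Gaussian spread.

    members: (n_cases, n_models) collocated satellite SST estimates.
    obs:     (n_cases,) matched reference observations.
    The BMA predictive PDF is then sum_k w_k * N(member_k, sigma^2)."""
    n, k = members.shape
    w = np.full(k, 1.0 / k)
    sigma = np.std(obs[:, None] - members)
    for _ in range(n_iter):
        # E-step: posterior probability that model k produced each case
        dens = w * norm.pdf(obs[:, None], loc=members, scale=sigma)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and spread from the responsibilities
        w = z.mean(axis=0)
        sigma = np.sqrt((z * (obs[:, None] - members) ** 2).sum() / n)
    return w, sigma
```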
Ensemble Sampling vs. Time Sampling in Molecular Dynamics Simulations of Thermal Conductivity
Gordiz, Kiarash; Singh, David J.; Henry, Asegun
2015-01-29
In this report we compare time sampling and ensemble averaging as two different methods available for phase space sampling. For the comparison, we calculate thermal conductivities of solid argon and silicon structures using equilibrium molecular dynamics. We introduce two different schemes for the ensemble averaging approach, and show that both can reduce the total simulation time as compared to time averaging. It is also found that velocity rescaling is an efficient mechanism for phase space exploration. Although our methodology is tested using classical molecular dynamics, the ensemble generation approaches may find their greatest utility in computationally expensive simulations such as first-principles molecular dynamics. For such simulations, where each time step is costly, time sampling can require long simulation times because each time step must be evaluated sequentially and therefore phase space averaging is achieved through sequential operations. On the other hand, with ensemble averaging, phase space sampling can be achieved through parallel operations, since each ensemble is independent. For this reason, particularly when using massively parallel architectures, ensemble sampling can result in much shorter simulation times and exhibits similar overall computational effort.
Shear-stress fluctuations and relaxation in polymer glasses
NASA Astrophysics Data System (ADS)
Kriuchevskyi, I.; Wittmer, J. P.; Meyer, H.; Benzerara, O.; Baschnagel, J.
2018-01-01
We investigate by means of molecular dynamics simulation a coarse-grained polymer glass model, focusing on (quasistatic and dynamical) shear-stress fluctuations as a function of temperature T and sampling time Δt. The linear response is characterized using (ensemble-averaged) expectation values of the contributions (time averaged for each shear plane) to the stress-fluctuation relation μ_sf for the shear modulus and the shear-stress relaxation modulus G(t). Using 100 independent configurations, we pay attention to the respective standard deviations. While the ensemble-averaged modulus μ_sf(T) decreases continuously with increasing T for all Δt sampled, its standard deviation δμ_sf(T) is nonmonotonic, with a striking peak at the glass transition. The question of whether the shear modulus is continuous or has a jump singularity at the glass transition is thus ill posed. Confirming the effective time-translational invariance of our systems, the Δt dependence of μ_sf and related quantities can be understood using a weighted integral over G(t).
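For orientation, the stress-fluctuation estimator referred to above has the following schematic form, standard in this literature; the paper's per-shear-plane time-averaging conventions are not reproduced here.

```latex
% mu_A: affine (Born) contribution; beta = 1/k_B T; V: system volume;
% tau-hat: instantaneous shear stress of a configuration.
\mu_{\mathrm{sf}} \;=\; \mu_{\mathrm{A}}
  \;-\; \beta V \left( \langle \hat{\tau}^{2} \rangle
  - \langle \hat{\tau} \rangle^{2} \right)
```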
Resonance fluorescence trajectories in superconducting qubit
NASA Astrophysics Data System (ADS)
Naghiloo, Mahdi; Tan, Dian; Harrington, Patrick; Lewalle, Philippe; Jordan, Andrew; Murch, Kater
We employ phase-sensitive amplification to perform homodyne detection of the resonance fluorescence from a driven superconducting artificial atom. Entanglement between the emitter and its fluorescence allows us to track the individual quantum state trajectories of the emitter. We analyze the ensemble properties of these trajectories by considering paths that connect specific initial and final states. By applying a stochastic path integral formalism, we calculate equations of motion for the most likely path between two quantum states and compare these predicted paths to experimental data. Drawing on the mathematical similarity between the action formalism of the most likely quantum paths and ray optics, we study the emergence of caustics in quantum trajectories: situations where multiple extrema in the stochastic action occur. We observe such multiple most likely paths in experimental data and find these paths to be in reasonable quantitative agreement with theoretical calculations. Supported by the John Templeton Foundation.
New technique for ensemble dressing combining Multimodel SuperEnsemble and precipitation PDF
NASA Astrophysics Data System (ADS)
Cane, D.; Milelli, M.
2009-09-01
The Multimodel SuperEnsemble technique (Krishnamurti et al., Science 285, 1548-1550, 1999) is a postprocessing method for the estimation of weather forecast parameters that reduces direct model output errors. It differs from other ensemble analysis techniques by using an adequate weighting of the input forecast models to obtain a combined estimation of meteorological parameters. Weights are calculated by least-square minimization of the difference between the model and the observed field during a so-called training period. Although it can be applied successfully to continuous parameters like temperature, humidity, wind speed and mean sea level pressure (Cane and Milelli, Meteorologische Zeitschrift, 15, 2, 2006), the Multimodel SuperEnsemble also gives good results when applied to precipitation, a parameter quite difficult to handle with standard post-processing methods. Here we present our methodology for Multimodel precipitation forecasts, applied to a wide spectrum of results over the very dense non-GTS weather station network of Piemonte. We will focus particularly on an accurate statistical method for bias correction and on ensemble dressing in agreement with the observed precipitation forecast-conditioned PDF. Acknowledgement: this work is supported by the Italian Civil Defence Department.
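The weight computation at the heart of the SuperEnsemble is an ordinary least-squares fit over the training window. A minimal sketch, assuming anomalies about the observed climatology as in the standard formulation (names and array shapes are illustrative):

```python
import numpy as np

def superensemble_weights(train_models, train_obs):
    """Least-squares Multimodel SuperEnsemble weights at one grid point.

    train_models: (n_times, n_models) model anomalies in the training period.
    train_obs:    (n_times,) observed anomalies."""
    w, *_ = np.linalg.lstsq(train_models, train_obs, rcond=None)
    return w

def superensemble_forecast(obs_climatology, model_anomalies, w):
    """Forecast = observed climatology + weighted sum of model anomalies."""
    return obs_climatology + model_anomalies @ w
```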
Water evaporation: a transition path sampling study.
Varilly, Patrick; Chandler, David
2013-02-07
We use transition path sampling to study evaporation in the SPC/E model of liquid water. On the basis of thousands of evaporation trajectories, we characterize the members of the transition state ensemble (TSE), which exhibit a liquid-vapor interface with predominantly negative mean curvature at the site of evaporation. We also find that after evaporation is complete, the distributions of translational and angular momenta of the evaporated water are Maxwellian with a temperature equal to that of the liquid. To characterize the evaporation trajectories in their entirety, we find that it suffices to project them onto just two coordinates: the distance of the evaporating molecule to the instantaneous liquid-vapor interface and the velocity of the water along the average interface normal. In this projected space, we find that the TSE is well-captured by a simple model of ballistic escape from a deep potential well, with no additional barrier to evaporation beyond the cohesive strength of the liquid. Equivalently, they are consistent with a near-unity probability for a water molecule impinging upon a liquid droplet to condense. These results agree with previous simulations and with some, but not all, recent experiments.
A Novel Data-Driven Learning Method for Radar Target Detection in Nonstationary Environments
2016-05-01
Different realizations of Cooper-Frye sampling with conservation laws
NASA Astrophysics Data System (ADS)
Schwarz, C.; Oliinychenko, D.; Pang, L.-G.; Ryu, S.; Petersen, H.
2018-01-01
Approaches based on viscous hydrodynamics for the hot and dense stage and hadronic transport for the final dilute rescattering stage are successfully applied to the dynamic description of heavy ion reactions at high beam energies. One crucial step in such hybrid approaches is the so-called particlization, which is the transition between the hydrodynamic description and the microscopic degrees of freedom. For this purpose, individual particles are sampled on the Cooper-Frye hypersurface. In this work, four different realizations of the sampling algorithms are compared, with three of them incorporating the global conservation laws of quantum numbers in each event. The algorithms are compared within two types of scenarios: a simple 'box' hypersurface consisting of only one static cell and a typical particlization hypersurface for Au+Au collisions at √s_NN = 200 GeV. For all algorithms the mean multiplicities (or particle spectra) remain unaffected by global conservation laws in the case of large volumes. In contrast, the fluctuations of the particle numbers are affected considerably. The fluctuations of the newly developed SPREW algorithm based on the exponential weight, and the recently suggested SER algorithm based on ensemble rejection, are smaller than those without conservation laws and agree with the expectation from the canonical ensemble. The previously applied mode sampling algorithm produces dramatically larger fluctuations than expected in the corresponding microcanonical ensemble, and therefore should be avoided in fluctuation studies. This study might be of interest for the investigation of particle fluctuations and correlations, e.g. the suggested signatures for a phase transition or a critical endpoint, in hybrid approaches that are affected by global conservation laws.
NASA Astrophysics Data System (ADS)
Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah
2014-05-01
Flood is one of the most devastating natural disasters and occurs frequently in Terengganu, Malaysia. Recently, ensemble-based techniques have become extremely popular in flood modeling. In this paper, the weights-of-evidence (WoE) model was utilized first to assess the impact of the classes of each conditioning factor on flooding through bivariate statistical analysis (BSA). Then, these factors were reclassified using the acquired weights and entered into the support vector machine (SVM) model to evaluate the correlation between flood occurrence and each conditioning factor. Through this integration, the weak point of WoE can be overcome and the performance of the SVM enhanced. The spatial database included flood inventory, slope, stream power index (SPI), topographic wetness index (TWI), altitude, curvature, distance from the river, geology, rainfall, land use/cover (LULC), and soil type. Four SVM kernel types (linear (LN), polynomial (PL), radial basis function (RBF), and sigmoid (SIG)) were used to investigate the performance of each kernel type. The efficiency of the new ensemble WoE and SVM method was tested using the area under the curve (AUC), which measured the prediction and success rates. The validation results proved the strength and efficiency of the ensemble method over the individual methods. The best results were obtained with the RBF kernel. The success rate and prediction rate for the ensemble WoE and RBF-SVM method were 96.48% and 95.67%, respectively. The proposed ensemble flood susceptibility mapping method could assist researchers and local governments in flood mitigation strategies.
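The WoE-then-SVM pipeline can be sketched compactly. The positive-weight form of WoE used below and all identifiers are illustrative assumptions; the paper's full treatment (contrast weights, raster handling, validation) is richer than this.

```python
import numpy as np
from sklearn.svm import SVC

def woe_table(factor_class, flooded):
    """Weight of evidence for each class of one conditioning factor:
    log ratio of flooded to non-flooded frequencies in that class."""
    table = {}
    for c in np.unique(factor_class):
        in_c = factor_class == c
        p_f = (in_c & (flooded == 1)).sum() / max((flooded == 1).sum(), 1)
        p_nf = (in_c & (flooded == 0)).sum() / max((flooded == 0).sum(), 1)
        table[c] = np.log(max(p_f, 1e-9) / max(p_nf, 1e-9))
    return table

def woe_svm(factors, flooded):
    """Reclassify each conditioning factor by its WoE, then train an
    RBF-kernel SVM on the reclassified feature matrix."""
    X = np.column_stack([
        np.vectorize(woe_table(f, flooded).get)(f) for f in factors
    ])
    return SVC(kernel="rbf", probability=True).fit(X, flooded)
```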
Importance sampling large deviations in nonequilibrium steady states. I.
Ray, Ushnish; Chan, Garnet Kin-Lic; Limmer, David T
2018-03-28
Large deviation functions contain information on the stability and response of systems driven into nonequilibrium steady states and in such a way are similar to free energies for systems at equilibrium. As with equilibrium free energies, evaluating large deviation functions numerically for all but the simplest systems is difficult because by construction they depend on exponentially rare events. In this first paper of a series, we evaluate different trajectory-based sampling methods capable of computing large deviation functions of time integrated observables within nonequilibrium steady states. We illustrate some convergence criteria and best practices using a number of different models, including a biased Brownian walker, a driven lattice gas, and a model of self-assembly. We show how two popular methods for sampling trajectory ensembles, transition path sampling and diffusion Monte Carlo, suffer from exponentially diverging correlations in trajectory space as a function of the bias parameter when estimating large deviation functions. Improving the efficiencies of these algorithms requires introducing guiding functions for the trajectories.
Importance sampling large deviations in nonequilibrium steady states. I
NASA Astrophysics Data System (ADS)
Ray, Ushnish; Chan, Garnet Kin-Lic; Limmer, David T.
2018-03-01
Large deviation functions contain information on the stability and response of systems driven into nonequilibrium steady states and in such a way are similar to free energies for systems at equilibrium. As with equilibrium free energies, evaluating large deviation functions numerically for all but the simplest systems is difficult because by construction they depend on exponentially rare events. In this first paper of a series, we evaluate different trajectory-based sampling methods capable of computing large deviation functions of time integrated observables within nonequilibrium steady states. We illustrate some convergence criteria and best practices using a number of different models, including a biased Brownian walker, a driven lattice gas, and a model of self-assembly. We show how two popular methods for sampling trajectory ensembles, transition path sampling and diffusion Monte Carlo, suffer from exponentially diverging correlations in trajectory space as a function of the bias parameter when estimating large deviation functions. Improving the efficiencies of these algorithms requires introducing guiding functions for the trajectories.
Weighting of NMME temperature and precipitation forecasts across Europe
NASA Astrophysics Data System (ADS)
Slater, Louise J.; Villarini, Gabriele; Bradley, A. Allen
2017-09-01
Multi-model ensemble forecasts are obtained by weighting multiple General Circulation Model (GCM) outputs to heighten forecast skill and reduce uncertainties. The North American Multi-Model Ensemble (NMME) project facilitates the development of such multi-model forecasting schemes by providing publicly-available hindcasts and forecasts online. Here, temperature and precipitation forecasts are enhanced by leveraging the strengths of eight NMME GCMs (CCSM3, CCSM4, CanCM3, CanCM4, CFSv2, GEOS5, GFDL2.1, and FLORb01) across all forecast months and lead times, for four broad climatic European regions: Temperate, Mediterranean, Humid-Continental and Subarctic-Polar. We compare five different approaches to multi-model weighting based on the equally weighted eight single-model ensembles (EW-8), Bayesian updating (BU) of the eight single-model ensembles (BU-8), BU of the 94 model members (BU-94), BU of the principal components of the eight single-model ensembles (BU-PCA-8) and BU of the principal components of the 94 model members (BU-PCA-94). We assess the forecasting skill of these five multi-models and evaluate their ability to predict some of the costliest historical droughts and floods in recent decades. Results indicate that the simplest approach based on EW-8 preserves model skill, but has considerable biases. The BU and BU-PCA approaches reduce the unconditional biases and negative skill in the forecasts considerably, but they can also sometimes diminish the positive skill in the original forecasts. The BU-PCA models tend to produce lower conditional biases than the BU models and have more homogeneous skill than the other multi-models, but with some loss of skill. The use of 94 NMME model members does not present significant benefits over the use of the 8 single model ensembles. These findings may provide valuable insights for the development of skillful, operational multi-model forecasting systems.
A new method for determining the optimal lagged ensemble
DelSole, T.; Tippett, M. K.; Pegion, K.
2017-01-01
We propose a general methodology for determining the lagged ensemble that minimizes the mean square forecast error (MSE). The MSE of a lagged ensemble is shown to depend only on a quantity called the cross-lead error covariance matrix, which can be estimated from a short hindcast data set and parameterized in terms of analytic functions of time. The resulting parameterization allows the skill of forecasts to be evaluated for an arbitrary ensemble size and initialization frequency. Remarkably, the parameterization can also estimate the MSE of a burst ensemble simply by taking the limit of an infinitely small interval between initialization times. This methodology is applied to forecasts of the Madden-Julian Oscillation (MJO) from version 2 of the Climate Forecast System (CFSv2). For leads greater than a week, little improvement is found in MJO forecast skill when ensembles larger than 5 days are used or more than 4 initializations per day are included. We find that if the initialization frequency is too infrequent, important structures of the lagged error covariance matrix are lost. Lastly, we demonstrate that the forecast error at leads ≥10 days can be reduced by optimally weighting the lagged ensemble members. The weights are shown to depend only on the cross-lead error covariance matrix. While the methodology developed here is applied to CFSv2, the technique can be easily adapted to other forecast systems. PMID:28580050
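The optimal weighting in the final result takes the classic minimum-variance form once the usual sum-to-one constraint is imposed; the weights then depend only on the cross-lead error covariance matrix C. A minimal sketch (the constraint and closed form are the textbook ones, assumed here rather than quoted from the paper):

```python
import numpy as np

def optimal_lag_weights(C):
    """Minimum-MSE weights for combining lagged ensemble members, given
    the cross-lead error covariance matrix C (n_lags x n_lags), subject
    to the weights summing to one.  The resulting minimum MSE is
    1 / (1^T C^{-1} 1)."""
    ones = np.ones(C.shape[0])
    x = np.linalg.solve(C, ones)   # C^{-1} 1 without forming the inverse
    return x / (ones @ x)
```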
Identifying the optimal segmentors for mass classification in mammograms
NASA Astrophysics Data System (ADS)
Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.
2015-03-01
In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we applied various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. After shape features were computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from the ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification, rather than the ones that produce the highest-precision segmentation. To measure the segmentors' contribution, we examined the weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The results showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.
Quantum caustics in resonance-fluorescence trajectories
NASA Astrophysics Data System (ADS)
Naghiloo, M.; Tan, D.; Harrington, P. M.; Lewalle, P.; Jordan, A. N.; Murch, K. W.
2017-11-01
We employ phase-sensitive amplification to perform homodyne detection of the resonance fluorescence from a driven superconducting artificial atom. Entanglement between the emitter and its fluorescence allows us to track the individual quantum state trajectories of the emitter conditioned on the outcomes of the field measurements. We analyze the ensemble properties of these trajectories by considering trajectories that connect specific initial and final states. By applying the stochastic path-integral formalism, we calculate equations of motion for the most-likely path between two quantum states and compare these predicted paths to experimental data. Drawing on the mathematical similarity between the action formalism of the most-likely quantum paths and ray optics, we study the emergence of caustics in quantum trajectories: places where multiple extrema in the stochastic action occur. We observe such multiple most-likely paths in experimental data and find these paths to be in reasonable quantitative agreement with theoretical calculations.
Ensemble: an Architecture for Mission-Operations Software
NASA Technical Reports Server (NTRS)
Norris, Jeffrey; Powell, Mark; Fox, Jason; Rabe, Kenneth; Shu, IHsiang; McCurdy, Michael; Vera, Alonso
2008-01-01
Ensemble is the name of an open architecture for, and a methodology for the development of, spacecraft mission operations software. Ensemble is also potentially applicable to the development of non-spacecraft mission-operations-type software. Ensemble capitalizes on the strengths of the open-source Eclipse software and its architecture to address several issues that have arisen repeatedly in the development of mission-operations software: heretofore, mission-operations application programs have been developed in disparate programming environments and integrated during the final stages of development of missions. The programs have been poorly integrated, and it has been costly to develop, test, and deploy them. Users of each program have been forced to interact with several different graphical user interfaces (GUIs). Also, the strategy typically used in integrating the programs has yielded serial chains of operational software tools of such a nature that during use of a given tool, it has not been possible to gain access to the capabilities afforded by other tools. In contrast, the Ensemble approach offers a low-risk path towards tighter integration of mission-operations software tools.
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data, sorting the data, evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
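A sketch of the sampling-based split evaluation at the core of the system follows; the Gini criterion, the subsample fraction, and the names are illustrative assumptions (the description above leaves the criterion open).

```python
import numpy as np

def best_split_sampled(feature, labels, sample_frac=0.1, rng=None):
    """Choose a threshold for one feature by evaluating candidate splits
    on a random subsample rather than on the full data set."""
    rng = rng or np.random.default_rng()
    m = min(len(feature), max(2, int(sample_frac * len(feature))))
    idx = rng.choice(len(feature), size=m, replace=False)
    order = np.argsort(feature[idx])
    f, y = feature[idx][order], labels[idx][order]

    def gini(part):  # impurity of one side of a candidate split
        _, counts = np.unique(part, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    best_score, best_thr = np.inf, None
    for i in range(1, len(f)):
        if f[i] == f[i - 1]:
            continue  # no threshold separates equal feature values
        score = (i * gini(y[:i]) + (len(f) - i) * gini(y[i:])) / len(f)
        if score < best_score:
            best_score, best_thr = score, (f[i] + f[i - 1]) / 2.0
    return best_thr
```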
Monthly ENSO Forecast Skill and Lagged Ensemble Size
DelSole, T.; Tippett, M.K.; Pegion, K.
2018-01-01
The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real-time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real-time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8-10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles is found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities. PMID:29937973
Monthly ENSO Forecast Skill and Lagged Ensemble Size
NASA Astrophysics Data System (ADS)
Trenary, L.; DelSole, T.; Tippett, M. K.; Pegion, K.
2018-04-01
The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real-time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real-time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8-10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles is found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities.
Generalized canonical ensembles and ensemble equivalence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Costeniuc, M.; Ellis, R.S.; Turkington, B.
2006-02-15
This paper is a companion piece to our previous work [J. Stat. Phys. 119, 1283 (2005)], which introduced a generalized canonical ensemble obtained by multiplying the usual Boltzmann weight factor e^{-βH} of the canonical ensemble with an exponential factor involving a continuous function g of the Hamiltonian H. We provide here a simplified introduction to our previous work, focusing now on a number of physical rather than mathematical aspects of the generalized canonical ensemble. The main result discussed is that, for suitable choices of g, the generalized canonical ensemble reproduces, in the thermodynamic limit, all the microcanonical equilibrium properties of the many-body system represented by H even if this system has a nonconcave microcanonical entropy function. This is something that in general the standard (g=0) canonical ensemble cannot achieve. Thus a virtue of the generalized canonical ensemble is that it can often be made equivalent to the microcanonical ensemble in cases in which the canonical ensemble cannot. The case of quadratic g functions is discussed in detail; it leads to the so-called Gaussian ensemble.
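In symbols, the construction is simply the following (schematic form, restating the abstract):

```latex
% Generalized canonical weight: the Boltzmann factor times exp[g(H)].
% The quadratic choice of g yields the so-called Gaussian ensemble.
P(x) \;\propto\; e^{-\beta H(x)}\, e^{\,g(H(x))},
\qquad
g(H) = -\gamma H^{2} \quad \text{(Gaussian ensemble)}.
```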
NASA Astrophysics Data System (ADS)
Chen, L. A.; Doddridge, B. G.; Dickerson, R. R.
2001-12-01
As the primary field experiment of the Maryland Aerosol Research and CHaracterization (MARCH-Atlantic) study, chemically speciated PM2.5 has been sampled at Fort Meade (FME, 39.10° N, 76.74° W) since July 1999. FME is suburban, located in the middle of the bustling Baltimore-Washington corridor, which is generally downwind of the highly industrialized Midwest. Owing to this sampling location, the PM2.5 observed at FME is expected to derive from both local and regional sources, with relative contributions varying temporally. This variation, believed to be largely controlled by the meteorology, influences day-to-day and seasonal profiles of PM2.5 mass concentration and chemical composition. Air parcel back trajectories, which describe the path of air parcels traveling backward in time from the site (receptor), reflect changes in the synoptic meteorological conditions. In this paper, an ensemble back trajectory method is employed to study the meteorology associated with each high/low PM2.5 episode in different seasons. For every sampling day, the residence time of air parcels within the eastern US at a 1° x 1° x 500 m geographic resolution can be estimated in order to resolve areas likely dominating the production of various PM2.5 components. Local sources are found to be more dominant in winter than in summer. Factor analysis is based on a mass balance approach and provides useful insights into air pollution data. Here, a newly developed factor analysis model (UNMIX) is used to extract source profiles and contributions from the speciated PM2.5 data. Combining the model results with the ensemble back trajectory method improves the understanding of the source regions and helps partition the contributions from local and more distant areas. http://www.meto.umd.edu/~bruce/MARCH-Atl.html
A Sequential Ensemble Prediction System at Convection Permitting Scales
NASA Astrophysics Data System (ADS)
Milan, M.; Simmer, C.
2012-04-01
A Sequential Assimilation Method (SAM) following some aspects of particle filtering with resampling, also called SIR (Sequential Importance Resampling), is introduced and applied in the framework of an Ensemble Prediction System (EPS) for weather forecasting on convection-permitting scales, with a focus on precipitation forecasting. At this scale and beyond, the atmosphere increasingly exhibits chaotic behaviour and non-linear state space evolution due to convectively driven processes. One way to take full account of non-linear state developments is particle filter methods; their basic idea is the representation of the model probability density function by a number of ensemble members weighted by their likelihood with respect to the observations. In particular, particle filtering with resampling abandons ensemble members (particles) with low weights and restores the original number of particles by adding multiple copies of the members with high weights. In our SIR-like implementation we replace the likelihood-based definition of weights with a metric that quantifies the "distance" between the observed atmospheric state and the states simulated by the ensemble members. We also introduce a methodology to counteract filter degeneracy, i.e. the collapse of the simulated state space. To this end we propose a combination of nudging and resampling that takes account of clustering in the simulated state space. By keeping cluster representatives during resampling and filtering, the method maintains the potential for non-linear system state development. We assume that a particle cluster with initially low likelihood may evolve into a state space region with higher likelihood at a subsequent filter time, thus mimicking non-linear system state developments (e.g. sudden convection initiation) and remedying timing errors for convection due to model errors and/or imperfect initial conditions. We apply a simplified version of the resampling: the particles with the highest weights in each cluster are duplicated; of each resulting particle pair, one particle evolves using the forward model, while the second is nudged towards the radar and satellite observations during its forward-model evolution.
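One selection step of such a scheme is easy to sketch. The exponential mapping from the distance metric to weights below is an assumption for illustration, and the cluster-keeping and nudging components described above are omitted; only the SIR-style duplicate-and-abandon step is shown.

```python
import numpy as np

def sir_step(members, distances, rng=None):
    """SIR-like selection step.  members: list of model states;
    distances: metric mismatch of each member with the observed
    atmospheric state (smaller = better).  Systematic resampling keeps
    the ensemble size fixed, duplicating good members and abandoning
    poor ones."""
    rng = rng or np.random.default_rng()
    w = np.exp(-np.asarray(distances, dtype=float))
    w /= w.sum()
    n = len(members)
    positions = (rng.random() + np.arange(n)) / n   # systematic resampling
    picks = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return [members[i] for i in picks]
```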
NASA Astrophysics Data System (ADS)
Maleki, Yusef; Zheltikov, Aleksei M.
2018-01-01
An ensemble of nitrogen-vacancy (NV) centers coupled to a circuit QED device is shown to enable an efficient, high-fidelity generation of high-N00N states. Instead of first creating entanglement and then increasing the number of entangled particles N , our source of high-N00N states first prepares a high-N Fock state in one of the NV ensembles and then entangles it to the rest of the system. With such a strategy, high-N N00N states can be generated in just a few operational steps with an extraordinary fidelity. Once prepared, such a state can be stored over a longer period of time due to the remarkably long coherence time of NV centers.
Impacts of weighting climate models for hydro-meteorological climate change studies
NASA Astrophysics Data System (ADS)
Chen, Jie; Brissette, François P.; Lucas-Picher, Philippe; Caya, Daniel
2017-06-01
Weighting climate models is controversial in climate change impact studies using an ensemble of climate simulations from different climate models. In climate science, there is a general consensus that all climate models should be considered as having equal performance, or in other words that all projections are equiprobable. On the other hand, in the impacts and adaptation community, many believe that climate models should be weighted based on their ability to better represent various metrics over a reference period. The debate appears to be partly philosophical in nature, as few studies have investigated the impact of using weights in projecting future climate changes. The present study focuses on the impact of assigning weights to climate models for hydrological climate change studies. Five methods are used to determine weights on an ensemble of 28 global climate models (GCMs) adapted from the Coupled Model Intercomparison Project Phase 5 (CMIP5) database. Using a hydrological model, streamflows are computed over a reference (1961-1990) and a future (2061-2090) period, with and without post-processing climate model outputs. The impacts of using different weighting schemes for GCM simulations are then analyzed in terms of ensemble mean and uncertainty. The results show that weighting GCMs has a limited impact on both the projected future climate, in terms of precipitation and temperature changes, and hydrology, in terms of nine different streamflow criteria. These results apply to both raw and post-processed GCM outputs, thus supporting the view that climate models should be considered equiprobable.
NASA Technical Reports Server (NTRS)
Taylor, Patrick C.; Baker, Noel C.
2015-01-01
Earth's climate is changing and will continue to change into the foreseeable future. Expected changes in the climatological distribution of precipitation, surface temperature, and surface solar radiation will significantly impact agriculture. Adaptation strategies are, therefore, required to reduce the agricultural impacts of climate change. Climate change projections of precipitation, surface temperature, and surface solar radiation distributions are necessary input for adaptation planning studies. These projections are conventionally constructed from an ensemble of climate model simulations (e.g., the Coupled Model Intercomparison Project 5 (CMIP5)) as an equal-weighted average: one model, one vote. Each climate model, however, represents the array of climate-relevant physical processes with varying degrees of fidelity, influencing the projection of individual climate variables differently. Presented here is a new approach, termed the "Intelligent Ensemble," that constructs climate variable projections by weighting each model according to its ability to represent key physical processes, e.g., the precipitation probability distribution. This approach provides added value over the equal-weighted average method. Physical process metrics applied in the "Intelligent Ensemble" method are created using a combination of NASA and NOAA satellite and surface-based cloud, radiation, temperature, and precipitation data sets. The "Intelligent Ensemble" method is applied to the RCP4.5 and RCP8.5 anthropogenic climate forcing simulations within the CMIP5 archive to develop a set of climate change scenarios for precipitation, temperature, and surface solar radiation in each USDA Farm Resource Region for use in climate change adaptation studies.
Climate Model Ensemble Methodology: Rationale and Challenges
NASA Astrophysics Data System (ADS)
Vezer, M. A.; Myrvold, W.
2012-12-01
A tractable model of the Earth's atmosphere, or, indeed, any large, complex system, is inevitably unrealistic in a variety of ways. This will have an effect on the model's output. Nonetheless, we want to be able to rely on certain features of the model's output in studies aiming to detect, attribute, and project climate change. For this, we need assurance that these features reflect the target system, and are not artifacts of the unrealistic assumptions that go into the model. One technique for overcoming these limitations is to study ensembles of models which employ different simplifying assumptions and different methods of modelling. One then either takes as reliable certain outputs on which models in the ensemble agree, or takes the average of these outputs as the best estimate. Since the Intergovernmental Panel on Climate Change's Fourth Assessment Report (IPCC AR4), modellers have aimed to improve ensemble analysis by developing techniques to account for dependencies among models, and to ascribe unequal weights to models according to their performance. The goal of this paper is to present as clearly and cogently as possible the rationale for climate model ensemble methodology, the motivation of modellers to account for model dependencies, and their efforts to ascribe unequal weights to models. The method of our analysis is as follows. We consider a simpler, well-understood case of taking the mean of a number of measurements of some quantity. Contrary to what is sometimes said, it is not a requirement of this practice that the errors of the component measurements be independent; one must, however, compensate for any lack of independence. We also extend the usual accounts to include cases of unknown systematic error. We draw parallels between this simpler illustration and the more complex example of climate model ensembles, detailing how ensembles can provide more useful information than any of their constituent models. This account emphasizes the epistemic importance of considering degrees of model dependence, and the practice of ascribing unequal weights to models of unequal skill.
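The measurement analogy invoked here has a standard closed form worth stating explicitly: when the component errors are correlated, the minimum-variance combination weights each measurement through the inverse of the error covariance matrix (a textbook result, not specific to this paper).

```latex
% x: vector of measurements; Sigma: their error covariance matrix,
% whose off-diagonal terms encode the dependence between measurements.
\bar{x} \;=\; \frac{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{x}}
                   {\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}},
\qquad
\operatorname{Var}(\bar{x}) \;=\;
\frac{1}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}}.
```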
Improving Climate Projections Using "Intelligent" Ensembles
NASA Technical Reports Server (NTRS)
Baker, Noel C.; Taylor, Patrick C.
2015-01-01
Recent changes in the climate system have led to growing concern, especially in communities which are highly vulnerable to resource shortages and weather extremes. There is an urgent need for better climate information to develop solutions and strategies for adapting to a changing climate. Climate models provide excellent tools for studying the current state of climate and making future projections. However, these models are subject to biases created by structural uncertainties. Performance metrics, i.e., the systematic determination of model biases, succinctly quantify aspects of climate model behavior. Efforts to standardize climate model experiments and collect simulation data, such as the Coupled Model Intercomparison Project (CMIP), provide the means to directly compare and assess model performance. Performance metrics have been used to show that some models reproduce present-day climate better than others. Simulation data from multiple models are often used to add value to projections by creating a consensus projection from the model ensemble, in which each model is given an equal weight. It has been shown that the ensemble mean generally outperforms any single model. It is possible to use unequal weights to produce ensemble means, in which models are weighted based on performance (called "intelligent" ensembles). Can performance metrics be used to improve climate projections? Previous work introduced a framework for comparing the utility of model performance metrics, showing that the best metrics are related to the variance of top-of-atmosphere outgoing longwave radiation. These metrics improve present-day climate simulations of Earth's energy budget using the "intelligent" ensemble method. The current project identifies several approaches for testing whether performance metrics can be applied to future simulations to create "intelligent" ensemble-mean climate projections. It is shown that certain performance metrics test key climate processes in the models, and that these metrics can be used to evaluate model quality in both current and future climate states. This information will be used to produce new consensus projections and provide communities with improved climate projections for urgent decision-making.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginn, Timothy R.; Weathers, Tess
Biogeochemical modeling using PHREEQC2 and a streamtube ensemble approach is utilized to understand a well-to-well subsurface treatment system at the Vadose Zone Research Park (VZRP) near Idaho Falls, Idaho. Treatment involves in situ microbially-mediated ureolysis to induce calcite precipitation for the immobilization of strontium-90. PHREEQC2 is utilized to model the kinetically-controlled ureolysis and consequent calcite precipitation. Reaction kinetics, equilibrium phases, and cation exchange are used within PHREEQC2 to track pH and levels of calcium, ammonium, urea, and calcite precipitation over time, within a series of one-dimensional advective-dispersive transport paths creating a streamtube ensemble representation of the well-to-well transport. An understanding of the impact of physical heterogeneities within this radial flowfield is critical for remediation design; we address this via the streamtube approach: instead of depicting spatial extents of solutes in the subsurface, we focus on their arrival distribution at the control well(s). Traditionally, each streamtube maintains uniform velocity; however, in radial flow in homogeneous media, the velocity within any given streamtube is spatially variable in a common way, being highest at the input and output wells and approaching a minimum at the midpoint between the wells. This idealized velocity variability is of significance in the case of ureolytically driven calcite precipitation. Streamtube velocity patterns for any particular configuration of injection and withdrawal wells are available as explicit calculations from potential theory, and also from particle tracking programs. To approximate the actual spatial distribution of velocity along streamtubes, we assume idealized radial non-uniform velocity associated with homogeneous media. This is implemented in PHREEQC2 via a non-uniform spatial discretization within each streamtube that honors both the streamtube’s travel time and the idealized “fast-slow-fast” pattern of non-uniform velocity along the streamline. Breakthrough curves produced by each simulation are weighted by the path-respective flux fractions (obtained by deconvolution of tracer tests conducted at the VZRP) to obtain the flux-average of flow contributions to the observation well.
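The final flux-weighting step has a simple form. As a hedged sketch (array shapes and names are assumptions, not the authors' PHREEQC2 workflow):

import numpy as np

def flux_averaged_breakthrough(curves, flux_fractions):
    """Flux-average the per-streamtube breakthrough curves.

    curves: shape (n_tubes, n_times), simulated concentration per streamtube
    flux_fractions: shape (n_tubes,), weights from tracer-test deconvolution,
                    assumed normalized to sum to 1
    """
    f = np.asarray(flux_fractions, dtype=float)
    return f @ np.asarray(curves)    # (n_times,) curve at the observation well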
Ovchinnikov, Victor; Karplus, Martin
2012-07-26
The popular targeted molecular dynamics (TMD) method for generating transition paths in complex biomolecular systems is revisited. In a typical TMD transition path, the large-scale changes occur early and the small-scale changes tend to occur later. As a result, the order of events in the computed paths depends on the direction in which the simulations are performed. To identify the origin of this bias, and to propose a method in which the bias is absent, variants of TMD in the restraint formulation are introduced and applied to the complex open ↔ closed transition in the protein calmodulin. Due to the global best-fit rotation that is typically part of the TMD method, the simulated system is guided implicitly along the lowest-frequency normal modes, until the large spatial scales associated with these modes are near the target conformation. The remaining portion of the transition is described progressively by higher-frequency modes, which correspond to smaller-scale rearrangements. A straightforward modification of TMD that avoids the global best-fit rotation is the locally restrained TMD (LRTMD) method, in which the biasing potential is constructed from a number of TMD potentials, each acting on a small connected portion of the protein sequence. With a uniform distribution of these elements, transition paths that lack the length-scale bias are obtained. Trajectories generated by steered MD in dihedral angle space (DSMD), a method that avoids best-fit rotations altogether, also lack the length-scale bias. To examine the importance of the paths generated by TMD, LRTMD, and DSMD in the actual transition, we use the finite-temperature string method to compute the free energy profile associated with a transition tube around a path generated by each algorithm. The free energy barriers associated with the paths are comparable, suggesting that transitions can occur along each route with similar probabilities. This result indicates that a broad ensemble of paths needs to be calculated to obtain a full description of conformational changes in biomolecules. The breadth of the contributing ensemble suggests that energetic barriers for conformational transitions in proteins are offset by entropic contributions that arise from a large number of possible paths.
Graphs and matroids weighted in a bounded incline algebra.
Lu, Ling-Xia; Zhang, Bei
2014-01-01
Firstly, for a graph weighted in a bounded incline algebra (also called a dioid), a longest path problem (LPP, for short) is presented, which can be regarded as a uniform approach to the famous shortest path problem, the widest path problem, and the most reliable path problem. The solutions for LPP and related algorithms are given. Secondly, for a matroid weighted in a linear matroid, the maximum independent set problem is studied.
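The unifying role of the dioid weighting can be illustrated with a Bellman-Ford-style relaxation in which the usual operations are replaced by generic semiring operations; instantiating them recovers the three classical problems. This is a hedged sketch of the general idea, not the authors' algorithm, and it assumes a dioid for which the relaxation converges (no improving cycles):

def best_paths(n, edges, source, plus, times, zero, one):
    # Generic single-source best-path values over a semiring/dioid:
    # 'plus' compares alternative paths, 'times' extends a path by an edge.
    val = [zero] * n
    val[source] = one
    for _ in range(n - 1):                 # n-1 relaxation rounds suffice
        for u, v, w in edges:
            val[v] = plus(val[v], times(val[u], w))
    return val

# shortest path:      plus=min, times=lambda a, b: a + b, zero=float('inf'), one=0
# widest path:        plus=max, times=min,                zero=0,            one=float('inf')
# most reliable path: plus=max, times=lambda a, b: a * b, zero=0.0,          one=1.0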
Kingsley, Laura J.; Lill, Markus A.
2014-01-01
Computational prediction of ligand entry and egress paths in proteins has become an emerging topic in computational biology and has proven useful in fields such as protein engineering and drug design. Geometric tunnel prediction programs, such as Caver3.0 and MolAxis, are computationally efficient methods to identify potential ligand entry and egress routes in proteins. Although many geometric tunnel programs are designed to accommodate a single input structure, the increasingly recognized importance of protein flexibility in tunnel formation and behavior has led to the more widespread use of protein ensembles in tunnel prediction. However, there has not yet been an attempt to directly investigate the influence of ensemble size and composition on geometric tunnel prediction. In this study, we compared tunnels found in a single crystal structure to ensembles of various sizes generated using different methods on both the apo and holo forms of cytochrome P450 enzymes CYP119, CYP2C9, and CYP3A4. Several protein structure clustering methods were tested in an attempt to generate smaller ensembles that were capable of reproducing the data from larger ensembles. Ultimately, we found that by including members from both the apo and holo data sets, we could produce ensembles containing fewer than 15 members that were comparable to apo or holo ensembles containing over 100 members. Furthermore, we found that, in the absence of either apo or holo crystal structure data, pseudo-apo or pseudo-holo ensembles (e.g., adding ligand to the apo protein throughout MD simulations) could be used to approximate the structural ensembles of the corresponding apo and holo ensembles, respectively. Our findings not only further highlight the importance of including protein flexibility in geometric tunnel prediction, but also suggest that smaller ensembles can be as capable as larger ensembles at capturing many of the protein motions important for tunnel prediction at a lower computational cost. PMID:24956479
Analyses and forecasts of a tornadic supercell outbreak using a 3DVAR system ensemble
NASA Astrophysics Data System (ADS)
Zhuang, Zhaorong; Yussouf, Nusrat; Gao, Jidong
2016-05-01
As part of NOAA's "Warn-On-Forecast" initiative, a convective-scale data assimilation and prediction system was developed using the WRF-ARW model and ARPS 3DVAR data assimilation technique. The system was then evaluated using retrospective short-range ensemble analyses and probabilistic forecasts of the tornadic supercell outbreak event that occurred on 24 May 2011 in Oklahoma, USA. A 36-member multi-physics ensemble system provided the initial and boundary conditions for a 3-km convective-scale ensemble system. Radial velocity and reflectivity observations from four WSR-88Ds were assimilated into the ensemble using the ARPS 3DVAR technique. Five data assimilation and forecast experiments were conducted to evaluate the sensitivity of the system to data assimilation frequencies, in-cloud temperature adjustment schemes, and fixed- and mixed-microphysics ensembles. The results indicated that the experiment with 5-min assimilation frequency quickly built up the storm and produced a more accurate analysis compared with the 10-min assimilation frequency experiment. The predicted vertical vorticity from the moist-adiabatic in-cloud temperature adjustment scheme was larger in magnitude than that from the latent heat scheme. Cycled data assimilation yielded good forecasts, where the ensemble probability of high vertical vorticity matched reasonably well with the observed tornado damage path. Overall, the results of the study suggest that the 3DVAR analysis and forecast system can provide reasonable forecasts of tornadic supercell storms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Weiwei; Domcke, Wolfgang; Farantos, Stavros C.
A trajectory method of calculating tunneling probabilities from phase integrals along straight line tunneling paths, originally suggested by Makri and Miller [J. Chem. Phys. 91, 4026 (1989)] and recently implemented by Truhlar and co-workers [Chem. Sci. 5, 2091 (2014)], is tested for one- and two-dimensional ab initio based potentials describing hydrogen dissociation in the ¹B₁ excited electronic state of pyrrole. The primary observables are the tunneling rates in a progression of bending vibrational states lying below the dissociation barrier and their isotope dependences. Several initial ensembles of classical trajectories have been considered, corresponding to the quasiclassical and the quantum mechanical samplings of the initial conditions. It is found that the sampling based on the fixed energy Wigner density gives the best agreement with the quantum mechanical dissociation rates.
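For orientation, the simplest one-dimensional phase-integral (WKB) estimate of the tunneling probability is the textbook expression

P \approx \exp\left[ -\frac{2}{\hbar} \int_{s_1}^{s_2} \sqrt{2 m \left( V(s) - E \right)}\, \mathrm{d}s \right],

with s_1 and s_2 the classical turning points; in the trajectory method discussed above, the integral is evaluated along straight-line tunneling paths. This is quoted as background, not as the paper's exact working equations.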
Upgrades to the REA method for producing probabilistic climate change projections
NASA Astrophysics Data System (ADS)
Xu, Ying; Gao, Xuejie; Giorgi, Filippo
2010-05-01
We present an augmented version of the Reliability Ensemble Averaging (REA) method designed to generate probabilistic climate change information from ensembles of climate model simulations. Compared to the original version, the augmented one includes consideration of multiple variables and statistics in the calculation of the performance-based weights. In addition, the model convergence criterion previously employed is removed. The method is applied to the calculation of changes in mean and variability for temperature and precipitation over different sub-regions of East Asia based on the recently completed CMIP3 multi-model ensemble. Comparison of the new and old REA methods, along with the simple averaging procedure, and the use of different combinations of performance metrics shows that at fine sub-regional scales the choice of weighting is relevant. This is mostly because the models show a substantial spread in performance for the simulation of precipitation statistics, a result that supports the use of model weighting as a useful option to account for wide ranges of quality of models. The REA method, and in particular the upgraded one, provides a simple and flexible framework for assessing the uncertainty related to the aggregation of results from ensembles of models in order to produce climate change information at the regional scale. KEY WORDS: REA method, Climate change, CMIP3
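At its core, REA-type aggregation is a reliability-weighted mean; schematically (notation assumed, following the general form of the method rather than the exact upgraded weights),

\widetilde{\Delta T} = \frac{\sum_i R_i\, \Delta T_i}{\sum_i R_i},

where R_i is the reliability factor of model i built from the performance metrics; the upgraded version described here computes R_i from multiple variables and statistics and drops the convergence criterion.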
NASA Astrophysics Data System (ADS)
Montero-Martinez, M. J.; Colorado, G.; Diaz-Gutierrez, D. E.; Salinas-Prieto, J. A.
2017-12-01
It is well known that the North American Monsoon (NAM) region is a very dry region, already under considerable stress due to the lack of water resources at multiple locations within the area. It is notable that, even under those conditions, the Mexican part of the NAM region is the most agriculturally productive in Mexico. Thus, it is very important to have realistic climate scenarios for climate variables such as temperature, precipitation, relative humidity, and radiation. This study tackles that problem by generating probabilistic climate scenarios using a weighted CMIP5-GCM ensemble approach based on the technique of Xu et al. (2010), itself an improvement on the better-known Reliability Ensemble Averaging algorithm of Giorgi and Mearns (2002). In addition, the individual performances of the 20-plus GCMs and of the weighted ensemble are compared against observed data (CRU TS2.1) using different metrics and Taylor diagrams. This study focuses on the probability of reaching given thresholds, since such products are of potential use for agricultural applications.
Working conditions, socioeconomic factors and low birth weight: path analysis.
Mahmoodi, Zohreh; Karimlou, Masoud; Sajjadi, Homeira; Dejman, Masoumeh; Vameghi, Meroe; Dolatian, Mahrokh
2013-09-01
In recent years, with socioeconomic changes in society, the presence of women in the workplace has become inevitable. Differences in working conditions, especially for pregnant women, have adverse consequences such as low birth weight. This study was conducted with the aim of modeling the relationship between working conditions, socioeconomic factors, and birth weight. This study was conducted in a case-control design. The control group consisted of 500 women with normal weight babies, and the case group of 250 women with low weight babies from selected hospitals in Tehran. Data were collected using a researcher-made questionnaire to determine mothers' lifestyle during pregnancy with low birth weight, following a health-affecting social determinants approach. This questionnaire investigated women's occupational lifestyle in terms of working conditions, activities, and job satisfaction. Data were analyzed with SPSS-16 and Lisrel-8.8 software using statistical path analysis. The final path model fitted well (CFI=1, RMSEA=0.00) and showed that, among direct paths, working condition (β=-0.032); among indirect paths, household income (β=-0.42); and in the overall effect, unemployed spouse (β=-0.1828) had the most effect on low birth weight. Negative coefficients indicate a decreasing effect on birth weight. Based on the path analysis model, working conditions and socioeconomic status directly and indirectly influence birth weight. Thus, as well as attention to treatment and health care (the biological aspect), special attention must also be paid to mothers' socioeconomic factors.
NASA Astrophysics Data System (ADS)
Isoguchi, O.; Matsui, K.; Kamachi, M.; Usui, N.; Miyazawa, Y.; Ishikawa, Y.; Hirose, N.
2017-12-01
Several operational ocean assimilation models are currently available for the Northwestern Pacific and surrounding marginal seas. One of the main targets is predicting the Kuroshio/Kuroshio Extension, which has an impact not only on social activities, such as fishery and ship routing, but also on local weather. There is a demand to assess their quality comprehensively and make the best use of the available products. In the present study, several ocean data assimilation products and their multi-model ensemble product were assessed by comparison with satellite-derived sea surface temperature (SST), sea surface height (SSH), and in-situ hydrographic sections. The Kuroshio axes were also computed from the surface currents of these products and were compared with the Kuroshio Axis data produced by analyzing satellite SST, SSH, and in-situ observations at the Marine Information Research Center (MIRC). The multi-model ensemble products generally showed the best accuracy in the comparisons with the satellite-derived SST and SSH. On the other hand, the ensemble products did not perform best in the comparison with the hydrographic sections. It is thus suggested that the multi-model ensemble works efficiently for horizontal 2D parameters, for which each assimilation product tends to have random errors, while it does not work well for the vertical 2D comparisons, for which the products tend to have bias errors with respect to in-situ data. In the assessment with the Kuroshio Axis data, some products showed more energetic behavior than the Kuroshio Axis data, resulting in large path errors, defined as the ratio between the area enclosed by the reference and model-derived axes and the path length. It is, however, not determined which behavior is real, because in-situ observations are still lacking to resolve energetic Kuroshio behavior, even though the Kuroshio is one of the strongest currents.
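Written out, the path error used here is simply (symbols assumed)

\varepsilon_{\mathrm{path}} = \frac{A}{L},

where A is the area enclosed between the reference and model-derived Kuroshio axes and L is the path length.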
NASA Astrophysics Data System (ADS)
Pollard, D.; Chang, W.; Haran, M.; Applegate, P.; DeConto, R.
2015-11-01
A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ~ 20 000 years. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree quite well with the more advanced techniques, but only for a large ensemble with full factorial parameter sampling. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds. Each run is extended 5000 years into the "future" with idealized ramped climate warming. In the majority of runs with reasonable scores, this produces grounding-line retreat deep into the West Antarctic interior, and the analysis provides sea-level-rise envelopes with well defined parametric uncertainty bounds.
Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan
2016-02-01
Accuracy plays a vital role in the medical field as it concerns the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensembles of classifiers have been proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called "HM-BagMoov", overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results shows that the ensemble framework achieved the highest accuracy, sensitivity and F-measure when compared with individual classifiers for all the diseases. In addition, the ensemble framework also achieved the highest accuracy when compared with state-of-the-art techniques. An application named "IntelliHealth" has also been developed based on the proposed model that may be used by hospitals/doctors for diagnostic advice.
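The weighted-voting idea at the heart of such ensembles can be sketched compactly. The snippet below shows one common scheme, weighted averaging of class probabilities; it illustrates the general technique, not the HM-BagMoov algorithm itself:

import numpy as np

def weighted_vote(probas, weights):
    """Combine class-probability outputs of heterogeneous classifiers.

    probas: shape (n_clf, n_samples, n_classes), per-classifier probabilities
    weights: shape (n_clf,), e.g., validation accuracies (assumed choice)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    avg = np.tensordot(w, np.asarray(probas), axes=1)  # (n_samples, n_classes)
    return avg.argmax(axis=1)                          # predicted class labels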
Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions.
Brooks, Logan C; Farrow, David C; Hyun, Sangwon; Tibshirani, Ryan J; Rosenfeld, Roni
2018-06-15
Accurate and reliable forecasts of seasonal epidemics of infectious disease can assist in the design of countermeasures and increase public awareness and preparedness. This article describes two main contributions we made recently toward this goal: a novel approach to probabilistic modeling of surveillance time series based on "delta densities", and an optimization scheme for combining output from multiple forecasting methods into an adaptively weighted ensemble. Delta densities describe the probability distribution of the change between one observation and the next, conditioned on available data; chaining together nonparametric estimates of these distributions yields a model for an entire trajectory. Corresponding distributional forecasts cover more observed events than alternatives that treat the whole season as a unit, and improve upon multiple evaluation metrics when extracting key targets of interest to public health officials. Adaptively weighted ensembles integrate the results of multiple forecasting methods, such as delta density, using weights that can change from situation to situation. We treat selection of optimal weightings across forecasting methods as a separate estimation task, and describe an estimation procedure based on optimizing cross-validation performance. We consider some details of the data generation process, including data revisions and holiday effects, both in the construction of these forecasting methods and when performing retrospective evaluation. The delta density method and an adaptively weighted ensemble of other forecasting methods each improve significantly on the next best ensemble component when applied separately, and achieve even better cross-validated performance when used in conjunction. We submitted real-time forecasts based on these contributions as part of CDC's 2015/2016 FluSight Collaborative Comparison. Among the fourteen submissions that season, this system was ranked by CDC as the most accurate.
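One simple way to realize adaptive weighting, sketched here under the assumption that each component method has a cross-validated mean log score from comparable past situations (the softmax form is an illustrative choice, not the paper's exact estimator):

import numpy as np

def adaptive_weights(log_scores):
    # Turn per-method cross-validated log scores into ensemble weights.
    s = np.asarray(log_scores, dtype=float)
    w = np.exp(s - s.max())          # softmax, shifted for numerical stability
    return w / w.sum()

def ensemble_forecast(densities, weights):
    # Mixture of per-method forecast densities evaluated on a common grid.
    return np.asarray(weights) @ np.asarray(densities)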
Lu, Qing; Kim, Jaegil; Straub, John E
2013-03-14
The generalized Replica Exchange Method (gREM) is extended into the isobaric-isothermal ensemble, and applied to simulate a vapor-liquid phase transition in Lennard-Jones fluids. Merging an optimally designed generalized ensemble sampling with replica exchange, gREM is particularly well suited for the effective simulation of first-order phase transitions characterized by "backbending" in the statistical temperature. While the metastable and unstable states in the vicinity of the first-order phase transition are masked by the enthalpy gap in temperature replica exchange method simulations, they are transformed into stable states through the parameterized effective sampling weights in gREM simulations, and join vapor and liquid phases with a succession of unimodal enthalpy distributions. The enhanced sampling across metastable and unstable states is achieved without the need to identify a "good" order parameter for biased sampling. We performed gREM simulations at various pressures below and near the critical pressure to examine the change in behavior of the vapor-liquid phase transition at different pressures. We observed a crossover from the first-order phase transition at low pressure, characterized by the backbending in the statistical temperature and the "kink" in the Gibbs free energy, to a continuous second-order phase transition near the critical pressure. The controlling mechanisms of nucleation and continuous phase transition are evident and the coexistence properties and phase diagram are found in agreement with literature results.
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging
NASA Astrophysics Data System (ADS)
Chen, Lei; Kamel, Mohamed S.
2016-01-01
In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
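The backward stepwise generation step admits a compact sketch. Here 'fitness' is any in-sample score over a candidate ensemble; the names and greedy criterion are assumptions, not the exact MSEBAG procedure:

def ensemble_collection(minimum_sufficient, fitness):
    # Generate nested ensembles of decreasing complexity by repeatedly
    # dropping the member whose removal hurts in-sample fitness least.
    current = list(minimum_sufficient)
    collection = [list(current)]
    while len(current) > 1:
        best = max(range(len(current)),
                   key=lambda i: fitness(current[:i] + current[i + 1:]))
        current.pop(best)
        collection.append(list(current))
    return collection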
Using multiple travel paths to estimate daily travel distance in arboreal, group-living primates.
Steel, Ruth Irene
2015-01-01
Primate field studies often estimate daily travel distance (DTD) in order to estimate energy expenditure and/or test foraging hypotheses. In group-living species, the center of mass (CM) method is traditionally used to measure DTD; a point is marked at the group's perceived center of mass at a set time interval or upon each move, and the distance between consecutive points is measured and summed. However, for groups using multiple travel paths, the CM method potentially creates a central path that is shorter than the individual paths and/or traverses unused areas. These problems may compromise tests of foraging hypotheses, since distance and energy expenditure could be underestimated. To better understand the magnitude of these potential biases, I designed and tested the multiple travel paths (MTP) method, in which DTD was calculated by recording all travel paths taken by the group's members, weighting each path's distance based on its proportional use by the group, and summing the weighted distances. To compare the MTP and CM methods, DTD was calculated using both methods in three groups of Udzungwa red colobus monkeys (Procolobus gordonorum; group size 30-43) for a random sample of 30 days between May 2009 and March 2010. Compared to the CM method, the MTP method provided significantly longer estimates of DTD that were more representative of the actual distance traveled and the areas used by a group. The MTP method is more time-intensive and requires multiple observers compared to the CM method. However, it provides greater accuracy for testing ecological and foraging models.
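The MTP estimate itself is a use-weighted sum, e.g. (a minimal sketch; names are illustrative):

def mtp_distance(path_lengths, proportions):
    # Daily travel distance as a sum of path lengths weighted by the
    # fraction of the group using each path (proportions sum to 1).
    return sum(d * p for d, p in zip(path_lengths, proportions))

# Two paths of 900 m and 1200 m used by 60% and 40% of the group:
# mtp_distance([900, 1200], [0.6, 0.4]) -> 1020.0 m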
Impulsive noise suppression in color images based on the geodesic digital paths
NASA Astrophysics Data System (ADS)
Smolka, Bogdan; Cyganek, Boguslaw
2015-02-01
In the paper a novel filtering design based on the concept of exploration of the pixel neighborhood by digital paths is presented. The paths start from the boundary of a filtering window and reach its center. The cost of transitions between adjacent pixels is defined in the hybrid spatial-color space. Then, an optimal path of minimum total cost, leading from pixels of the window's boundary to its center is determined. The cost of an optimal path serves as a degree of similarity of the central pixel to the samples from the local processing window. If a pixel is an outlier, then all the paths starting from the window's boundary will have high costs and the minimum one will also be high. The filter output is calculated as a weighted mean of the central pixel and an estimate constructed using the information on the minimum cost assigned to each image pixel. So, first the costs of optimal paths are used to build a smoothed image and in the second step the minimum cost of the central pixel is utilized for construction of the weights of a soft-switching scheme. The experiments performed on a set of standard color images, revealed that the efficiency of the proposed algorithm is superior to the state-of-the-art filtering techniques in terms of the objective restoration quality measures, especially for high noise contamination ratios. The proposed filter, due to its low computational complexity, can be applied for real time image denoising and also for the enhancement of video streams.
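The minimum-cost digital path from the window boundary to the center can be computed with a standard Dijkstra search. The sketch below uses 4-neighbour connectivity and a placeholder transition cost; the actual filter uses a hybrid spatial-color cost, so this is an illustration of the idea only:

import heapq

def min_path_cost(window, cost, boundary, center):
    # window: 2-D array of pixel values; boundary: iterable of (y, x) seeds;
    # cost(a, b): transition cost between adjacent pixel values (placeholder).
    h, w = len(window), len(window[0])
    dist = {p: 0.0 for p in boundary}
    heap = [(0.0, p) for p in boundary]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if (y, x) == center:
            return d                      # minimum cost of reaching the center
        if d > dist.get((y, x), float('inf')):
            continue                      # stale heap entry
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + cost(window[y][x], window[ny][nx])
                if nd < dist.get((ny, nx), float('inf')):
                    dist[(ny, nx)] = nd
                    heapq.heappush(heap, (nd, (ny, nx)))
    return float('inf')

A high minimum cost flags the central pixel as a likely impulse outlier, which then drives the soft-switching weight described above.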
Santander, Julian E; Tsapatsis, Michael; Auerbach, Scott M
2013-04-16
We have constructed and applied an algorithm to simulate the behavior of zeolite frameworks during liquid adsorption. We applied this approach to compute the adsorption isotherms of furfural-water and hydroxymethyl furfural (HMF)-water mixtures adsorbing in silicalite zeolite at 300 K for comparison with experimental data. We modeled these adsorption processes under two different statistical mechanical ensembles: the grand canonical (V-Nz-μg-T or GC) ensemble keeping volume fixed, and the P-Nz-μg-T (osmotic) ensemble allowing volume to fluctuate. To optimize accuracy and efficiency, we compared pure Monte Carlo (MC) sampling to hybrid MC-molecular dynamics (MD) simulations. For the external furfural-water and HMF-water phases, we assumed the ideal solution approximation and employed a combination of tabulated data and extended ensemble simulations for computing solvation free energies. We found that MC sampling in the V-Nz-μg-T ensemble (i.e., standard GCMC) does a poor job of reproducing both the Henry's law regime and the saturation loadings of these systems. Hybrid MC-MD sampling of the V-Nz-μg-T ensemble, which includes framework vibrations at fixed total volume, provides better results in the Henry's law region, but this approach still does not reproduce experimental saturation loadings. Pure MC sampling of the osmotic ensemble was found to approach experimental saturation loadings more closely, whereas hybrid MC-MD sampling of the osmotic ensemble quantitatively reproduces such loadings because the MC-MD approach naturally allows for locally anisotropic volume changes wherein some pores expand whereas others contract.
NASA Astrophysics Data System (ADS)
Tito Arandia Martinez, Fabian
2014-05-01
Adequate uncertainty assessment is an important issue in hydrological modelling. An important issue for hydropower producers is to obtain ensemble forecasts which truly grasp the uncertainty linked to upcoming streamflows. If properly assessed, this uncertainty can lead to optimal reservoir management and energy production (e.g., [1]). The meteorological inputs to the hydrological model account for an important part of the total uncertainty in streamflow forecasting. Since the creation of the THORPEX initiative and the TIGGE database, meteorological ensemble forecasts from nine agencies throughout the world have been made available. This allows for hydrological ensemble forecasts based on multiple meteorological ensemble forecasts. Consequently, both the uncertainty linked to the architecture of the meteorological model and the uncertainty linked to the initial condition of the atmosphere can be accounted for. The main objective of this work is to show that a weighted combination of meteorological ensemble forecasts based on different atmospheric models can lead to improved hydrological ensemble forecasts, for horizons from one to ten days. This experiment is performed for the Baskatong watershed, a head subcatchment of the Gatineau watershed in the province of Quebec, in Canada. The Baskatong watershed is of great importance for hydropower production, as it comprises the main reservoir for the Gatineau watershed, on which there are six hydropower plants managed by Hydro-Québec. Since the 70's, they have been using pseudo-ensemble forecasts based on deterministic meteorological forecasts to which variability derived from past forecasting errors is added. We use a combination of meteorological ensemble forecasts from different models (precipitation and temperature) as the main inputs for the hydrological model HSAMI ([2]). The meteorological ensembles from eight of the nine agencies available through TIGGE are weighted according to their individual performance and combined to form a grand ensemble. Results show that the hydrological forecasts derived from the grand ensemble perform better than the pseudo-ensemble forecasts currently used operationally at Hydro-Québec. References: [1] M. Verbunt, A. Walser, J. Gurtz et al., "Probabilistic flood forecasting with a limited-area ensemble prediction system: Selected case studies," Journal of Hydrometeorology, vol. 8, no. 4, pp. 897-909, Aug. 2007. [2] N. Evora, Valorisation des prévisions météorologiques d'ensemble, Institut de recherche d'Hydro-Québec, 2005. [3] V. Fortin, Le modèle météo-apport HSAMI: historique, théorie et application, Institut de recherche d'Hydro-Québec, 2000.
Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)
NASA Astrophysics Data System (ADS)
OConnor, A.; Kirtman, B. P.; Harrison, S.; Gorman, J.
2016-02-01
Current US Navy forecasting systems cannot easily incorporate extended-range forecasts that can improve mission readiness and effectiveness; ensure safety; and reduce cost, labor, and resource requirements. If Navy operational planners had systems that incorporated these forecasts, they could plan missions using more reliable and longer-term weather and climate predictions. Further, using multi-model forecast ensembles instead of single forecasts would produce higher predictive performance. Extended-range multi-model forecast ensembles, such as those available in the North American Multi-Model Ensemble (NMME), are ideal for system integration because of their high-skill predictions; however, even higher-skill predictions can be produced if forecast model ensembles are combined correctly. While many methods for weighting models exist, the best method in a given environment requires expert knowledge of the models and combination methods. We present an innovative approach that uses machine learning to combine extended-range predictions from multi-model forecast ensembles and generate a probabilistic forecast for any region of the globe up to 12 months in advance. Our machine-learning approach uses 30 years of hindcast predictions to learn patterns of forecast model successes and failures. Each model is assigned a weight for each environmental condition, 100 km² region, and day given any expected environmental information. These weights are then applied to the respective predictions for the region and time of interest to effectively stitch together a single, coherent probabilistic forecast. Our experimental results demonstrate the benefits of our approach to produce extended-range probabilistic forecasts for regions and time periods of interest that are superior, in terms of skill, to individual NMME forecast models and commonly weighted models. The probabilistic forecast leverages the strengths of three NMME forecast models to predict environmental conditions for an area spanning from San Diego, CA to Honolulu, HI, seven months in advance. Key findings include: weighted combinations of models are strictly better than individual models; machine-learned combinations are especially better; and forecasts produced using our approach have the highest rank probability skill score most often.
The Origins of the "Fanga" Dance
ERIC Educational Resources Information Center
Damm, Robert J.
2015-01-01
The "fanga" is a dance taught throughout the United States to children in elementary music classes, students in African dance classes, teachers in multicultural workshops, and professional dancers in touring ensembles. Although the history of the fanga is a path overgrown with myth, this article offers information about the dance's…
Muždalo, Anja; Saalfrank, Peter; Vreede, Jocelyne; Santer, Mark
2018-04-10
Azobenzene-based molecular photoswitches are becoming increasingly important for the development of photoresponsive, functional soft-matter material systems. Upon illumination with light, fast interconversion between a more stable trans and a metastable cis configuration can be established resulting in pronounced changes in conformation, dipole moment or hydrophobicity. A rational design of functional photosensitive molecules with embedded azo moieties requires a thorough understanding of isomerization mechanisms and rates, especially the thermally activated relaxation. For small azo derivatives considered in the gas phase or simple solvents, Eyring's classical transition state theory (TST) approach yields useful predictions for trends in activation energies or corresponding half-life times of the cis isomer. However, TST or improved theories cannot easily be applied when the azo moiety is part of a larger molecular complex or embedded into a heterogeneous environment, where a multitude of possible reaction pathways may exist. In these cases, only the sampling of an ensemble of dynamic reactive trajectories (transition path sampling, TPS) with explicit models of the environment may reveal the nature of the processes involved. In the present work we show how a TPS approach can conveniently be implemented for the phenomenon of relaxation-isomerization of azobenzenes starting with the simple examples of pure azobenzene and a push-pull derivative immersed in a polar (DMSO) and apolar (toluene) solvent. The latter are represented explicitly at a molecular mechanical (MM) and the azo moiety at a quantum mechanical (QM) level. We demonstrate for the push-pull azobenzene that path sampling in combination with the chosen QM/MM scheme produces the expected change in isomerization pathway from inversion to rotation in going from a low to a high permittivity (explicit) solvent model. We discuss the potential of the simulation procedure presented for comparative calculation of reaction rates and an improved understanding of activated states.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Elia, M.; Edwards, H. C.; Hu, J.
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.
Improving wave forecasting by integrating ensemble modelling and machine learning
NASA Astrophysics Data System (ADS)
O'Donncha, F.; Zhang, Y.; James, S. C.
2017-12-01
Modern smart-grid networks use technologies to instantly relay information on supply and demand to support effective decision making. Integration of renewable-energy resources with these systems demands accurate forecasting of energy production (and demand) capacities. For wave-energy converters, this requires wave-condition forecasting to enable estimates of energy production. Current operational wave forecasting systems exhibit substantial errors with wave-height RMSEs of 40 to 60 cm being typical, which limits the reliability of energy-generation predictions thereby impeding integration with the distribution grid. In this study, we integrate physics-based models with statistical learning aggregation techniques that combine forecasts from multiple, independent models into a single "best-estimate" prediction of the true state. The Simulating Waves Nearshore physics-based model is used to compute wind- and currents-augmented waves in the Monterey Bay area. Ensembles are developed based on multiple simulations perturbing input data (wave characteristics supplied at the model boundaries and winds) to the model. A learning-aggregation technique uses past observations and past model forecasts to calculate a weight for each model. The aggregated forecasts are compared to observation data to quantify the performance of the model ensemble and aggregation techniques. The appropriately weighted ensemble model outperforms an individual ensemble member with regard to forecasting wave conditions.
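A minimal version of the learning-aggregation step, assuming past member forecasts and matching observations are available (inverse-RMSE weighting is one common choice, not necessarily the exact estimator used in this study):

import numpy as np

def inverse_rmse_weights(past_forecasts, past_obs, eps=1e-9):
    # past_forecasts: (n_members, n_times); past_obs: (n_times,)
    err = np.asarray(past_forecasts) - np.asarray(past_obs)[None, :]
    rmse = np.sqrt((err ** 2).mean(axis=1))
    w = 1.0 / (rmse + eps)               # better members get larger weights
    return w / w.sum()

def aggregate(current_forecasts, weights):
    # Weighted "best-estimate" combination of the current member forecasts.
    return np.asarray(weights) @ np.asarray(current_forecasts)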
Simulation studies of the fidelity of biomolecular structure ensemble recreation
NASA Astrophysics Data System (ADS)
Lätzer, Joachim; Eastwood, Michael P.; Wolynes, Peter G.
2006-12-01
We examine the ability of Bayesian methods to recreate structural ensembles for partially folded molecules from averaged data. Specifically we test the ability of various algorithms to recreate different transition state ensembles for folding proteins using a multiple replica simulation algorithm using input from "gold standard" reference ensembles that were first generated with a Gō-like Hamiltonian having nonpairwise additive terms. A set of low resolution data, which function as the "experimental" ϕ values, were first constructed from this reference ensemble. The resulting ϕ values were then treated as one would treat laboratory experimental data and were used as input in the replica reconstruction algorithm. The resulting ensembles of structures obtained by the replica algorithm were compared to the gold standard reference ensemble, from which those "data" were, in fact, obtained. It is found that for a unimodal transition state ensemble with a low barrier, the multiple replica algorithm does recreate the reference ensemble fairly successfully when no experimental error is assumed. The Kolmogorov-Smirnov test as well as principal component analysis show that the overlap of the recovered and reference ensembles is significantly enhanced when multiple replicas are used. Reduction of the multiple replica ensembles by clustering successfully yields subensembles with close similarity to the reference ensembles. On the other hand, for a high barrier transition state with two distinct transition state ensembles, the single replica algorithm only samples a few structures of one of the reference ensemble basins. This is due to the fact that the ϕ values are intrinsically ensemble averaged quantities. The replica algorithm with multiple copies does sample both reference ensemble basins. In contrast to the single replica case, the multiple replicas are constrained to reproduce the average ϕ values, but allow fluctuations in ϕ for each individual copy. These fluctuations facilitate a more faithful sampling of the reference ensemble basins. Finally, we test how robustly the reconstruction algorithm can function by introducing errors in ϕ comparable in magnitude to those suggested by some authors. In this circumstance we observe that the chances of ensemble recovery with the replica algorithm are poor using a single replica, but are improved when multiple copies are used. A multimodal transition state ensemble, however, turns out to be more sensitive to large errors in ϕ (if appropriately gauged) and attempts at successful recreation of the reference ensemble with simple replica algorithms can fall short.
Jarzynski equality in the context of maximum path entropy
NASA Astrophysics Data System (ADS)
González, Diego; Davis, Sergio
2017-06-01
In the global framework of finding an axiomatic derivation of nonequilibrium Statistical Mechanics from fundamental principles, such as the maximum path entropy (also known as the Maximum Caliber principle), this work proposes an alternative derivation of the well-known Jarzynski equality, a nonequilibrium identity of great importance today due to its applications to irreversible processes: biological systems (protein folding), mechanical systems, among others. This equality relates the free energy difference between two equilibrium thermodynamic states to the work performed when going between those states, through an average over a path ensemble. In this work the analysis of Jarzynski's equality is performed using the formalism of inference over path space. This derivation highlights the wide generality of Jarzynski's original result, which could even be used in non-thermodynamical settings such as social, financial, and ecological systems.
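Written explicitly, the identity in question is

\left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F},

where \beta = 1/k_B T, W is the work performed along an individual realization of the process, \Delta F is the free energy difference between the two equilibrium states, and the angular brackets denote an average over the ensemble of paths.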
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.
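The core of the procedure reduces to segmenting and averaging, e.g. (a minimal sketch; the statistical-independence and variance tests described above are separate steps):

import numpy as np

def equivalent_ensemble_average(signal, n_records):
    # Split one long record into n_records equal segments (the 'equivalent
    # ensemble') and average across segments at each within-segment time.
    # Near-constancy of the result over time indicates weak stationarity.
    x = np.asarray(signal, dtype=float)
    m = len(x) // n_records
    records = x[: n_records * m].reshape(n_records, m)
    return records.mean(axis=0)          # ensemble average as a function of time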
A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies
Antoneli, Fernando; Passos, Fernando M.; Lopes, Luciano R.
2018-01-01
Divergence date estimates are central to understand evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and well known non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS) goodness-of-fit test. In the strict clock case, the method consists in using the one-sample Kolmogorov-Smirnov (KS) test to directly test if the phylogeny is clock-like, in other words, if it follows a Poisson law. The ECD is computed from the discretized branch lengths and the parameter λ of the expected Poisson distribution is calculated as the average branch length over the ensemble of trees. To compensate for the auto-correlation in the ensemble of trees and pseudo-replication we take advantage of thinning and effective sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is observed that tree topologies with very long or very short branches lead to Poisson mixtures and in this case we propose the use of the two-sample KS test with samples from two continuous branch length distributions, one obtained from an ensemble of clock-constrained trees and the other from an ensemble of unconstrained trees. Moreover, in this second form the test can also be applied to test for relaxed clock models. The use of a statistically equivalent ensemble of phylogenies to obtain the branch lengths ECD, instead of one consensus tree, yields considerable reduction of the effects of small sample size and provides a gain of power. PMID:29300759
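A bare-bones version of the strict-clock test, using SciPy and omitting the thinning/effective-sample-size correction discussed above (the names and rounding-based discretization are assumptions):

import numpy as np
from scipy import stats

def strict_clock_ks_test(branch_lengths):
    # One-sample KS comparison of discretized branch lengths against a
    # Poisson law whose rate is the ensemble-average branch length.
    k = np.round(np.asarray(branch_lengths)).astype(int)
    lam = k.mean()
    return stats.kstest(k, stats.poisson(lam).cdf)   # statistic and p-value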
Brekke, L.D.; Dettinger, M.D.; Maurer, E.P.; Anderson, M.
2008-01-01
Ensembles of historical climate simulations and climate projections from the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3) multi-model dataset were investigated to determine how model credibility affects apparent relative scenario likelihoods in regional risk assessments. Methods were developed and applied in a Northern California case study. An ensemble of 59 twentieth century climate simulations from 17 WCRP CMIP3 models was analyzed to evaluate relative model credibility associated with a 75-member projection ensemble from the same 17 models. Credibility was assessed based on how realistically models reproduced selected statistics of historical climate relevant to California climatology. Metrics of this credibility were used to derive relative model weights leading to weight-threshold culling of models contributing to the projection ensemble. Density functions were then estimated for two projected quantities (temperature and precipitation), with and without considering credibility-based ensemble reductions. An analysis for Northern California showed that, while some models seem more capable at recreating limited aspects of twentieth century climate, the overall tendency is for comparable model performance when several credibility measures are combined. Use of these metrics to decide which models to include in density function development led to local adjustments to function shapes, but had limited effect on breadth and central tendency, which were found to be more influenced by the 'completeness' of the original ensemble in terms of models and emissions pathways.
A model ensemble for projecting multi‐decadal coastal cliff retreat during the 21st century
Limber, Patrick; Barnard, Patrick; Vitousek, Sean; Erikson, Li
2018-01-01
Sea cliff retreat rates are expected to accelerate with rising sea levels during the 21st century. Here we develop an approach for a multi-model ensemble that efficiently projects time-averaged sea cliff retreat over multi-decadal time scales and large (>50 km) spatial scales. The ensemble consists of five simple 1-D models adapted from the literature that relate sea cliff retreat to wave impacts, sea level rise (SLR), historical cliff behavior, and cross-shore profile geometry. Ensemble predictions are based on Monte Carlo simulations of each individual model, which account for the uncertainty of model parameters. The consensus of the individual models also weights uncertainty, such that uncertainty is greater when predictions from different models do not agree. A calibrated, but unvalidated, ensemble was applied to the 475 km-long coastline of Southern California (USA), with 4 SLR scenarios of 0.5, 0.93, 1.5, and 2 m by 2100. Results suggest that future retreat rates could increase relative to mean historical rates by more than two-fold for the higher SLR scenarios, causing an average total land loss of 19-41 m by 2100. However, model uncertainty ranges from +/- 5-15 m, reflecting the inherent difficulties of projecting cliff retreat over multiple decades. To enhance ensemble performance, future work could include weighting each model by its skill in matching observations in different morphological settings.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Man, Jun; Zhang, Jiangjiang; Li, Weixuan
2016-10-01
The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, larger ensemble size improves the parameter estimation and convergence of optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied in any other hydrological problems.
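One of the information metrics named above, relative entropy, can be approximated from Gaussian fits to the prior and posterior parameter ensembles; this sketch assumes such fits and is not the paper's implementation. In a sequential design loop, the candidate measurement maximizing this metric would be collected next.

```python
import numpy as np

def relative_entropy_gaussian(prior, posterior):
    """Relative entropy (Kullback-Leibler divergence) between Gaussian
    approximations of prior and posterior parameter ensembles; rows are
    ensemble members, columns are parameters."""
    mu0, mu1 = prior.mean(axis=0), posterior.mean(axis=0)
    k = mu0.size
    C0 = np.cov(prior, rowvar=False) + 1e-9 * np.eye(k)   # regularized
    C1 = np.cov(posterior, rowvar=False) + 1e-9 * np.eye(k)
    C0_inv = np.linalg.inv(C0)
    d = mu0 - mu1
    _, logdet0 = np.linalg.slogdet(C0)
    _, logdet1 = np.linalg.slogdet(C1)
    return 0.5 * (np.trace(C0_inv @ C1) + d @ C0_inv @ d - k
                  + logdet0 - logdet1)
```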
Zheng, Lianqing; Chen, Mengen; Yang, Wei
2009-06-21
To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures, etc. As usually observed in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction and these residual free energy barriers can greatly degrade the sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimension subset of the target system; then the "Hamiltonian lagging" problem, in which necessary structural relaxation falls behind the motion of the collective variable, is likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.
An ensemble-based approach for breast mass classification in mammography images
NASA Astrophysics Data System (ADS)
Ribeiro, Patricia B.; Papa, João. P.; Romero, Roseli A. F.
2017-03-01
Mammography analysis is an important tool that helps detect breast cancer at the very early stages of the disease, thus increasing the quality of life of hundreds of thousands of patients worldwide. In Computer-Aided Detection systems, the identification of mammograms with and without masses (without clinical findings) is highly needed to reduce the false positive rates regarding the automatic selection of regions of interest that may contain some suspicious content. In this work, we introduce a variant of the Optimum-Path Forest (OPF) classifier for breast mass identification, and we employ an ensemble-based approach that can enhance the effectiveness of individual classifiers for this purpose. The experimental results also comprise the naïve OPF and a traditional neural network, with the most accurate results obtained through the ensemble of classifiers, at an accuracy of nearly 86%.
NASA Astrophysics Data System (ADS)
Bianconi, Ginestra
2009-03-01
In this paper we generalize the concept of random networks to describe network ensembles with nontrivial features by a statistical mechanics approach. This framework is able to describe undirected and directed network ensembles as well as weighted network ensembles. These networks might have nontrivial community structure or, in the case of networks embedded in a given space, they might have a link probability with a nontrivial dependence on the distance between the nodes. These ensembles are characterized by their entropy, which evaluates the cardinality of networks in the ensemble. In particular, in this paper we define and evaluate the structural entropy, i.e., the entropy of the ensembles of undirected uncorrelated simple networks with given degree sequence. We stress the apparent paradox that scale-free degree distributions are characterized by having small structural entropy while they are so widely encountered in natural, social, and technological complex systems. We propose a solution to the paradox by proving that scale-free degree distributions are the most likely degree distribution with the corresponding value of the structural entropy. Finally, the general framework we present in this paper is able to describe microcanonical ensembles of networks as well as canonical or hidden-variable network ensembles with significant implications for the formulation of network-constructing algorithms.
Link prediction based on local weighted paths for complex networks
NASA Astrophysics Data System (ADS)
Yao, Yabing; Zhang, Ruisheng; Yang, Fan; Yuan, Yongna; Hu, Rongjing; Zhao, Zhili
As a significant problem in complex networks, link prediction aims to find the missing and future links between two unconnected nodes by estimating the existence likelihood of potential links. It plays an important role in understanding the evolution mechanism of networks and has broad applications in practice. In order to improve prediction performance, a variety of structural similarity-based methods that rely on different topological features have been put forward. As one topological feature, the path information between node pairs is utilized to calculate the node similarity. However, many path-dependent methods neglect the different contributions of paths for a pair of nodes. In this paper, a local weighted path (LWP) index is proposed to differentiate the contributions between paths. The LWP index considers the effect of the link degrees of intermediate links and the connectivity influence of intermediate nodes on paths to quantify the path weight in the prediction procedure. The experimental results on 12 real-world networks show that the LWP index outperforms seven other prediction baselines.
A brief history of the introduction of generalized ensembles to Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Berg, Bernd A.
2017-03-01
The most efficient weights for Markov chain Monte Carlo calculations of physical observables are not necessarily those of the canonical ensemble. Generalized ensembles, which do not exist in nature but can be simulated on computers, lead often to a much faster convergence. In particular, they have been used for simulations of first order phase transitions and for simulations of complex systems in which conflicting constraints lead to a rugged free energy landscape. Starting off with the Metropolis algorithm and Hastings' extension, I present a minireview which focuses on the explosive use of generalized ensembles in the early 1990s. Illustrations are given, which range from spin models to peptides.
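The core idea above, replacing canonical Boltzmann weights with generalized-ensemble weights in the Metropolis acceptance rule, can be shown in a few lines; the double-well energy and the flattening weight function below are toy assumptions:

```python
import numpy as np

def metropolis_generalized(energy, log_weight, x0, step, n_steps, rng):
    """Metropolis sampling in a generalized ensemble: acceptance uses
    w(E')/w(E) in place of the canonical exp(-beta * (E' - E))."""
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step, size=np.shape(x))
        e_new = energy(x_new)
        if np.log(rng.random()) < log_weight(e_new) - log_weight(e):
            x, e = x_new, e_new   # accept the move
        samples.append(x)
    return np.array(samples)

# Toy example: a flatter-than-canonical weight over a double-well energy,
# which lets the walker cross the barrier between the two wells.
rng = np.random.default_rng(1)
energy = lambda x: (x**2 - 1.0)**2
log_weight = lambda e: -0.2 * e
traj = metropolis_generalized(energy, log_weight, 0.0, 0.5, 10000, rng)
```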
WE-E-BRE-05: Ensemble of Graphical Models for Predicting Radiation Pneumonitis Risk
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, S; Ybarra, N; Jeyaseelan, K
Purpose: We propose a prior knowledge-based approach to construct an interaction graph of biological and dosimetric radiation pneumonitis (RP) covariates for the purpose of developing a RP risk classifier. Methods: We recruited 59 NSCLC patients who received curative radiotherapy with a minimum 6 month follow-up. 16 RP events were observed (CTCAE grade ≥2). Blood serum was collected from every patient before (pre-RT) and during RT (mid-RT). From each sample the concentrations of the following five candidate biomarkers were taken as covariates: alpha-2-macroglobulin (α2M), angiotensin converting enzyme (ACE), transforming growth factor β (TGF-β), interleukin-6 (IL-6), and osteopontin (OPN). Dose-volumetric parameters were also included as covariates. The number of biological and dosimetric covariates was reduced by a variable selection scheme implemented by L1-regularized logistic regression (LASSO). The posterior probability distribution of interaction graphs between the selected variables was estimated from the data under literature-based prior knowledge to weight more heavily the graphs that contain the expected associations. A graph ensemble was formed by averaging the most probable graphs weighted by their posterior, creating a Bayesian Network (BN)-based RP risk classifier. Results: The LASSO selected the following 7 RP covariates: (1) pre-RT concentration level of α2M, (2) α2M level mid-RT/pre-RT, (3) pre-RT IL6 level, (4) IL6 level mid-RT/pre-RT, (5) ACE mid-RT/pre-RT, (6) PTV volume, and (7) mean lung dose (MLD). The ensemble BN model achieved a maximum sensitivity/specificity of 81%/84% and outperformed univariate dosimetric predictors as shown by larger AUC values (0.78∼0.81) compared with MLD (0.61), V20 (0.65) and V30 (0.70). The ensembles obtained by incorporating the prior knowledge improved classification performance for ensemble sizes of 5∼50. Conclusion: We demonstrated a probabilistic ensemble method to detect robust associations between RP covariates and its potential to improve RP prediction accuracy. Our Bayesian approach to incorporating prior knowledge can enhance the efficiency of searching for such associations in the data. The authors acknowledge partial support by: 1) CREATE Medical Physics Research Training Network grant of the Natural Sciences and Engineering Research Council (Grant number: 432290) and 2) The Terry Fox Foundation Strategic Training Initiative for Excellence in Radiation Research for the 21st Century (EIRR21)
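A hedged sketch of the LASSO variable-selection step using scikit-learn (an assumption; the authors' toolchain is not stated); X and y stand in for the covariate matrix and the RP grade ≥2 labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_select(X, y, C=0.5):
    """L1-regularized logistic regression (LASSO-style) variable
    selection: covariates with nonzero coefficients survive.

    X: rows = patients, columns = biomarker and dose-volumetric
    covariates (e.g., pre-RT levels, mid/pre ratios, PTV volume, MLD);
    y: binary RP outcome. C controls the regularization strength.
    """
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X, y)
    return np.flatnonzero(model.coef_.ravel())   # indices of kept covariates
```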
ERIC Educational Resources Information Center
Murphy, Sean
2013-01-01
The saxophone section of a wind ensemble can easily be one of the most frustrating to work with when it comes to producing a clear, characteristic tone. Sometimes, the road to an improved sound can be a long path of daily diligence and practice; however, there are many quicker solutions that will drastically improve a student's tone. This article…
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction
Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip
2015-01-01
Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees' prediction from the perspectives of a mixture distribution and of a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic data and the Cancer Cell Line Encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error. PMID:27081304
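The mixture-distribution view of the weighted tree ensemble admits closed-form moments, from which approximate CIs follow; a sketch under the assumption that each tree returns a Gaussian predictive mean and variance:

```python
import numpy as np

def ensemble_mixture_stats(means, variances, weights):
    """Mean and variance of a weighted mixture of per-tree predictive
    distributions, where tree i predicts N(means[i], variances[i])."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    mu = np.sum(w * means)
    # Law of total variance for a mixture distribution.
    var = np.sum(w * (np.asarray(variances) + (np.asarray(means) - mu) ** 2))
    return mu, var

def normal_ci(mu, var, z=1.96):
    """Approximate 95% CI from the mixture moments (Gaussian assumption)."""
    half = z * np.sqrt(var)
    return mu - half, mu + half
```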
A climate model projection weighting scheme accounting for performance and interdependence
NASA Astrophysics Data System (ADS)
Knutti, Reto; Sedláček, Jan; Sanderson, Benjamin M.; Lorenz, Ruth; Fischer, Erich M.; Eyring, Veronika
2017-02-01
Uncertainties of climate projections are routinely assessed by considering simulations from different models. Observations are used to evaluate models, yet there is a debate about whether and how to explicitly weight model projections by agreement with observations. Here we present a straightforward weighting scheme that accounts both for the large differences in model performance and for model interdependencies, and we test reliability in a perfect model setup. We provide weighted multimodel projections of Arctic sea ice and temperature as a case study to demonstrate that, for some questions at least, it is meaningless to treat all models equally. The constrained ensemble shows reduced spread and a more rapid sea ice decline than the unweighted ensemble. We argue that the growing number of models with different characteristics and considerable interdependence finally justifies abandoning strict model democracy, and we provide guidance on when and how this can be achieved robustly.
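A commonly cited form of such a performance-and-independence weight uses Gaussian kernels over model-observation and model-model distances; this sketch is hedged, since the paper's exact distance metrics and shape parameters are not reproduced here:

```python
import numpy as np

def performance_independence_weights(D, S, sigma_d, sigma_s):
    """Weight model i up for skill (small distance D[i] to observations)
    and down for redundancy (small inter-model distances S[i, j]).

    Hedged sketch of one common form of the scheme, not the paper's
    exact implementation; sigma_d and sigma_s set how strongly skill
    and uniqueness are rewarded.
    """
    D, S = np.asarray(D, dtype=float), np.asarray(S, dtype=float)
    skill = np.exp(-(D / sigma_d) ** 2)
    n = len(D)
    redundancy = np.array([
        1.0 + sum(np.exp(-(S[i, j] / sigma_s) ** 2)
                  for j in range(n) if j != i)
        for i in range(n)
    ])
    w = skill / redundancy
    return w / w.sum()
```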
NASA Astrophysics Data System (ADS)
Liu, Li; Xu, Yue-Ping
2017-04-01
Ensemble flood forecasting driven by numerical weather prediction products is becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system based on the Variable Infiltration Capacity (VIC) model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by a parallel-programmed ɛ-NSGAII multi-objective algorithm and two separately parameterized models are determined to simulate daily flows and peak flows, coupled with a modular approach. The results indicate that the ɛ-NSGAII algorithm permits more efficient optimization and rational determination of parameter settings. It is demonstrated that the multi-model ensemble streamflow mean has better skill than the best single-model ensemble mean (ECMWF) and that the multi-model ensembles weighted on members and skill scores outperform other multi-model ensembles. For a typical flood event, it is shown that the flood can be predicted 3-4 days in advance, but the flows in the rising limb can be captured only 1-2 days ahead owing to their flashy nature. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from either a single model or multiple models are generally underestimated, as the extreme values are smoothed out by the ensemble process.
2013-01-01
Background Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. Methods This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima. Results and conclusions We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface. PMID:24564970
NASA Astrophysics Data System (ADS)
Clark, Elizabeth; Wood, Andy; Nijssen, Bart; Mendoza, Pablo; Newman, Andy; Nowak, Kenneth; Arnold, Jeffrey
2017-04-01
In an automated forecast system, hydrologic data assimilation (DA) performs the valuable function of correcting raw simulated watershed model states to better represent external observations, including measurements of streamflow, snow, soil moisture, and the like. Yet the incorporation of automated DA into operational forecasting systems has been a long-standing challenge due to the complexities of the hydrologic system, which include numerous lags between state and output variations. To help demonstrate that such methods can succeed in operational automated implementations, we present results from the real-time application of an ensemble particle filter (PF) for short-range (7 day lead) ensemble flow forecasts in western US river basins. We use the System for Hydromet Applications, Research and Prediction (SHARP), developed by the National Center for Atmospheric Research (NCAR) in collaboration with the University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation. SHARP is a fully automated platform for short-term to seasonal hydrologic forecasting applications, incorporating uncertainty in initial hydrologic conditions (IHCs) and in hydrometeorological predictions through ensemble methods. In this implementation, IHC uncertainty is estimated by propagating an ensemble of 100 temperature and precipitation time series through conceptual and physically-oriented models. The resulting ensemble of derived IHCs exhibits a broad range of possible soil moisture and snow water equivalent (SWE) states. The PF selects and/or weights and resamples the IHCs that are most consistent with external streamflow observations, and uses the particles to initialize a streamflow forecast ensemble driven by ensemble precipitation and temperature forecasts downscaled from the Global Ensemble Forecast System (GEFS). We apply this method in real-time for several basins in the western US that are important for water resources management, and perform a hindcast experiment to evaluate the utility of PF-based data assimilation on streamflow forecast skill. This presentation describes findings, including a comparison of sequential and non-sequential particle weighting methods.
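A minimal sketch of the sequential particle-filter step described above: particles are weighted by the likelihood of the observed streamflow and then systematically resampled. The Gaussian observation-error model is an assumption; the abstract does not specify one.

```python
import numpy as np

def pf_update(states, sim_flow, obs_flow, obs_sigma, rng):
    """Weight IHC particles by a Gaussian likelihood of the observed
    streamflow, then systematically resample to keep the ensemble size
    fixed; returns the resampled states and reset (uniform) weights."""
    logw = -0.5 * ((sim_flow - obs_flow) / obs_sigma) ** 2
    w = np.exp(logw - logw.max())       # stabilize before normalizing
    w /= w.sum()
    positions = (rng.random() + np.arange(len(w))) / len(w)
    idx = np.searchsorted(np.cumsum(w), positions)
    return states[idx], np.full(len(w), 1.0 / len(w))
```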
Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S
2017-10-01
The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, there still remains a challenge to improve classifiers for the diagnosis of such skin lesions. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied on a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model that generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.
Ren, Fulong; Cao, Peng; Li, Wei; Zhao, Dazhe; Zaiane, Osmar
2017-01-01
Diabetic retinopathy (DR) is a progressive disease, and its detection at an early stage is crucial for saving a patient's vision. An automated screening system for DR can help reduce the chances of complete blindness due to DR while lowering the workload on ophthalmologists. Among the earliest signs of DR are microaneurysms (MAs). However, current schemes for MA detection appear to report many false positives because detection algorithms have high sensitivity. Inevitably some non-MA structures are labeled as MAs in the initial MA identification step. This is a typical "class imbalance problem". Class-imbalanced data have detrimental effects on the performance of conventional classifiers. In this work, we propose an ensemble-based adaptive over-sampling algorithm for overcoming the class imbalance problem in false positive reduction, and we use Boosting, Bagging, and Random subspace as the ensemble frameworks to improve microaneurysm detection. The proposed ensemble-based over-sampling methods combine the strengths of adaptive over-sampling and ensemble learning. The objective of this amalgamation is to reduce the induction biases introduced by imbalanced data and to enhance the generalization performance of extreme learning machines (ELM). Experimental results show that our ASOBoost method has higher area under the ROC curve (AUC) and G-mean values than many existing class imbalance learning methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
A global sampling approach to designing and reengineering RNA secondary structures.
Levin, Alex; Lis, Mieszko; Ponty, Yann; O'Donnell, Charles W; Devadas, Srinivas; Berger, Bonnie; Waldispühl, Jérôme
2012-11-01
The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
An Adaptive Defect Weighted Sampling Algorithm to Design Pseudoknotted RNA Secondary Structures
Zandi, Kasra; Butler, Gregory; Kharma, Nawwaf
2016-01-01
Computational design of RNA sequences that fold into targeted secondary structures has many applications in biomedicine, nanotechnology and synthetic biology. An RNA molecule is made of different types of secondary structure elements and an important RNA element named pseudoknot plays a key role in stabilizing the functional form of the molecule. However, due to the computational complexities associated with characterizing pseudoknotted RNA structures, most of the existing RNA sequence designer algorithms generally ignore this important structural element and therefore limit their applications. In this paper we present a new algorithm to design RNA sequences for pseudoknotted secondary structures. We use NUPACK as the folding algorithm to compute the equilibrium characteristics of the pseudoknotted RNAs, and describe a new adaptive defect weighted sampling algorithm named Enzymer to design low ensemble defect RNA sequences for targeted secondary structures including pseudoknots. We used a biological data set of 201 pseudoknotted structures from the Pseudobase library to benchmark the performance of our algorithm. We compared the quality characteristics of the RNA sequences we designed by Enzymer with the results obtained from the state-of-the-art MODENA and antaRNA. Our results show our method succeeds more frequently than MODENA and antaRNA do, and generates sequences that have lower ensemble defect, lower probability defect and higher thermostability. Finally by using Enzymer and by constraining the design to a naturally occurring and highly conserved Hammerhead motif, we designed 8 sequences for a pseudoknotted cis-acting Hammerhead ribozyme. Enzymer is available for download at https://bitbucket.org/casraz/enzymer. PMID:27499762
Allen, R J; Rieger, T R; Musante, C J
2016-03-01
Quantitative systems pharmacology models mechanistically describe a biological system and the effect of drug treatment on system behavior. Because these models rarely are identifiable from the available data, the uncertainty in physiological parameters may be sampled to create alternative parameterizations of the model, sometimes termed "virtual patients." In order to reproduce the statistics of a clinical population, virtual patients are often weighted to form a virtual population that reflects the baseline characteristics of the clinical cohort. Here we introduce a novel technique to efficiently generate virtual patients and, from this ensemble, demonstrate how to select a virtual population that matches the observed data without the need for weighting. This approach improves confidence in model predictions by mitigating the risk that spurious virtual patients become overrepresented in virtual populations.
Bayesian ensemble refinement by replica simulations and reweighting.
Hummer, Gerhard; Köfinger, Jürgen
2015-12-28
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
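The linear scaling of restraint strength with replica number translates directly into a replica-averaged restraint energy; a sketch, with k0 an assumed base spring constant:

```python
import numpy as np

def replica_restraint_energy(calc_obs, exp_obs, k0, n_replicas):
    """Harmonic restraint on the replica-averaged observables; the spring
    constant scales linearly with the number of replicas, as required for
    convergence to the optimal Bayesian ensemble in the many-replica limit.

    calc_obs: (n_replicas, n_obs) per-replica computed observables;
    exp_obs: (n_obs,) experimental values.
    """
    avg = np.mean(calc_obs, axis=0)   # ensemble (replica) average
    k = k0 * n_replicas               # linear scaling in the replica count
    return 0.5 * k * np.sum((avg - exp_obs) ** 2)
```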
A hybrid variational ensemble data assimilation for the HIgh Resolution Limited Area Model (HIRLAM)
NASA Astrophysics Data System (ADS)
Gustafsson, N.; Bojarova, J.; Vignes, O.
2014-02-01
A hybrid variational ensemble data assimilation has been developed on top of the HIRLAM variational data assimilation. It provides the possibility of applying a flow-dependent background error covariance model during the data assimilation at the same time as full rank characteristics of the variational data assimilation are preserved. The hybrid formulation is based on an augmentation of the assimilation control variable with localised weights to be assigned to a set of ensemble member perturbations (deviations from the ensemble mean). The flow-dependency of the hybrid assimilation is demonstrated in single simulated observation impact studies and the improved performance of the hybrid assimilation in comparison with pure 3-dimensional variational as well as pure ensemble assimilation is also proven in real observation assimilation experiments. The performance of the hybrid assimilation is comparable to the performance of the 4-dimensional variational data assimilation. The sensitivity to various parameters of the hybrid assimilation scheme and the sensitivity to the applied ensemble generation techniques are also examined. In particular, the inclusion of ensemble perturbations with a lagged validity time has been examined with encouraging results.
Metal Oxide Gas Sensor Drift Compensation Using a Two-Dimensional Classifier Ensemble
Liu, Hang; Chu, Renzhi; Tang, Zhenan
2015-01-01
Sensor drift is the most challenging problem in gas sensing at present. We propose a novel two-dimensional classifier ensemble strategy to solve the gas discrimination problem, regardless of the gas concentration, with high accuracy over extended periods of time. This strategy is appropriate for multi-class classifiers that consist of combinations of pairwise classifiers, such as support vector machines. We compare the performance of the strategy with those of competing methods in an experiment based on a public dataset that was compiled over a period of three years. The experimental results demonstrate that the two-dimensional ensemble outperforms the other methods considered. Furthermore, we propose a pre-aging process inspired by that applied to the sensors to improve the stability of the classifier ensemble. The experimental results demonstrate that the weight of each multi-class classifier model in the ensemble remains fairly static before and after the addition of new classifier models to the ensemble, when a pre-aging procedure is applied. PMID:25942640
Men, Zhongxian; Yee, Eugene; Lien, Fue-Sang; Yang, Zhiling; Liu, Yongqian
2014-01-01
Short-term wind speed and wind power forecasts (for a 72 h period) are obtained using a nonlinear autoregressive exogenous artificial neural network (ANN) methodology which incorporates either numerical weather prediction or high-resolution computational fluid dynamics wind field information as an exogenous input. An ensemble approach is used to combine the predictions from many candidate ANNs in order to provide improved forecasts for wind speed and power, along with the associated uncertainties in these forecasts. More specifically, the ensemble ANN is used to quantify the uncertainties arising from the network weight initialization and from the unknown structure of the ANN. All members forming the ensemble of neural networks were trained using an efficient particle swarm optimization algorithm. The results of the proposed methodology are validated using wind speed and wind power data obtained from an operational wind farm located in Northern China. The assessment demonstrates that this methodology for wind speed and power forecasting generally provides an improvement in predictive skills when compared to the practice of using an "optimal" weight vector from a single ANN while providing additional information in the form of prediction uncertainty bounds.
Interferometric Creep Testing.
1985-03-01
[Scanned report fragment; only figure captions are recoverable: temperature of a Zerodur sample and apparent strain as a function of time with a PZT-modulated mirror; beam-path geometry with mirrors at 45 deg and a prism above a Zerodur plate; a 2-in. Zerodur test sample at room temperature under no load except the weight of the top steel mirror disk, equivalent to 0.5 psi.]
NASA Astrophysics Data System (ADS)
Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie
2015-08-01
The business failure of numerous companies results in financial crises. The high social costs associated with such crises have led people to search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among different modelling means. We reviewed research on forecasting business risk with support vector machines, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machines, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with the support vector machine as the base predictive model, four specific predictive models were produced, namely, a pure support vector machine, a hybrid support vector machine involving principal components analysis, a support vector machine ensemble involving random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate various hybrid support vector machines built on variables produced from principal components analysis and samples from random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines produced better performance than the pure support vector machine and the support vector machine ensemble.
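A hedged sketch of two of the four APMF-WHAE models using scikit-learn (an assumed toolchain): a hybrid SVM (principal components analysis feeding an SVM) and an ensemble of hybrid SVMs built by random sampling with majority voting as the group decision:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hybrid SVM: PCA feature extraction feeding a support vector machine.
hybrid_svm = make_pipeline(StandardScaler(), PCA(n_components=5), SVC())

# Ensemble of hybrid SVMs: random sampling of the training data (bagging)
# with majority voting across the base learners ('group decision').
# Note: recent scikit-learn takes `estimator=`; older versions use
# `base_estimator=`.
ensemble_of_hybrids = BaggingClassifier(estimator=hybrid_svm,
                                        n_estimators=25, max_samples=0.8)
# Usage: ensemble_of_hybrids.fit(X_train, y_train), then .predict(X_test).
```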
Application of the Bayesian Model Averaging method to an ensemble system for Poland
NASA Astrophysics Data System (ADS)
Guzikowski, Jakub; Czerwinska, Agnieszka
2014-05-01
The aim of the project is to evaluate methods for generating numerical ensemble weather predictions using meteorological data from the Weather Research & Forecasting (WRF) Model and calibrating these data by means of the Bayesian Model Averaging (WRF BMA) approach. We construct high-resolution short-range ensemble forecasts using meteorological data (temperature) generated by nine WRF models. The WRF models have 35 vertical levels and 2.5 km x 2.5 km horizontal resolution. The main emphasis is that the ensemble members use different parameterizations of the physical phenomena occurring in the boundary layer. To calibrate the ensemble forecast we use the Bayesian Model Averaging (BMA) approach. The BMA predictive probability density function (PDF) is a weighted average of the predictive PDFs associated with each individual ensemble member, with weights that reflect the member's relative skill. As a test we chose a case with a heat wave and convective weather conditions in the area of Poland from 23rd July to 1st August 2013. From 23rd July to 29th July 2013 the temperature oscillated around 30 degrees Celsius at many weather stations and new temperature records were set. During this time a growth in hospitalized patients with cardiovascular system problems was registered. On 29th July 2013 an advection of moist tropical air masses was recorded over Poland, causing a strong convection event with a mesoscale convective system (MCS). The MCS caused local flooding, damage to transport infrastructure, destroyed buildings and trees, and posed a direct threat to life. A comparison of the meteorological data from the ensemble system with the data recorded at 74 weather stations located in Poland is made. We prepare a set of model-observation pairs. Then, the data obtained from single ensemble members and the median from the WRF BMA system are evaluated on the basis of the deterministic statistical errors Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). To evaluate the probabilistic data, the Brier Score (BS) and the Continuous Ranked Probability Score (CRPS) were used. Finally, a comparison between the BMA-calibrated data and the data from the ensemble members is displayed.
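The BMA predictive PDF described above is just a weighted mixture of member PDFs; a sketch assuming Gaussian member kernels, with the weights and spread normally fitted by EM on training pairs:

```python
import numpy as np
from scipy.stats import norm

def bma_pdf(x, member_forecasts, weights, sigma):
    """BMA predictive PDF: a weighted average of the predictive PDFs of
    the individual ensemble members (Gaussian kernels assumed here);
    weights reflect each member's relative skill."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return sum(wk * norm.pdf(x, loc=fk, scale=sigma)
               for wk, fk in zip(w, member_forecasts))

# Example: nine member temperature forecasts with equal starting weights.
x = np.linspace(20.0, 40.0, 201)
pdf = bma_pdf(x, member_forecasts=30.0 + 0.3 * np.arange(9),
              weights=np.ones(9), sigma=1.5)
```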
A simple new filter for nonlinear high-dimensional data assimilation
NASA Astrophysics Data System (ADS)
Tödter, Julian; Kirchgessner, Paul; Ahrens, Bodo
2015-04-01
The ensemble Kalman filter (EnKF) and its deterministic variants, mostly square root filters such as the ensemble transform Kalman filter (ETKF), represent a popular alternative to variational data assimilation schemes and are applied in a wide range of operational and research activities. Their forecast step employs an ensemble integration that fully respects the nonlinear nature of the analyzed system. In the analysis step, they implicitly assume the prior state and observation errors to be Gaussian. Consequently, in nonlinear systems, the analysis mean and covariance are biased, and these filters remain suboptimal. In contrast, the fully nonlinear, non-Gaussian particle filter (PF) only relies on Bayes' theorem, which guarantees an exact asymptotic behavior, but because of the so-called curse of dimensionality it is exposed to weight collapse. This work shows how to obtain a new analysis ensemble whose mean and covariance exactly match the Bayesian estimates. This is achieved by a deterministic matrix square root transformation of the forecast ensemble, and subsequently a suitable random rotation that significantly contributes to filter stability while preserving the required second-order statistics. The forecast step remains as in the ETKF. The proposed algorithm, which is fairly easy to implement and computationally efficient, is referred to as the nonlinear ensemble transform filter (NETF). The properties and performance of the proposed algorithm are investigated via a set of Lorenz experiments. They indicate that such a filter formulation can increase the analysis quality, even for relatively small ensemble sizes, compared to other ensemble filters in nonlinear, non-Gaussian scenarios. Furthermore, localization enhances the potential applicability of this PF-inspired scheme in larger-dimensional systems. Finally, the novel algorithm is coupled to a large-scale ocean general circulation model. The NETF is stable, behaves reasonably and shows a good performance with a realistic ensemble size. The results confirm that, in principle, it can be applied successfully and as simple as the ETKF in high-dimensional problems without further modifications of the algorithm, even though it is only based on the particle weights. This proves that the suggested method constitutes a useful filter for nonlinear, high-dimensional data assimilation, and is able to overcome the curse of dimensionality even in deterministic systems.
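A sketch of the NETF analysis step, matching the ensemble mean and covariance to the particle-weighted Bayesian estimates via a symmetric matrix square root; the mean-preserving random rotation and localization are omitted, and the 1/(m-1) sample-covariance convention is an assumption:

```python
import numpy as np

def netf_analysis(X, logw):
    """Nonlinear ensemble transform filter analysis step (sketch).

    X: (n, m) forecast ensemble of m members; logw: particle
    log-likelihoods log p(y | x_j). The returned ensemble's mean and
    sample covariance match the weighted (Bayesian) particle estimates.
    """
    n, m = X.shape
    w = np.exp(logw - logw.max())
    w /= w.sum()
    xa_mean = X @ w                            # posterior (weighted) mean
    W = np.diag(w) - np.outer(w, w)            # weighted-covariance kernel
    vals, vecs = np.linalg.eigh(W)             # symmetric square root of W
    Wsqrt = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    T = np.sqrt(m - 1) * Wsqrt                 # annihilates the mean mode
    # A mean-preserving random rotation (omitted) would multiply T here
    # to improve filter stability.
    return xa_mean[:, None] + X @ T
```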
Sampling errors for a nadir viewing instrument on the International Space Station
NASA Astrophysics Data System (ADS)
Berger, H. I.; Pincus, R.; Evans, F.; Santek, D.; Ackerman, S.
2001-12-01
In an effort to improve the observational characterization of ice clouds in the earth's atmosphere, we are developing a sub-millimeter wavelength radiometer which we propose to fly on the International Space Station for two years. Our goal is to accurately measure the ice water path and mass-weighted particle size at the finest possible temporal and spatial resolution. The ISS orbit precesses, sampling through the diurnal cycle every 16 days, but technological constraints limit our instrument to a single pixel viewed near nadir. We discuss sampling errors associated with this instrument/platform configuration. We use as "truth" the ISCCP dataset of pixel-level cloud optical retrievals, which acts as a proxy for ice water path; this dataset is sampled according to the orbital characteristics of the space station, and the statistics computed from the sub-sampled population are compared with those from the full dataset. We explore the tradeoffs in average sampling error as a function of the averaging time and spatial scale, and explore the possibility of resolving the diurnal cycle.
Use of combined radar and radiometer systems in space for precipitation measurement: Some ideas
NASA Technical Reports Server (NTRS)
Moore, R. K.
1981-01-01
A brief survey is given of some fundamental physical concepts of optimal polarization characteristics of a transmission path or scatter ensemble of hydrometeors. It is argued that, based on this optimization concept, definite advances in remote atmospheric sensing are to be expected. Basic properties of Kennaugh's optimal polarization theory are identified.
Observation of ground-state quantum beats in atomic spontaneous emission.
Norris, D G; Orozco, L A; Barberis-Blostein, P; Carmichael, H J
2010-09-17
We report ground-state quantum beats in spontaneous emission from a continuously driven atomic ensemble. Beats are visible only in an intensity autocorrelation and evidence spontaneously generated coherence in radiative decay. Our measurement realizes a quantum eraser where a first photon detection prepares a superposition and a second erases the "which path" information in the intermediate state.
Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi
2014-12-08
Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two open questions: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.
Ensemble stump classifiers and gene expression signatures in lung cancer.
Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn
2007-01-01
Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
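A hedged sketch of an ensemble of decision stumps, one per platform, combined by majority vote so that no cross-platform normalization is needed; scikit-learn is an assumed toolchain, not the authors' stated software:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_stump_ensemble(datasets):
    """Train one decision stump (a depth-1 tree, i.e. a single probe
    test) per platform; thresholds live on each platform's own scale,
    so cross-platform normalization is unnecessary.

    datasets: list of (X, y) pairs, one per platform/data source.
    """
    return [DecisionTreeClassifier(max_depth=1).fit(X, y)
            for X, y in datasets]

def majority_vote(stumps, sample_per_platform):
    """Classify one tumor sample given its per-platform feature vectors."""
    preds = [s.predict(x.reshape(1, -1))[0]
             for s, x in zip(stumps, sample_per_platform)]
    values, counts = np.unique(preds, return_counts=True)
    return values[np.argmax(counts)]
```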
Skill and independence weighting for multi-model assessments
Sanderson, Benjamin M.; Wehner, Michael; Knutti, Reto
2017-06-28
We present a weighting strategy for use with the CMIP5 multi-model archive in the fourth National Climate Assessment, which considers both skill in the climatological performance of models over North America as well as the inter-dependency of models arising from common parameterizations or tuning practices. The method exploits information relating to the climatological mean state of a number of projection-relevant variables as well as metrics representing long-term statistics of weather extremes. The weights, once computed, can be used to simply compute weighted means and significance information from an ensemble containing multiple initial condition members from potentially co-dependent models of varying skill. Two parameters in the algorithm determine the degree to which model climatological skill and model uniqueness are rewarded; these parameters are explored and final values are defended for the assessment. The influence of model weighting on projected temperature and precipitation changes is found to be moderate, partly due to a compensating effect between model skill and uniqueness. However, more aggressive skill weighting and weighting by targeted metrics is found to have a more significant effect on inferred ensemble confidence in future patterns of change for a given projection.
An ensemble predictive modeling framework for breast cancer classification.
Nagarajan, Radhakrishnan; Upreti, Meenakshi
2017-12-01
Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process, often resulting in a sparse, high-dimensional projection of the samples whose dimensionality is comparable to the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good- and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projections of the samples, in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. The performance of four different classification algorithms is shown to be better within the proposed ensemble framework than when they are used as traditional single classifier systems. The significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
Ensemble Bayesian forecasting system Part I: Theory and algorithms
NASA Astrophysics Data System (ADS)
Herr, Henry D.; Krzysztofowicz, Roman
2015-05-01
The ensemble Bayesian forecasting system (EBFS), whose theory was published in 2001, is developed for the purpose of quantifying the total uncertainty about a discrete-time, continuous-state, non-stationary stochastic process such as a time series of stages, discharges, or volumes at a river gauge. The EBFS is built of three components: an input ensemble forecaster (IEF), which simulates the uncertainty associated with random inputs; a deterministic hydrologic model (of any complexity), which simulates physical processes within a river basin; and a hydrologic uncertainty processor (HUP), which simulates the hydrologic uncertainty (an aggregate of all uncertainties except input). It works as a Monte Carlo simulator: an ensemble of time series of inputs (e.g., precipitation amounts) generated by the IEF is transformed deterministically through a hydrologic model into an ensemble of time series of outputs, which is next transformed stochastically by the HUP into an ensemble of time series of predictands (e.g., river stages). Previous research indicated that in order to attain an acceptable sampling error, the ensemble size must be on the order of hundreds (for probabilistic river stage forecasts and probabilistic flood forecasts) or even thousands (for probabilistic stage transition forecasts). The computing time needed to run the hydrologic model this many times renders the straightforward simulations operationally infeasible. This motivates the development of the ensemble Bayesian forecasting system with randomization (EBFSR), which takes full advantage of the analytic meta-Gaussian HUP and generates multiple ensemble members after each run of the hydrologic model; this auxiliary randomization reduces the required size of the meteorological input ensemble and makes it operationally feasible to generate a Bayesian ensemble forecast of large size. Such a forecast quantifies the total uncertainty, is well calibrated against the prior (climatic) distribution of predictand, possesses a Bayesian coherence property, constitutes a random sample of the predictand, and has an acceptable sampling error, which makes it suitable for rational decision making under uncertainty.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.
2013-01-01
A two-step ensemble recentering Kalman filter (ERKF) analysis scheme is introduced. The algorithm consists of a recentering step followed by an ensemble Kalman filter (EnKF) analysis step. The recentering step is formulated so as to adjust the prior distribution of an ensemble of model states so that the deviations of individual samples from the sample mean are unchanged but the original sample mean is shifted to the prior position of the most likely particle, where the likelihood of each particle is measured in terms of closeness to a chosen subset of the observations. The computational cost of the ERKF is essentially the same as that of an EnKF of the same size. The ERKF is applied to the assimilation of Argo temperature profiles into the OGCM component of an ensemble of NASA GEOS-5 coupled models. Unassimilated Argo salt data are used for validation. A surprisingly small number (16) of model trajectories is sufficient to significantly improve model estimates of salinity over estimates from an ensemble run without assimilation. The two-step algorithm also performs better than the EnKF, although its performance is degraded in poorly observed regions.
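A minimal sketch of the recentering step described above, assuming a generic observation operator H and a chosen observation subset; all names are illustrative, not taken from the paper's code:

```python
import numpy as np

def recenter(ensemble, obs, H, subset):
    """Shift the ensemble mean to the most likely member, keeping deviations.

    ensemble: (n_members, n_state) prior states; obs: full observation vector;
    H: hypothetical observation operator mapping states to observation space;
    subset: indices of the observations used to measure likelihood.
    """
    mean = ensemble.mean(axis=0)
    deviations = ensemble - mean
    # Likelihood measured as closeness to the chosen subset of observations
    misfits = np.linalg.norm(H(ensemble)[:, subset] - obs[subset], axis=1)
    best = ensemble[np.argmin(misfits)]
    # Deviations are unchanged; only the mean is shifted to the best member.
    return best + deviations
```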
A new large initial condition ensemble to assess avoided impacts in a climate mitigation scenario
NASA Astrophysics Data System (ADS)
Sanderson, B. M.; Tebaldi, C.; Knutti, R.; Oleson, K. W.
2014-12-01
It has recently been demonstrated that when considering timescales of up to 50 years, natural variability may play an equal role to anthropogenic forcing in subcontinental trends for a variety of climate indicators. Thus, for many questions assessing climate impacts on such time and spatial scales, it has become clear that a significant number of ensemble members may be required to produce robust statistics (and especially so for extreme events). However, large ensemble experiments to date have considered the role of variability in a single scenario, leaving uncertain the relationship between the forced climate trajectory and the variability about that path. To address this issue, we present a new, publicly available, 15-member initial-condition ensemble of 21st century climate projections for the RCP 4.5 scenario using the CESM1.1 Earth System Model, which we propose as a companion project to the existing 40-member CESM large ensemble, which uses the higher greenhouse gas emission future of RCP8.5. This provides a valuable data set for assessing what societal and ecological impacts might be avoided through a moderate mitigation strategy in contrast to a fossil-fuel-intensive future. We present some early analyses of these combined ensembles to assess to what degree the climate variability can be considered to combine linearly with the underlying forced response. In regions where there is no detectable relationship between the mean state and the variability about the mean trajectory, linear assumptions can be trivially exploited to utilize a single ensemble or control simulation to characterize the variability in any scenario of interest. We highlight regions where there is a detectable nonlinearity in extreme event frequency, indicate how far in the future it will be manifested, and propose mechanisms to account for these effects.
Metadynamic metainference: Enhanced sampling of the metainference ensemble using metadynamics
Bonomi, Massimiliano; Camilloni, Carlo; Vendruscolo, Michele
2016-01-01
Accurate and precise structural ensembles of proteins and macromolecular complexes can be obtained with metainference, a recently proposed Bayesian inference method that integrates experimental information with prior knowledge and deals with all sources of errors in the data as well as with sample heterogeneity. The study of complex macromolecular systems, however, requires extensive conformational sampling, which represents a separate challenge. To address this challenge and to generate structural ensembles exhaustively and efficiently, we combine metainference with metadynamics and illustrate its application to the calculation of the free energy landscape of the alanine dipeptide. PMID:27561930
NASA Astrophysics Data System (ADS)
Solvang Johansen, Stian; Steinsland, Ingelin; Engeland, Kolbjørn
2016-04-01
Running hydrological models with precipitation and temperature ensemble forcing to generate ensembles of streamflow is a commonly used method in operational hydrology. Evaluations of streamflow ensembles have, however, revealed that the ensembles are biased with respect to both mean and spread. Thus postprocessing of the ensembles is needed in order to improve the forecast skill. The aims of this study are (i) to evaluate how postprocessing of streamflow ensembles works for Norwegian catchments within different hydrological regimes and (ii) to demonstrate how postprocessed streamflow ensembles are used operationally by a hydropower producer. These aims were achieved by postprocessing forecasted daily discharge for 10 lead times for 20 catchments in Norway, using EPS forcing from ECMWF applied to the semi-distributed HBV model, dividing each catchment into 10 elevation zones. Statkraft Energi uses forecasts from these catchments for scheduling hydropower production. The catchments represent different hydrological regimes. Some catchments have stable winter conditions with winter low flow and a major flood event during spring or early summer caused by snow melting. Others have a more mixed snow-rain regime, often with a secondary flood season during autumn; in the coastal areas, the streamflow is dominated by rain, and the main flood season is autumn and winter. For postprocessing, a Bayesian model averaging (BMA) model close to that of Kleiber et al. (2011) is used. The model creates a predictive PDF that is a weighted average of PDFs centered on the individual bias-corrected forecasts. The weights are here equal, since all ensemble members come from the same model and thus have the same probability. For modeling streamflow, the gamma distribution is chosen as the predictive PDF. The bias-correction parameters and the PDF parameters are estimated using a 30-day sliding-window training period. Preliminary results show that the improvement varies between catchments depending on where they are situated and on the hydrological regime. There is an improvement in CRPS for all catchments compared to the raw EPS ensembles, persisting up to lead times of 5-7 days. The postprocessing also improves the MAE for the median of the predictive PDF compared to the median of the raw EPS, but less than for CRPS, often only up to lead times of 2-3 days. The streamflow ensembles are to some extent used operationally by Statkraft Energi (a hydropower company in Norway) for early warning, risk assessment and decision-making. Presently all forecasts used operationally for short-term scheduling are deterministic, but ensembles are used visually for expert assessment of risk in difficult situations where, e.g., there is a chance of overflow in a reservoir. However, there are plans to incorporate ensembles in the daily scheduling of hydropower production.
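The equal-weight BMA predictive PDF with gamma kernels can be sketched as follows; the coefficient-of-variation parameterization of each kernel is an assumption made for illustration, since the study estimates the PDF parameters over a 30-day sliding window:

```python
import numpy as np
from scipy.stats import gamma

def bma_predictive_pdf(q, bias_corrected_members, cv=0.3):
    """Equal-weight BMA: average gamma PDFs centered on each corrected member.

    q: discharge values at which to evaluate the density; cv: assumed
    coefficient of variation controlling the spread of each member's kernel.
    """
    q = np.asarray(q, dtype=float)
    w = 1.0 / len(bias_corrected_members)
    pdf = np.zeros_like(q)
    for m in bias_corrected_members:
        shape = 1.0 / cv**2        # gamma kernel with mean m and sd cv * m
        scale = m / shape
        pdf += w * gamma.pdf(q, a=shape, scale=scale)
    return pdf
```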
NASA Astrophysics Data System (ADS)
Livorati, André L. P.; Palmero, Matheus S.; Díaz-I, Gabriel; Dettmann, Carl P.; Caldas, Iberê L.; Leonel, Edson D.
2018-02-01
We study the dynamics of an ensemble of non-interacting particles constrained by two infinitely heavy walls, where one of them is moving periodically in time, while the other is fixed. The system presents mixed dynamics, where the accessible region for the particle to diffuse chaotically is bordered by an invariant spanning curve. Statistical analysis of the root mean square velocity, considering high- and low-velocity ensembles, shows that the dynamics reaches the same steady-state plateau at long times. A transport investigation of the dynamics via escape basins reveals that, depending on the initial velocity ensemble, the decay rates of the survival probability present different shapes and bumps, in a mix of exponential, power-law and stretched-exponential decays. After an analysis of step-size averages, we found that the stable manifolds play the role of a preferential path for faster escape, being responsible for the bumps and different shapes of the survival probability.
Upper Limit of Weights in TAI Computation
NASA Technical Reports Server (NTRS)
Thomas, Claudine; Azoubib, Jacques
1996-01-01
The international reference time scale International Atomic Time (TAI) computed by the Bureau International des Poids et Mesures (BIPM) relies on a weighted average of data from a large number of atomic clocks. In it, the weight attributed to a given clock depends on its long-term stability. In this paper the TAI algorithm is used as the basis for a discussion of how to implement an upper limit of weight for clocks contributing to the ensemble time. This problem is approached through the comparison of two different techniques. In one case, a maximum relative weight is fixed: no individual clock can contribute more than a given fraction to the resulting time scale. The weight of each clock is then adjusted according to the qualities of the whole set of contributing elements. In the other case, a parameter characteristic of frequency stability is chosen: no individual clock can appear more stable than the stated limit. This is equivalent to choosing an absolute limit of weight and attributing it to the most stable clocks independently of the other elements of the ensemble. The first technique is more robust than the second and automatically optimizes the stability of the resulting time scale, but leads to a more complicated computation. The second technique has been used in the TAI algorithm since the very beginning. Careful analysis of tests on real clock data shows that improving the stability of the time scale requires revision, from time to time, of the fixed value chosen for the upper limit of absolute weight. In particular, we present results which confirm the decision of the CCDS Working Group on TAI to increase the absolute upper limit by a factor of 2.5. We also show that the use of an upper relative contribution further helps to improve the stability and may be a useful step towards better use of the massive ensemble of HP 5071A clocks now contributing to TAI.
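A schematic of the relative-weight-capping logic (the first technique) might look like the following; this illustrates the capping idea only and is not the BIPM algorithm:

```python
import numpy as np

def capped_clock_weights(instabilities, max_rel_weight=0.02):
    """Weights inversely proportional to clock instability, iteratively
    capped so no clock exceeds a maximum relative contribution."""
    w = 1.0 / np.asarray(instabilities, dtype=float)
    w /= w.sum()
    assert len(w) * max_rel_weight >= 1.0, "cap infeasible for this ensemble"
    while w.max() > max_rel_weight + 1e-12:
        capped = w >= max_rel_weight
        w[capped] = max_rel_weight
        free = ~capped
        # Redistribute the remaining weight over the uncapped clocks
        w[free] *= (1.0 - capped.sum() * max_rel_weight) / w[free].sum()
    return w
```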
Ensemble Averaged Probability Density Function (APDF) for Compressible Turbulent Reacting Flows
NASA Technical Reports Server (NTRS)
Shih, Tsan-Hsing; Liu, Nan-Suey
2012-01-01
In this paper, we present a concept of the averaged probability density function (APDF) for studying compressible turbulent reacting flows. The APDF is defined as an ensemble average of the fine-grained probability density function (FG-PDF) with a mass density weighting. It can be used to exactly deduce the mass density weighted, ensemble averaged turbulent mean variables. The transport equation for APDF can be derived in two ways. One is the traditional way that starts from the transport equation of FG-PDF, in which the compressible Navier-Stokes equations are embedded. The resulting transport equation of APDF is then in a traditional form that contains conditional means of all terms from the right hand side of the Navier-Stokes equations except for the chemical reaction term. These conditional means are new unknown quantities that need to be modeled. Another way of deriving the transport equation of APDF is to start directly from the ensemble averaged Navier-Stokes equations. The resulting transport equation of APDF derived from this approach appears in a closed form without any need for additional modeling. The methodology of ensemble averaging presented in this paper can be extended to other averaging procedures: for example, the Reynolds time averaging for statistically steady flow and the Reynolds spatial averaging for statistically homogeneous flow. It can also be extended to a time or spatial filtering procedure to construct the filtered density function (FDF) for the large eddy simulation (LES) of compressible turbulent reacting flows.
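The defining operation, a mass-density-weighted ensemble average, reduces to a one-liner; a minimal sketch over an ensemble of realizations (axis conventions are an assumption for illustration):

```python
import numpy as np

def density_weighted_average(rho, q):
    """Mass-density-weighted (Favre-type) ensemble average of a variable q:
    q_tilde = <rho * q> / <rho>, where <.> is the plain ensemble mean taken
    over realizations (axis 0 of arrays shaped (n_members, ...))."""
    rho, q = np.asarray(rho), np.asarray(q)
    return (rho * q).mean(axis=0) / rho.mean(axis=0)
```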
NASA Astrophysics Data System (ADS)
Xu, Lei; Chen, Nengcheng; Zhang, Xiang
2018-02-01
Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparation. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component weights climate signals by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climate models, and the hybrid part combines the statistical and dynamic components by assigning weights based on their historical performance. The results indicate that the statistical and hybrid models show better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of the Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected a negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought prediction are highlighted. These conclusions may be helpful for drought prediction and early drought warning in China.
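A minimal sketch of the hybrid combination step, assuming weights inversely proportional to each component's historical error; the paper's exact weighting scheme may differ:

```python
def hybrid_forecast(stat_pred, dyn_pred, stat_err, dyn_err):
    """Combine statistical and dynamic predictions with weights inversely
    proportional to their historical errors (e.g. RMSE over a hindcast
    period). Illustrative skill-based weighting only."""
    w_stat = (1.0 / stat_err) / (1.0 / stat_err + 1.0 / dyn_err)
    return w_stat * stat_pred + (1.0 - w_stat) * dyn_pred
```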
Rieger, TR; Musante, CJ
2016-01-01
Quantitative systems pharmacology models mechanistically describe a biological system and the effect of drug treatment on system behavior. Because these models rarely are identifiable from the available data, the uncertainty in physiological parameters may be sampled to create alternative parameterizations of the model, sometimes termed “virtual patients.” In order to reproduce the statistics of a clinical population, virtual patients are often weighted to form a virtual population that reflects the baseline characteristics of the clinical cohort. Here we introduce a novel technique to efficiently generate virtual patients and, from this ensemble, demonstrate how to select a virtual population that matches the observed data without the need for weighting. This approach improves confidence in model predictions by mitigating the risk that spurious virtual patients become overrepresented in virtual populations. PMID:27069777
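A schematic of selecting (rather than weighting) virtual patients so that the accepted subpopulation matches a target clinical distribution, here assumed Gaussian for illustration; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def select_virtual_population(baseline_outputs, target_mean, target_sd):
    """Accept each virtual patient with probability proportional to the
    target (clinical) density of its simulated baseline output, so the
    selected subpopulation matches the observed statistics without
    per-patient weights (a rejection-sampling sketch, not the paper's
    exact algorithm)."""
    x = np.asarray(baseline_outputs, dtype=float)
    target = np.exp(-0.5 * ((x - target_mean) / target_sd) ** 2)
    accept = rng.random(len(x)) < target / target.max()
    return x[accept]
```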
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the highest Area Under the Receiver Operating Characteristic curve (AUROC) belonged to boosted regression trees (0.975) and the lowest value was recorded for the generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, variability among the predictions of the individual models was considerable. Therefore, to reduce uncertainty and to create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular the EMmedian, are recommended for flood susceptibility assessment.
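A minimal sketch of the EMmedian idea, taking the per-location median of the individual models' susceptibility scores; model fitting is omitted and the usage names are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def em_median(member_probs):
    """EMmedian-style ensemble: the per-location median of the susceptibility
    scores from the individual models (rows: models, columns: locations)."""
    return np.median(np.asarray(member_probs), axis=0)

# Hypothetical usage with fitted scikit-learn-style models:
# probs = np.vstack([m.predict_proba(X)[:, 1] for m in models])
# print(roc_auc_score(y, em_median(probs)))
```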
Ensemble codes involving hippocampal neurons are at risk during delayed performance tests.
Hampson, R E; Deadwyler, S A
1996-11-26
Multielectrode recording techniques were used to record ensemble activity from 10 to 16 simultaneously active CA1 and CA3 neurons in the rat hippocampus during performance of a spatial delayed-nonmatch-to-sample task. Extracted sources of variance were used to assess the nature of two different types of errors that accounted for 30% of total trials. The two types of errors included ensemble "miscodes" of sample phase information and errors associated with delay-dependent corruption or disappearance of sample information at the time of the nonmatch response. Statistical assessment of trial sequences and associated "strength" of hippocampal ensemble codes revealed that miscoded error trials always followed delay-dependent error trials in which encoding was "weak," indicating that the two types of errors were "linked." It was determined that the occurrence of weakly encoded, delay-dependent error trials initiated an ensemble encoding "strategy" that increased the chances of being correct on the next trial and avoided the occurrence of further delay-dependent errors. Unexpectedly, the strategy involved "strongly" encoding response position information from the prior (delay-dependent) error trial and carrying it forward to the sample phase of the next trial. This produced a miscode type error on trials in which the "carried over" information obliterated encoding of the sample phase response on the next trial. Application of this strategy, irrespective of outcome, was sufficient to reorient the animal to the proper between trial sequence of response contingencies (nonmatch-to-sample) and boost performance to 73% correct on subsequent trials. The capacity for ensemble analyses of strength of information encoding combined with statistical assessment of trial sequences therefore provided unique insight into the "dynamic" nature of the role hippocampus plays in delay type memory tasks.
Optimized Free Energies from Bidirectional Single-Molecule Force Spectroscopy
NASA Astrophysics Data System (ADS)
Minh, David D. L.; Adib, Artur B.
2008-05-01
An optimized method for estimating path-ensemble averages using data from processes driven in opposite directions is presented. Based on this estimator, bidirectional expressions for reconstructing free energies and potentials of mean force from single-molecule force spectroscopy—valid for biasing potentials of arbitrary stiffness—are developed. Numerical simulations on a model potential indicate that these methods perform better than unidirectional strategies.
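For orientation, a standard bidirectional estimator from the same family, the Bennett acceptance ratio solved self-consistently from forward and reverse work values, can be sketched as follows; this is an illustration, not the authors' optimized path-ensemble expressions:

```python
import numpy as np
from scipy.optimize import brentq

def bar_delta_f(w_forward, w_reverse, beta=1.0):
    """Solve the Bennett acceptance ratio condition for Delta F, given work
    values from processes driven in the forward and reverse directions
    (equal sample sizes assumed; w_reverse uses the reverse-work sign
    convention, so its estimator of the free energy change is -Delta F)."""
    wf, wr = np.asarray(w_forward, float), np.asarray(w_reverse, float)

    def imbalance(df):
        fwd = 1.0 / (1.0 + np.exp(beta * (wf - df)))   # Fermi function terms
        rev = 1.0 / (1.0 + np.exp(beta * (wr + df)))
        return fwd.sum() - rev.sum()                   # monotone in df

    m = max(np.abs(wf).max(), np.abs(wr).max()) + 50.0  # generous bracket
    return brentq(imbalance, -m, m)
```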
Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.
2007-01-01
To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE.
2018-01-01
This paper measures the adhesion/cohesion force among asphalt molecules at the nanoscale using atomic force microscopy (AFM) and models the moisture damage by applying state-of-the-art Computational Intelligence (CI) techniques (e.g., artificial neural network (ANN), support vector regression (SVR), and an Adaptive Neuro Fuzzy Inference System (ANFIS)). Various combinations of lime and chemicals as well as dry and wet environments are used to produce different asphalt samples. The parameters that were varied to generate different asphalt samples and measure the corresponding adhesion/cohesion forces are the percentage of antistripping agents (e.g., Lime and Unichem), AFM tip K values, and AFM tip types. The CI methods are trained to model the adhesion/cohesion forces given the variation in values of the above parameters. To achieve enhanced performance, statistical methods such as averaging, weighted averaging, and regression of the outputs generated by the CI techniques are used. The experimental results show that, of the three individual CI methods, ANN can model moisture damage to lime- and chemically modified asphalt better than the other two CI techniques for both wet and dry conditions. Moreover, the ensemble of CI along with statistical measurement provides better accuracy than any of the individual CI techniques. PMID:29849551
Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang
2016-01-01
For ensemble learning, how to select and how to combine the candidate classifiers are two key issues that dramatically influence the performance of the ensemble system. The random vector functional link network (RVFL) without direct input-to-output links is a suitable base classifier for ensemble systems because of its fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFLs based on attractive and repulsive particle swarm optimization (ARPSO) with a double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFLs. When using ARPSO to select the optimal base RVFLs, both the convergence accuracy on the validation data and the diversity of the candidate ensemble system are considered in building the RVFL ensembles. In the process of combining the RVFLs, the ensemble weights corresponding to the base RVFLs are initialized by the minimum-norm least-squares method and then further optimized by ARPSO. Finally, a few redundant RVFLs are pruned, and thus a more compact ensemble of RVFLs is obtained. Moreover, theoretical analysis and justification of how to prune the base classifiers on classification problems are presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFLs built by the proposed method outperforms those built by some single-optimization methods. Experimental results on function approximation and classification problems verify that the proposed method improves convergence accuracy as well as reducing the complexity of the ensemble system. PMID:27835638
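A minimal RVFL base classifier, with fixed random hidden weights plus a minimum-norm least-squares output layer, can be sketched as follows; the ARPSO selection and combination steps are omitted:

```python
import numpy as np

class RVFL:
    """Minimal random vector functional link net without direct
    input-to-output links: random hidden features, least-squares output
    weights. Illustrative base learner only."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))  # fixed, untrained
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ Y  # minimum-norm least-squares fit
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```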
Localization of a variational particle smoother
NASA Astrophysics Data System (ADS)
Morzfeld, M.; Hodyss, D.; Poterjoy, J.
2017-12-01
Given the success of 4D-variational methods (4D-Var) in numerical weather prediction, and recent efforts to merge ensemble Kalman filters with 4D-Var, we consider a method to merge particle methods and 4D-Var. This leads us to revisit variational particle smoothers (varPS). We study the collapse of varPS in high-dimensional problems and show how it can be prevented by weight localization. We test varPS on the Lorenz'96 model of dimensions n=40, n=400, and n=2000. In our numerical experiments, weight localization prevents the collapse of the varPS, and we note that the varPS yields results comparable to ensemble formulations of 4D-variational methods, while it outperforms EnKF with tuned localization and inflation, and the localized standard particle filter. Additional numerical experiments suggest that using localized weights in varPS may not yield significant advantages over unweighted or linearized solutions in near-Gaussian problems.
Distributed deep learning networks among institutions for medical imaging.
Chang, Ken; Balachandar, Niranjan; Lam, Carson; Yi, Darvin; Brown, James; Beers, Andrew; Rosen, Bruce; Rubin, Daniel L; Kalpathy-Cramer, Jayashree
2018-03-29
Deep learning has become a promising approach for automated support for clinical diagnosis. When medical data samples are limited, collaboration among multiple institutions is necessary to achieve high algorithm performance. However, sharing patient data often has limitations due to technical, legal, or ethical concerns. In this study, we propose methods of distributing deep learning models as an attractive alternative to sharing patient data. We simulate the distribution of deep learning models across 4 institutions using various training heuristics and compare the results with a deep learning model trained on centrally hosted patient data. The training heuristics investigated include ensembling single institution models, single weight transfer, and cyclical weight transfer. We evaluated these approaches for image classification in 3 independent image collections (retinal fundus photos, mammography, and ImageNet). We find that cyclical weight transfer resulted in a performance that was comparable to that of centrally hosted patient data. We also found that there is an improvement in the performance of the cyclical weight transfer heuristic with a high frequency of weight transfer. We show that distributing deep learning models is an effective alternative to sharing patient data. This finding has implications for any collaborative deep learning study.
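The cyclical weight transfer heuristic reduces to a simple training loop in which only the model weights travel between sites; a schematic sketch with a caller-supplied training function (the function and loader names are assumptions, not the paper's code):

```python
def cyclical_weight_transfer(model, institution_loaders, train_one_epoch,
                             n_cycles=10, epochs_per_visit=1):
    """Train on each institution's private data in turn, passing only the
    model weights between sites; the data never leaves an institution.
    A higher transfer frequency (fewer epochs per visit) was reported
    to help performance."""
    for _ in range(n_cycles):
        for loader in institution_loaders:      # visit each site in turn
            for _ in range(epochs_per_visit):
                train_one_epoch(model, loader)  # caller-supplied trainer
    return model
```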
NASA Astrophysics Data System (ADS)
Akibue, Seiseki; Kato, Go
2018-04-01
For distinguishing quantum states sampled from a fixed ensemble, the gap in bipartite and single-party distinguishability can be interpreted as a nonlocality of the ensemble. In this paper, we consider bipartite state discrimination in a composite system consisting of N subsystems, where each subsystem is shared between two parties and the state of each subsystem is randomly sampled from a particular ensemble comprising the Bell states. We show that the success probability of perfectly identifying the state converges to 1 as N → ∞ if the entropy of the probability distribution associated with the ensemble is less than 1, even if the success probability is less than 1 for any finite N. In other words, the nonlocality of the N-fold ensemble asymptotically disappears if the probability distribution associated with each ensemble is concentrated. Furthermore, we show that the disappearance of the nonlocality can be regarded as a remarkable counterexample of a fundamental open question in theoretical computer science, called a parallel repetition conjecture of interactive games with two classically communicating players. Measurements for the discrimination task include a projective measurement of one party represented by stabilizer states, which enable the other party to perfectly distinguish states that are sampled with high probability.
Ensemble transcript interaction networks: a case study on Alzheimer's disease.
Armañanzas, Rubén; Larrañaga, Pedro; Bielza, Concha
2012-10-01
Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependences that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampus subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD.
Gao, Jiali; Major, Dan T; Fan, Yao; Lin, Yen-Lin; Ma, Shuhua; Wong, Kin-Yiu
2008-01-01
A method for incorporating quantum mechanics into enzyme kinetics modeling is presented. Three aspects are emphasized: 1) combined quantum mechanical and molecular mechanical methods are used to represent the potential energy surface for modeling bond forming and breaking processes, 2) instantaneous normal mode analyses are used to incorporate quantum vibrational free energies to the classical potential of mean force, and 3) multidimensional tunneling methods are used to estimate quantum effects on the reaction coordinate motion. Centroid path integral simulations are described to make quantum corrections to the classical potential of mean force. In this method, the nuclear quantum vibrational and tunneling contributions are not separable. An integrated centroid path integral-free energy perturbation and umbrella sampling (PI-FEP/UM) method along with a bisection sampling procedure is summarized, which provides an accurate, easily convergent method for computing kinetic isotope effects for chemical reactions in solution and in enzymes. In the ensemble-averaged variational transition state theory with multidimensional tunneling (EA-VTST/MT), these three aspects of quantum mechanical effects can be individually treated, providing useful insights into the mechanism of enzymatic reactions. These methods are illustrated by applications to a model process in the gas phase, the decarboxylation reaction of N-methyl picolinate in water, and the proton abstraction and reprotonation process catalyzed by alanine racemase. These examples show that the incorporation of quantum mechanical effects is essential for enzyme kinetics simulations.
Method and apparatus for vapor detection
NASA Technical Reports Server (NTRS)
Lerner, Melvin (Inventor); Hood, Lyal V. (Inventor); Rommel, Marjorie A. (Inventor); Pettitt, Bruce C. (Inventor); Erikson, Charles M. (Inventor)
1980-01-01
The method disclosed herein may be practiced by passing the vapors to be sampled along a path with halogen vapor, preferably chlorine vapor, heating the mixed vapors to halogenate those of the sampled vapors subject to halogenation, removing unreacted halogen vapor, and then sensing the vapors for organic halogenated compounds. The apparatus disclosed herein comprises means for flowing the vapors, both sample and halogen vapors, into a common path, means for heating the mixed vapors to effect the halogenation reaction, means for removing unreacted halogen vapor, and a sensing device for sensing halogenated compounds. By such a method and means, the vapors of low molecular weight hydrocarbons, ketones and alcohols, when present, such as methane, ethane, acetone, ethanol, and the like are converted, at least in part, to halogenated compounds, then the excess halogen removed or trapped, and the resultant vapors of the halogenated compounds sensed or detected. The system is highly sensitive. For example, acetone in a concentration of 30 parts per billion (volume) is readily detected.
Morishita, Tetsuya; Yonezawa, Yasushige; Ito, Atsushi M
2017-07-11
Efficient and reliable estimation of the mean force (MF), the derivatives of the free energy with respect to a set of collective variables (CVs), has been a challenging problem because free energy differences are often computed by integrating the MF. Among various methods for computing free energy differences, logarithmic mean-force dynamics (LogMFD) [Morishita et al., Phys. Rev. E 2012, 85, 066702] invokes the conservation law in classical mechanics to integrate the MF, which allows us to estimate the free energy profile along the CVs on-the-fly. Here, we present a method called parallel dynamics, which improves the estimation of the MF by employing multiple replicas of the system and is straightforwardly incorporated in LogMFD or a related method. In the parallel dynamics, the MF is evaluated by a nonequilibrium path-ensemble using the multiple replicas based on the Crooks-Jarzynski nonequilibrium work relation. Thanks to the Crooks relation, realizing full-equilibrium states is no longer mandatory for estimating the MF. Additionally, sampling in the hidden subspace orthogonal to the CV space is highly improved with appropriate weights for each metastable state (if any), which is hardly achievable by typical free energy computational methods. We illustrate how to implement parallel dynamics by combining it with LogMFD, which we call logarithmic parallel dynamics (LogPD). Biosystems of alanine dipeptide and adenylate kinase in explicit water are employed as benchmark systems to which LogPD is applied to demonstrate the effect of multiple replicas on the accuracy and efficiency in estimating the free energy profiles using parallel dynamics.
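As a point of reference, the simplest member of the Crooks/Jarzynski family of nonequilibrium work relations is the Jarzynski estimator; a minimal, numerically stable sketch is shown below (this is illustrative and not the LogPD equations themselves):

```python
import numpy as np

def jarzynski_delta_f(work, beta):
    """Jarzynski estimator: Delta F = -(1/beta) * ln <exp(-beta * W)>,
    averaged over an ensemble of nonequilibrium work values from replicas."""
    w = np.asarray(work, dtype=float)
    a = (-beta * w).max()  # log-sum-exp shift for numerical stability
    return -(a + np.log(np.mean(np.exp(-beta * w - a)))) / beta
```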
Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume
2013-01-01
Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
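A rough schematic of the FAST idea, treating lagged snapshots of a single trajectory as ensemble members for the background covariance; this is a sketch of the concept, not the GEOS implementation:

```python
import numpy as np

def fast_covariance(trajectory, window, lag=1):
    """FAST-style background covariance from a single model integration.

    trajectory: (n_times, n_state) array of model states; the last `window`
    snapshots, spaced `lag` steps apart, serve as ensemble members.
    """
    members = trajectory[-window * lag::lag]
    anomalies = members - members.mean(axis=0)
    return anomalies.T @ anomalies / (len(members) - 1)
```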
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.
Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel
2017-06-01
Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang
2008-05-01
Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the life function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science, owing to the special characteristics of the cell nucleus. In this study, a novel approach has been proposed to predict protein subnuclear localization. Each protein sample is represented by a pseudo amino acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of a time series. A novel ensemble classifier is designed incorporating three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K nearest neighbors classifier, and radial basis-support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical. It might become a useful tool for protein subnuclear localization. The software in Matlab and supplementary materials are available freely by contacting the corresponding author.
NASA Astrophysics Data System (ADS)
Hirpa, F. A.; Gebremichael, M.; Hopson, T. M.; Wojick, R.
2011-12-01
We present results of data assimilation of ground discharge observations and remotely sensed soil moisture observations into the Sacramento Soil Moisture Accounting (SACSMA) model in a small watershed (1593 km2) in Minnesota, the United States. Specifically, we perform assimilation experiments with the Ensemble Kalman Filter (EnKF) and the Particle Filter (PF) in order to improve streamflow forecast accuracy at a six-hourly time step. The EnKF updates the soil moisture states in the SACSMA from the relative errors of the model and observations, while the PF adjusts the weights of the state ensemble members based on the likelihood of the forecast. Results of the improvements of each filter over the reference model (without data assimilation) will be presented. Finally, the EnKF and PF are coupled together to further improve the streamflow forecast accuracy.
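The contrast between the two filters is easiest to see in the particle filter's weight update, which rescales member weights by the observation likelihood instead of adjusting the states; a minimal sketch assuming Gaussian observation errors (names are illustrative):

```python
import numpy as np

def pf_reweight(weights, predicted_obs, obs, obs_var):
    """Particle-filter step: scale each member's weight by the Gaussian
    likelihood of its predicted observation, then renormalize."""
    lik = np.exp(-0.5 * (predicted_obs - obs) ** 2 / obs_var)
    w = weights * lik
    return w / w.sum()
```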
Adamo, Kristi B; Papadakis, Sophia; Dojeiji, Laurie; Turnau, Micheline; Simmons, Louise; Parameswaran, Meena; Cunningham, John; Pipe, Andrew L; Reid, Robert D
2010-11-01
Parents have a fundamental role in promoting the healthy weight of their children. To determine parental perceptions of their child's body weight, eating and physical activity (PA) behaviours, and to test a predictive model of parental perceptions regarding their child's PA and healthy eating behaviours. A random-digit telephone survey was conducted among parents of children four to 12 years of age living in the Champlain region of Ontario. Descriptive statistics were used to summarize the responses. Path analysis was used to identify predictors of parental perceptions of PA and healthy eating. The study sample consisted of 1940 parents/caregivers. Only 0.2% of parents reported their child as being obese; 8.6% reported their child as being overweight. Most parents perceived their child to be physically active and eating healthily. Approximately 25% of parents reported that their child spent 2 h/day or more in front of a screen, and that their child consumed less than three servings of fruits and vegetables daily, and regularly consumed fast food. Variables that correlated with PA perceptions included time spent reading/doing homework, interest in PA, perceived importance of PA, frequency of PA, level of parental PA, participation in organized sport, child weight and parental concern for weight. Variables that predicted perceptions regarding healthy eating were parental education, household income, preparation of home-cooked meals, fruit and vegetable intake, and concern for and influence on the child's weight. Parents in the present study sample did not appear to understand, or had little knowledge of the recommendations for PA and healthy eating in children. Parents appeared to base their judgment of healthy levels of PA or healthy eating behaviours using minimal criteria; these criteria are inconsistent with those used by health professionals to define adequate PA and healthy eating. The present survey highlights an important knowledge gap between scientific opinion and parental perceptions of the criteria for healthy PA and eating behaviours.
Choo, Jina; Kang, Hyuncheol
2015-05-01
To identify predictors of initial weight loss among women with abdominal obesity by using a path model. Successful weight loss in the initial stages of long-term weight management may promote weight loss maintenance. A longitudinal study design. Study participants were 75 women with abdominal obesity, who were enrolled in a 12-month Community-based Heart and Weight Management Trial and followed until a 6-month assessment. The Weight Efficacy Lifestyle, Exercise Self-Efficacy and Health Promoting Lifestyle Profile-II measured diet self-efficacy, exercise self-efficacy and health-promoting behaviour respectively. All endogenous and exogenous variables used in our path model were change variables from baseline to 6 months. Data were collected between May 2011-May 2012. Based on the path model, increases in both diet and exercise self-efficacy had significant effects on increases in health-promoting behaviour. Increases in diet self-efficacy had a significant indirect effect on initial weight loss via increases in health-promoting behaviour. Increases in health-promoting behaviour had a significant effect on initial weight loss. Among women with abdominal obesity, increased diet self-efficacy and health-promoting behaviour were predictors of initial weight loss. A mechanism by which increased diet self-efficacy predicts initial weight loss may be partially attributable to health-promoting behavioural change. However, more work is still needed to verify causality. Based on the current findings, intensive nursing strategies for increasing self-efficacy for weight control and health-promoting behaviour may be essential components for better weight loss in the initial stage of a weight management intervention. © 2015 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Li, J. F.; Waliser, D. E.; Chen, W.; Deng, M.; Lebsock, M. D.; Stephens, G. L.; Guan, B.; Christensen, M.; Teixeira, J.
2013-12-01
Representing clouds and cloud climate feedbacks in global climate models (GCMs) remains a pressing challenge to reduce and quantify uncertainties associated with climate change projection. Vertical structures of clouds simulated by present-day models have not been extensively examined using vertically resolved cloud hydrometeors such as cloud ice water (CIW) content and cloud liquid water (CLW) content. The gap in available observations for cloud mass was clearly evident from the wide disparity in the CIW path [Waliser et al., 2009] and CLW path [Li et al., 2008; 2011] values exhibited in the CMIP3 GCMs. We present an observationally based evaluation of the CIW and CLW of present-day GCMs, notably 20th century CMIP5 simulations, and compare these results to the CMIP3 and two recent reanalyses (ECMWF and MERRA). We use three different CloudSat+CALIPSO CIW products as well as three different observation CLW products, CloudSat, MODIS and AMSRE and their combined product for CLW, with methods to remove the contribution from the convective core ice mass and/or precipitating cloud hydrometeors with variable sizes and falling speeds, so that a robust observational estimate with uncertainty can be obtained for model evaluations. Note that, given CloudSat's limitations in CLW retrievals due to contamination from precipitation and from radar clutter near the surface, an alternative CLW product is synergistically constructed using MODIS CLW and CloudSat CLW. The results show that for annual mean CIW path, there are factors of 2-10 in the differences between observations and models for a majority of the GCMs and for a number of regions. Based on a number of metrics, the ensemble behavior of CMIP5 has improved considerably relative to CMIP3 (~50%), although neither the CMIP5 ensemble mean nor any individual model performs particularly well, and there are still a number of models that exhibit very large biases despite the availability of relevant observations. For CLW, most of the CMIP3/CMIP5 annual mean CLW path values are overestimated by factors of 2-10 compared to observations globally. For the vertical structure of CIW/CLW content, significant systematic biases are found in many models. Based on the Taylor diagram, the ensemble performance of CMIP5 CLW path simulation shows little or no improvement relative to CMIP3. The implications of these results for model representations of the earth radiation balance are discussed, along with caveats and uncertainties associated with the observational estimates, model and observation representations of the precipitating and cloudy ice components, and relevant physical processes and parameterizations.
NMR Studies of Dynamic Biomolecular Conformational Ensembles
Torchia, Dennis A.
2015-01-01
Multidimensional heteronuclear NMR approaches can provide nearly complete sequential signal assignments of isotopically enriched biomolecules. The availability of assignments together with measurements of spin relaxation rates, residual spin interactions, J-couplings and chemical shifts provides information at atomic resolution about internal dynamics on timescales ranging from ps to ms, both in solution and in the solid state. However, due to the complexity of biomolecules, it is not possible to extract a unique atomic-resolution description of biomolecular motions even from extensive NMR data when many conformations are sampled on multiple timescales. For this reason, powerful computational approaches are increasingly applied to large NMR data sets to elucidate conformational ensembles sampled by biomolecules. In the past decade, considerable attention has been directed at an important class of biomolecules that function by binding to a wide variety of target molecules. Questions of current interest are: “Does the free biomolecule sample a conformational ensemble that encompasses the conformations found when it binds to various targets; and if so, on what time scale is the ensemble sampled?” This article reviews recent efforts to answer these questions, with a focus on comparing ensembles obtained for the same biomolecules by different investigators. A detailed comparison of results obtained is provided for three biomolecules: ubiquitin, calmodulin and the HIV-1 trans-activation response RNA. PMID:25669739
WEAMR — A Weighted Energy Aware Multipath Reliable Routing Mechanism for Hotline-Based WSNs
Tufail, Ali; Qamar, Arslan; Khan, Adil Mehmood; Baig, Waleed Akram; Kim, Ki-Hyung
2013-01-01
Reliable source-to-sink communication is the most important factor for an efficient routing protocol, especially in military, healthcare and disaster recovery applications. We present weighted energy aware multipath reliable routing (WEAMR), a novel energy-aware multipath routing protocol which utilizes hotline-assisted routing to meet such requirements for mission-critical applications. The protocol reduces the average number of hops from source to destination and provides unmatched reliability compared to well-known reactive ad hoc protocols, i.e., AODV and AOMDV. Our protocol makes efficient use of network paths based on a weighted cost calculation and intelligently selects the best possible paths for data transmissions. The path cost calculation considers the end-to-end number of hops, the latency and the minimum node energy value on the path. In case of path failure, path recalculation is done efficiently with minimum latency and control packet overhead. Our evaluation shows that our proposal provides better end-to-end delivery with less routing overhead and a higher packet delivery success ratio compared to AODV and AOMDV. The use of multipath also increases the overall lifetime of the WSN by using the optimum-energy available paths between sender and receiver. PMID:23669714
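A schematic of a weighted path cost of the kind described, combining hop count, latency and the minimum residual node energy; the weights, normalization and sign conventions here are illustrative assumptions, not WEAMR's published formula:

```python
def path_cost(hops, latency_ms, min_node_energy,
              w_hops=0.4, w_latency=0.4, w_energy=0.2):
    """Weighted cost of a candidate path; lower cost means a preferred path.
    Higher minimum residual energy reduces the cost, so energy-rich paths win."""
    return (w_hops * hops + w_latency * latency_ms
            - w_energy * min_node_energy)

# Hypothetical usage over candidate paths with (hops, latency, energy) fields:
# best = min(candidates, key=lambda p: path_cost(p.hops, p.latency, p.energy))
```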
Sikorsky Aircraft Advanced Rotorcraft Transmission (ART) program
NASA Technical Reports Server (NTRS)
Kish, Jules G.
1993-01-01
The objectives of the Advanced Rotorcraft Transmission program were to achieve a 25 percent weight reduction, a 10 dB noise reduction, and a 5,000 hour mean time between removals (MTBR). A three-engine Army Cargo Aircraft (ACA) of 85,000 pounds gross weight was used as the baseline. Preliminary designs were conducted of split path and split torque transmissions to evaluate weight, reliability, and noise. A split path gearbox was determined to be 23 percent lighter, greater than 10 dB quieter, and almost four times more reliable than the baseline two-stage planetary design. Detailed design studies were conducted of the chosen split path configuration, and drawings were produced of a 1/2-size gearbox consisting of a single engine path of the split path section. Fabrication and testing were then conducted on the 1/2-size gearbox. The 1/2-size gearbox testing proved that the concept of the split path gearbox with a high-reduction-ratio double helical output gear was sound. The improvements were attributed to extensive use of composites, spring clutches, advanced high hot hardness gear steels, the split path configuration itself, the high reduction ratio, double helical gearing on the output stage, elastomeric load sharing devices, and elimination of accessory drives.
Harmonic-phase path-integral approximation of thermal quantum correlation functions
NASA Astrophysics Data System (ADS)
Robertson, Christopher; Habershon, Scott
2018-03-01
We present an approximation to the thermal symmetric form of the quantum time-correlation function in the standard position path-integral representation. By transforming to a sum-and-difference position representation and then Taylor-expanding the potential energy surface of the system to second order, the resulting expression provides a harmonic weighting function that approximately recovers the contribution of the phase to the time-correlation function. This method is readily implemented in a Monte Carlo sampling scheme and provides exact results for harmonic potentials (for both linear and non-linear operators) and near-quantitative results for anharmonic systems for low temperatures and times that are likely to be relevant to condensed phase experiments. This article focuses on one-dimensional examples to provide insights into convergence and sampling properties, and we also discuss how this approximation method may be extended to many-dimensional systems.
Insights into the deterministic skill of air quality ensembles ...
Simulations from chemical weather models are subject to uncertainties in the input data (e.g. emission inventory, initial and boundary conditions) as well as those intrinsic to the model (e.g. physical parameterization, chemical mechanism). Multi-model ensembles can improve the forecast skill, provided that certain mathematical conditions are fulfilled. In this work, four ensemble methods were applied to two different datasets, and their performance was compared for ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10). Apart from the unconditional ensemble average, the approach behind the other three methods relies on adding optimum weights to members or constraining the ensemble to those members that meet certain conditions in time or frequency domain. The two different datasets were created for the first and second phase of the Air Quality Model Evaluation International Initiative (AQMEII). The methods are evaluated against ground level observations collected from the EMEP (European Monitoring and Evaluation Programme) and AirBase databases. The goal of the study is to quantify to what extent we can extract predictable signals from an ensemble with superior skill over the single models and the ensemble mean. Verification statistics show that the deterministic models simulate better O3 than NO2 and PM10, linked to different levels of complexity in the represented processes. The unconditional ensemble mean achieves higher skill compared to each stati
Reliable probabilities through statistical post-processing of ensemble predictions
NASA Astrophysics Data System (ADS)
Van Schaeybroeck, Bert; Vannitsem, Stéphane
2013-04-01
We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. First, we enforce climatological reliability in the sense that the total variability of the prediction is equal to the variability of the observations. Second, we impose ensemble reliability such that the spread of the observation around the ensemble mean coincides with that of the ensemble members. In general the attractors of the model and reality are inhomogeneous; therefore ensemble spread displays a variability not taken into account in standard post-processing methods. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz 1996). The forecasts become more reliable at short lead times, as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability, ECMWF. 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression, Nonlin. Processes Geophys., 18, 147.
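A member-by-member regression calibration of the kind described can be sketched as follows, where the coefficients would be fit on a training set so that the total variability and the ensemble spread match the observations; this is a schematic, not the paper's exact estimator:

```python
import numpy as np

def calibrate(members, alpha, beta, gamma):
    """Rescale each ensemble member around the corrected ensemble mean.

    members: (n_members, n_times) raw forecasts; alpha/beta correct the
    mean, gamma rescales the spread. All three would be estimated on a
    training period subject to the two reliability constraints."""
    mean = members.mean(axis=0)
    return alpha + beta * mean + gamma * (members - mean)
```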
The integrated process rates (IPR) estimated by the Eta-CMAQ model at grid cells along the trajectory of the air mass transport path were analyzed to quantitatively investigate the relative importance of physical and chemical processes for O3 formation and evolution ov...
Composite pulses for interferometry in a thermal cold atom cloud
NASA Astrophysics Data System (ADS)
Dunning, Alexander; Gregory, Rachel; Bateman, James; Cooper, Nathan; Himsworth, Matthew; Jones, Jonathan A.; Freegarde, Tim
2014-09-01
Atom interferometric sensors and quantum information processors must maintain coherence while the evolving quantum wave function is split, transformed, and recombined, but suffer from experimental inhomogeneities and uncertainties in the speeds and paths of these operations. Several error-correction techniques have been proposed to isolate the variable of interest. Here we apply composite pulse methods to velocity-sensitive Raman state manipulation in a freely expanding thermal atom cloud. We compare several established pulse sequences, and follow the state evolution within them. The agreement between measurements and simple predictions shows the underlying coherence of the atom ensemble, and the inversion infidelity in a ~80 μK atom cloud is halved. Composite pulse techniques, especially if tailored for atom interferometric applications, should allow greater interferometer areas, larger atomic samples, and longer interaction times, and hence improve the sensitivity of quantum technologies from inertial sensing and clocks to quantum information processors and tests of fundamental physics.
Bayesian refinement of protein structures and ensembles against SAXS data using molecular dynamics
Shevchuk, Roman; Hub, Jochen S.
2017-01-01
Small-angle X-ray scattering is an increasingly popular technique used to detect protein structures and ensembles in solution. However, the refinement of structures and ensembles against SAXS data is often ambiguous due to the low information content of SAXS data, unknown systematic errors, and unknown scattering contributions from the solvent. We offer a solution to such problems by combining Bayesian inference with all-atom molecular dynamics simulations and explicit-solvent SAXS calculations. The Bayesian formulation correctly weights the SAXS data versus prior physical knowledge, it quantifies the precision or ambiguity of fitted structures and ensembles, and it accounts for unknown systematic errors due to poor buffer matching. The method further provides a probabilistic criterion for identifying the number of states required to explain the SAXS data. The method is validated by refining ensembles of a periplasmic binding protein against calculated SAXS curves. Subsequently, we derive the solution ensembles of the eukaryotic chaperone heat shock protein 90 (Hsp90) against experimental SAXS data. We find that the SAXS data of the apo state of Hsp90 are compatible with a single wide-open conformation, whereas the SAXS data of Hsp90 bound to ATP or to an ATP-analogue strongly suggest heterogeneous ensembles of a closed and a wide-open state. PMID:29045407
Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation
NASA Astrophysics Data System (ADS)
Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.
2016-12-01
With the growing impacts of climate change and human activities on the water cycle, an increasing number of studies focus on the quantification of modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining each plausible model's prediction, and each model carries a weight determined by its prior weight and marginal likelihood. Thus, the estimation of a model's marginal likelihood is crucial for reliable and accurate BMA prediction. The nested sampling estimator (NSE) is a newly proposed method for marginal likelihood estimation. NSE proceeds by gradually searching the parameter space from low-likelihood to high-likelihood regions, an evolution carried out iteratively via a local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm is often used for local sampling. However, M-H is not an efficient sampling algorithm for high-dimensional or complicated parameter spaces. To improve the efficiency of NSE, we incorporate the robust and efficient DREAMzs sampling algorithm into the local sampling step of NSE. The comparison results demonstrate that the improved NSE increases the efficiency of marginal likelihood estimation significantly. However, both the improved and the original NSE suffer from heavy instability. In addition, the heavy computational cost of the large number of model executions is overcome by using adaptive sparse-grid surrogates.
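For orientation, a bare-bones nested sampling estimator for a one-dimensional toy problem is sketched below; the local sampling step here is a brute-force constrained prior draw standing in for the M-H or DREAMzs procedures discussed above, and all names and numbers are illustrative:

    import numpy as np

    def nested_sampling(loglike, prior_sample, n_live=100, n_iter=600, seed=0):
        # Shrink the prior volume by ~exp(-1/n_live) per iteration and
        # accumulate the evidence Z = sum_i w_i L_i (run truncated at n_iter).
        rng = np.random.default_rng(seed)
        live = prior_sample(rng, n_live)
        live_ll = np.array([loglike(x) for x in live])
        logZ = -np.inf
        logw0 = np.log(1.0 - np.exp(-1.0 / n_live))
        for i in range(n_iter):
            worst = np.argmin(live_ll)
            logZ = np.logaddexp(logZ, logw0 - i / n_live + live_ll[worst])
            while True:  # local step: new prior draw with L > L*
                x = prior_sample(rng, 1)[0]
                if loglike(x) > live_ll[worst]:
                    break
            live[worst], live_ll[worst] = x, loglike(x)
        return logZ

    loglike = lambda x: -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)  # N(0,1)
    prior = lambda rng, n: rng.uniform(-5.0, 5.0, n)             # U(-5,5)
    print("logZ estimate:", nested_sampling(loglike, prior))
    print("analytic logZ:", np.log(0.1))  # (1/10) * integral of N(0,1)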
NASA Astrophysics Data System (ADS)
Sanders, Ryan L.; Shapley, Alice E.; Zhang, Kai; Yan, Renbin
2017-12-01
Galaxy metallicity scaling relations provide a powerful tool for understanding galaxy evolution, but obtaining unbiased global galaxy gas-phase oxygen abundances requires proper treatment of the various line-emitting sources within spectroscopic apertures. We present a model framework that treats galaxies as ensembles of H II and diffuse ionized gas (DIG) regions of varying metallicities. These models are based upon empirical relations between line ratios and electron temperature for H II regions, and DIG strong-line ratio relations from SDSS-IV MaNGA IFU data. Flux-weighting effects and DIG contamination can significantly affect properties inferred from global galaxy spectra, biasing metallicity estimates by more than 0.3 dex in some cases. We use observationally motivated inputs to construct a model matched to typical local star-forming galaxies, and quantify the biases in strong-line ratios, electron temperatures, and direct-method metallicities as inferred from global galaxy spectra relative to the median values of the H II region distributions in each galaxy. We also provide a generalized set of models that can be applied to individual galaxies or galaxy samples in atypical regions of parameter space. We use these models to correct for the effects of flux-weighting and DIG contamination in the local direct-method mass-metallicity and fundamental metallicity relations, and in the mass-metallicity relation based on strong-line metallicities. Future photoionization models of galaxy line emission need to include DIG emission and represent galaxies as ensembles of emitting regions with varying metallicity, instead of as single H II regions with effective properties, in order to obtain unbiased estimates of key underlying physical properties.
Ensemble theory for slightly deformable granular matter.
Tejada, Ignacio G
2014-09-01
Given a granular system of slightly deformable particles, it is possible to obtain different static and jammed packings subjected to the same macroscopic constraints. These microstates can be compared in a mathematical space defined by the components of the force-moment tensor (i.e. the product of the equivalent stress by the volume of the Voronoi cell). In order to explain the statistical distributions observed there, an athermal ensemble theory can be used. This work proposes a formalism (based on developments of the original theory of Edwards and collaborators) that considers both the internal and the external constraints of the problem. The former give the density of states of the points of this space, and the latter give their statistical weight. The internal constraints are those caused by the intrinsic features of the system (e.g. size distribution, friction, cohesion). Together with the force-balance condition, they determine the possible local states of equilibrium of a particle. Under the principle of equal a priori probabilities, and when no other constraints are imposed, it can be assumed that particles are equally likely to be found in any one of these local states of equilibrium. Then a flat sampling over all these local states turns into a non-uniform distribution in the force-moment space that can be represented with density-of-states functions. Although these functions can be measured, some of their features are explored in this paper. The external constraints are those macroscopic quantities that define the ensemble and are fixed by the protocol. The force-moment, the volume, the elastic potential energy and the stress are some examples of quantities that can be expressed as functions of the force-moment. The associated ensembles are included in the formalism presented here.
A path integral approach to the full Dicke model with dipole-dipole interaction
NASA Astrophysics Data System (ADS)
Aparicio Alcalde, M.; Stephany, J.; Svaiter, N. F.
2011-12-01
We consider the full Dicke spin-boson model composed of a single bosonic mode and an ensemble of N identical two-level atoms with different couplings for the resonant and anti-resonant interaction terms, and incorporate a dipole-dipole interaction between the atoms. Assuming that the system is in thermal equilibrium with a reservoir at temperature β^-1, we compute the free energy in the thermodynamic limit N → ∞ in the saddle-point approximation to the path integral and determine the critical temperature for the super-radiant phase transition. In the zero-temperature limit, we recover the critical coupling of the quantum phase transition presented in the literature.
High-density amorphous ice: A path-integral simulation
NASA Astrophysics Data System (ADS)
Herrero, Carlos P.; Ramírez, Rafael
2012-09-01
Structural and thermodynamic properties of high-density amorphous (HDA) ice have been studied by path-integral molecular dynamics simulations in the isothermal-isobaric ensemble. Interatomic interactions were modeled by using the effective q-TIP4P/F potential for flexible water. Quantum nuclear motion is found to affect several observable properties of the amorphous solid. At low temperature (T = 50 K) the molar volume of HDA ice is found to increase by 6%, and the intramolecular O-H distance rises by 1.4% due to quantum motion. Peaks in the radial distribution function of HDA ice are broadened with respect to their classical expectation. The bulk modulus, B, is found to rise linearly with the pressure, with a slope ∂B/∂P = 7.1. Our results are compared with those derived earlier from classical and path-integral simulations of HDA ice. We discuss similarities and discrepancies with those earlier simulations.
NASA Astrophysics Data System (ADS)
Hardy, Jason; Campbell, Mark; Miller, Isaac; Schimpf, Brian
2008-10-01
The local path planner implemented on Cornell's 2007 DARPA Urban Challenge entry vehicle Skynet utilizes a novel mixture of discrete and continuous path planning steps to facilitate a safe, smooth, and human-like driving behavior. The planner first solves for a feasible path through the local obstacle map using a grid based search algorithm. The resulting path is then refined using a cost-based nonlinear optimization routine with both hard and soft constraints. The behavior of this optimization is influenced by tunable weighting parameters which govern the relative cost contributions assigned to different path characteristics. This paper studies the sensitivity of the vehicle's performance to these path planner weighting parameters using a data driven simulation based on logged data from the National Qualifying Event. The performance of the path planner in both the National Qualifying Event and in the Urban Challenge is also presented and analyzed.
NASA Astrophysics Data System (ADS)
Weber, Steven; Murch, K. W.; Chantasri, A.; Dressel, J.; Jordan, A. N.; Siddiqi, I.
2014-03-01
We use weak measurements to track individual quantum trajectories of a superconducting qubit embedded in a microwave cavity. Using a near-quantum-limited parametric amplifier, we selectively measure either the phase or amplitude of the cavity field, and thereby confine trajectories to either the equator or a meridian of the Bloch sphere. We analyze ensembles of trajectories to determine statistical properties such as the most likely path and most likely time connecting pre- and post-selected quantum states. We compare our results with theoretical predictions derived from an action principle for continuous quantum measurement. Furthermore, by introducing a qubit drive, we investigate the interplay between unitary state evolution and non-unitary measurement dynamics. This work was supported by the IARPA CSQ program and the ONR.
Dual-wavelength pump-probe microscopy analysis of melanin composition
NASA Astrophysics Data System (ADS)
Thompson, Andrew; Robles, Francisco E.; Wilson, Jesse W.; Deb, Sanghamitra; Calderbank, Robert; Warren, Warren S.
2016-11-01
Pump-probe microscopy is an emerging technique that provides detailed chemical information of absorbers with sub-micrometer spatial resolution. Recent work has shown that the pump-probe signals from melanin in human skin cancers correlate well with clinical concern, but it has been difficult to infer the molecular origins of these differences. Here we develop a mathematical framework to describe the pump-probe dynamics of melanin in human pigmented tissue samples, which treats the ensemble of individual chromophores that make up melanin as Gaussian absorbers with bandwidth related via Frenkel excitons. Thus, observed signals result from an interplay between the spectral bandwidths of the individual underlying chromophores and spectral proximity of the pump and probe wavelengths. The model is tested using a dual-wavelength pump-probe approach and a novel signal processing method based on gnomonic projections. Results show signals can be described by a single linear transition path with different rates of progress for different individual pump-probe wavelength pairs. Moreover, the combined dual-wavelength data shows a nonlinear transition that supports our mathematical framework and the excitonic model to describe the optical properties of melanin. The novel gnomonic projection analysis can also be an attractive generic tool for analyzing mixing paths in biomolecular and analytical chemistry.
NASA Astrophysics Data System (ADS)
Wu, Xiongwu; Brooks, Bernard R.
2011-11-01
The self-guided Langevin dynamics (SGLD) is a method to accelerate conformational searching. This method is unique in that it selectively enhances and suppresses molecular motions based on their frequency to accelerate conformational searching without modifying energy surfaces or raising temperatures. It has been applied to studies of many long-time-scale events, such as protein folding. Recent progress in the understanding of the conformational distribution in SGLD simulations makes SGLD also an accurate method for quantitative studies. The SGLD partition function provides a way to convert the SGLD conformational distribution to the canonical ensemble distribution and to calculate ensemble-average properties through reweighting. Based on the SGLD partition function, this work presents a force-momentum-based self-guided Langevin dynamics (SGLDfp) simulation method to directly sample the canonical ensemble. This method includes interaction forces in its guiding force to compensate for the perturbation caused by the momentum-based guiding force so that it can approximately sample the canonical ensemble. Using several example systems, we demonstrate that SGLDfp simulations can approximately maintain the canonical ensemble distribution and significantly accelerate conformational searching. With optimal parameters, SGLDfp and SGLD simulations can cross energy barriers of more than 15 kT and 20 kT, respectively, at rates similar to those at which LD simulations cross energy barriers of 10 kT. The SGLDfp method is size-extensive and works well for large systems. For studies where preserving accessible conformational space is critical, such as free energy calculations and protein folding studies, SGLDfp is an efficient approach to search and sample the conformational space.
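The reweighting step from a biased to the canonical distribution can be illustrated generically; the sketch below uses textbook importance-sampling factors between two temperatures of a harmonic oscillator, not the SGLD partition-function weights of the paper:

    import numpy as np

    rng = np.random.default_rng(7)
    kT, kT_b = 1.0, 2.0                        # target and biased temperatures
    U = lambda x: 0.5 * x**2                   # harmonic potential
    x = rng.normal(0.0, np.sqrt(kT_b), 100_000)    # exact samples at kT_b
    w = np.exp(-U(x) / kT + U(x) / kT_b)           # reweighting factors
    print("biased     <x^2>:", np.mean(x**2))                 # ~ kT_b = 2.0
    print("reweighted <x^2>:", np.sum(w * x**2) / np.sum(w), "(exact: 1.0)")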
A Maximum-Likelihood Approach to Force-Field Calibration.
Zaborowski, Bartłomiej; Jagieła, Dawid; Czaplewski, Cezary; Hałabis, Anna; Lewandowska, Agnieszka; Żmudzińska, Wioletta; Ołdziej, Stanisław; Karczyńska, Agnieszka; Omieczynski, Christian; Wirecki, Tomasz; Liwo, Adam
2015-09-28
A new approach to the calibration of force fields is proposed, in which the force-field parameters are obtained by maximum-likelihood fitting of the calculated conformational ensembles to the experimental ensembles of training system(s). The maximum-likelihood function is composed of logarithms of the Boltzmann probabilities of the experimental conformations, calculated with the current energy function. Because the theoretical distribution is given in the form of the simulated conformations only, the contributions from all of the simulated conformations, with Gaussian weights in the distances from a given experimental conformation, are added to give the contribution to the target function from this conformation. In contrast to earlier methods for force-field calibration, the approach does not suffer from the arbitrariness of dividing the decoy set into native-like and non-native structures; however, if such a division is made instead of using Gaussian weights, application of the maximum-likelihood method results in the well-known energy-gap maximization. The computational procedure consists of cycles of decoy generation and maximum-likelihood-function optimization, which are iterated until convergence is reached. The method was tested with Gaussian distributions and then applied to the physics-based coarse-grained UNRES force field for proteins. The NMR structures of the tryptophan cage, a small α-helical protein, determined at three temperatures (T = 280, 305, and 313 K) by Hałabis et al. (J. Phys. Chem. B 2012, 116, 6898-6907), were used. Multiplexed replica-exchange molecular dynamics was used to generate the decoys. The iterative procedure exhibited steady convergence. Three variants of optimization were tried: optimization of the energy-term weights alone and use of the experimental ensemble of the folded protein only at T = 280 K (run 1); optimization of the energy-term weights and use of experimental ensembles at all three temperatures (run 2); and optimization of the energy-term weights and the coefficients of the torsional and multibody energy terms and use of experimental ensembles at all three temperatures (run 3). The force fields were subsequently tested with a set of 14 α-helical and two α + β proteins. Optimization run 1 resulted in better agreement with the experimental ensemble at T = 280 K compared with optimization run 2 and in comparable performance on the test set, but poorer agreement of the calculated folding temperature with the experimental folding temperature. Optimization run 3 resulted in the best fit of the calculated ensembles to the experimental ones for the tryptophan cage but in much poorer performance on the test set, suggesting that use of a small α-helical protein for extensive force-field calibration resulted in overfitting of the data for this protein at the expense of transferability. The optimized force field resulting from run 2 was found to fold 13 of the 14 tested α-helical proteins and one small α + β protein with the correct topologies; the average structures of 10 of them were predicted with accuracies of about 5 Å C(α) root-mean-square deviation or better. Test simulations with an additional set of 12 α-helical proteins demonstrated that this force field performed better on α-helical proteins than the previous parametrizations of UNRES.
The proposed approach is applicable to any problem of maximum-likelihood parameter estimation when the contributions to the maximum-likelihood function cannot be evaluated at the experimental points and the dimension of the configurational space is too high to construct histograms of the experimental distributions.
A New Look into the Effect of Large Drops on Radiative Transfer Process
NASA Technical Reports Server (NTRS)
Marshak, Alexander
2003-01-01
Recent studies indicate that a cloudy atmosphere absorbs more solar radiation than any current 1D or 3D radiation model can predict. The excess absorption is not large, perhaps 10-15 W/sq m or less, but any such systematic bias is of concern since radiative transfer models are assumed to be sufficiently accurate for remote sensing applications and climate modeling. The most natural explanation would be that models do not capture real 3D cloud structure and, as a consequence, their photon path lengths are too short. However, extensive calculations, using increasingly realistic 3D cloud structures, failed to produce photon paths long enough to explain the excess absorption. Other possible explanations have also been unsuccessful so, at this point, conventional models seem to offer no solution to this puzzle. The weakest link in conventional models is the way a size distribution of cloud particles is mathematically handled. Basically, real particles are replaced with a single average particle. This "ensemble assumption" assumes that all particle sizes are well represented in any given elementary volume. But the concentration of larger particles can be so low that this assumption is significantly violated. We show how a different mathematical route, using the concept of a cumulative distribution, avoids the ensemble assumption. The cumulative distribution has jumps, or steps, corresponding to the rarer sizes. These jumps result in an additional term, a kind of Green's function, in the solution of the radiative transfer equation. Solving the cloud radiative transfer equation with the measured particle distributions, described in a cumulative rather than an ensemble fashion, may lead to increased cloud absorption of the magnitude observed.
Firefighters Integrated Response Equipment System
NASA Technical Reports Server (NTRS)
Kaplan, H.; Abeles, F.
1978-01-01
The Firefighters Integrated Response Equipment System (Project FIRES) is a joint National Fire Prevention and Control Administration (NFPCA)/National Aeronautics and Space Administration (NASA) program for the development of an 'ultimate' firefighter's protective ensemble. The overall aim of Project FIRES is to improve firefighter protection against hazards, such as heat, flame, smoke, toxic fumes, moisture, impact penetration, and electricity and, at the same time, improve firefighter performance by increasing maneuverability, lowering weight, and improving human engineering design of his protective ensemble.
Evaluation of Multi-Model Ensemble System for Seasonal and Monthly Prediction
NASA Astrophysics Data System (ADS)
Zhang, Q.; Van den Dool, H. M.
2013-12-01
Since August 2011, the realtime seasonal forecasts of the U.S. National Multi-Model Ensemble (NMME) have been made on the 8th of each month by the NCEP Climate Prediction Center (CPC). During the first year, the participating models in the realtime NMME forecast were NCEP/CFSv1&2, GFDL/CM2.2, NCAR/U.Miami/COLA/CCSM3, NASA/GEOS5, and IRI/ECHAM-a & ECHAM-f. The Canadian Meteorological Center CanCM3 and CM4 replaced CFSv1 and IRI's models in the second year. The NMME team at CPC collects three variables - precipitation, 2-meter temperature and sea surface temperature - from each modeling center on a 1x1 degree global grid, removes systematic errors, makes the grand ensemble mean with equal weight for each model, and constructs a probability forecast with equal weight for each member. The team then provides the NMME forecast to the operational CPC forecaster responsible for the seasonal and monthly outlook each month. Verification of the seasonal and monthly prediction from NMME is conducted by calculating the anomaly correlation (AC) from the 30-year hindcasts (1982-2011) of the individual models and the NMME ensemble. The motivation of this study is to provide skill benchmarks for future improvements of the NMME seasonal and monthly prediction system. The experimental (Phase I) stage of the project already supplies routine guidance to users of the NMME forecasts.
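A toy Python sketch of the verification pipeline described above (hypothetical synthetic hindcasts, not CPC code): remove each model's systematic error, form the equally weighted grand ensemble mean, and score with the anomaly correlation:

    import numpy as np

    def anomaly_correlation(fcst, obs, clim):
        fa, oa = fcst - clim, obs - clim
        return (fa * oa).sum() / np.sqrt((fa ** 2).sum() * (oa ** 2).sum())

    rng = np.random.default_rng(2)
    obs = rng.normal(0.0, 1.0, 30)                  # 30-year "hindcast" truth
    models = obs + rng.normal(0.5, 1.0, (4, 30))    # 4 models, each biased
    corrected = models - (models - obs).mean(axis=1, keepdims=True)
    grand_mean = corrected.mean(axis=0)             # equal weight per model
    clim = obs.mean()
    for k, m in enumerate(corrected):
        print(f"model {k}: AC = {anomaly_correlation(m, obs, clim):.2f}")
    print(f"grand ensemble mean: AC = {anomaly_correlation(grand_mean, obs, clim):.2f}")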
Ensemble modelling and structured decision-making to support Emergency Disease Management.
Webb, Colleen T; Ferrari, Matthew; Lindström, Tom; Carpenter, Tim; Dürr, Salome; Garner, Graeme; Jewell, Chris; Stevenson, Mark; Ward, Michael P; Werkman, Marleen; Backer, Jantien; Tildesley, Michael
2017-03-01
Epidemiological models in animal health are commonly used as decision-support tools to understand the impact of various control actions on infection spread in susceptible populations. Different models contain different assumptions and parameterizations, and policy decisions might be improved by considering outputs from multiple models. However, a transparent decision-support framework to integrate outputs from multiple models is nascent in epidemiology. Ensemble modelling and structured decision-making integrate the outputs of multiple models, compare policy actions and support policy decision-making. We briefly review the epidemiological application of ensemble modelling and structured decision-making and illustrate the potential of these methods using foot and mouth disease (FMD) models. In case study one, we apply structured decision-making to compare five possible control actions across three FMD models and show which control actions and outbreak costs are robustly supported and which are impacted by model uncertainty. In case study two, we develop a methodology for weighting the outputs of different models and show how different weighting schemes may impact the choice of control action. Using these case studies, we broadly illustrate the potential of ensemble modelling and structured decision-making in epidemiology to provide better information for decision-making and outline necessary development of these methods for their further application.
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely random forest, rotation forest and AdaBoost.M1, are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three resulting classifiers are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method is the best one in comparison to previously applied methods in terms of classification accuracy.
Synchronization Experiments With A Global Coupled Model of Intermediate Complexity
NASA Astrophysics Data System (ADS)
Selten, Frank; Hiemstra, Paul; Shen, Mao-Lin
2013-04-01
In the super modeling approach an ensemble of imperfect models is connected through nudging terms that nudge the solution of each model to the solutions of all other models in the ensemble. The goal is to obtain a synchronized state, through a proper choice of connection strengths, that closely tracks the trajectory of the true system. For the super modeling approach to be successful, the connections should be dense and strong enough for synchronization to occur. In this study we analyze the behavior of an ensemble of connected global atmosphere-ocean models of intermediate complexity. All atmosphere models are connected to the same ocean model through the surface fluxes of heat, water and momentum; the ocean is integrated using weighted-average surface fluxes. In particular we analyze the degree of synchronization between the atmosphere models and the characteristics of the ensemble mean solution. The results are interpreted using a low-order atmosphere-ocean toy model.
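A minimal sketch of the nudging idea using two imperfect Lorenz-63 "models" in place of the intermediate-complexity climate models (parameters and connection strength are illustrative): each member is nudged toward the ensemble mean, which is equivalent to nudging toward all other members:

    import numpy as np

    def lorenz63(s, sigma, rho, beta):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    params = [(11.0, 30.0, 2.2), (9.0, 26.0, 3.0)]   # two imperfect models
    C, dt = 5.0, 0.01                                # connection strength, step
    states = [np.array([1.0, 1.0, 20.0]), np.array([1.1, 0.9, 21.0])]
    for _ in range(5000):
        mean = sum(states) / len(states)
        # nudge each model toward the ensemble mean (equivalently, toward
        # the solutions of all other members)
        states = [s + dt * (lorenz63(s, *p) + C * (mean - s))
                  for s, p in zip(states, params)]
    print("member separation after nudging:",
          np.linalg.norm(states[0] - states[1]))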
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criteria and an interpretation for each subset. We developed an efficient estimation procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
A Canonical Ensemble Correlation Prediction Model for Seasonal Precipitation Anomaly
NASA Technical Reports Server (NTRS)
Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Guilong
2001-01-01
This report describes an optimal ensemble forecasting model for seasonal precipitation and its error estimation. Each individual forecast is based on the canonical correlation analysis (CCA) in the spectral spaces whose bases are empirical orthogonal functions (EOF). The optimal weights in the ensemble forecasting crucially depend on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is made also using the spectral method. The error is decomposed onto EOFs of the predictand and decreases linearly according to the correlation between the predictor and predictand. This new CCA model includes the following features: (1) the use of area-factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to the seasonal forecasting of the United States precipitation field. The predictor is the sea surface temperature.
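The dependence of the optimal weights on the individual mean square errors can be illustrated with the standard result for unbiased forecasts with independent errors, where the variance-minimizing weights are proportional to 1/MSE (a textbook rule, not necessarily the report's EOF-based derivation):

    import numpy as np

    mse = np.array([0.8, 1.5, 0.6, 2.0])        # estimated MSE of each forecast
    w = (1.0 / mse) / (1.0 / mse).sum()         # variance-minimizing weights
    forecasts = np.array([1.2, 0.7, 1.0, 1.6])  # individual CCA-type forecasts
    print("weights:", w.round(3))
    print("combined forecast:", float(w @ forecasts))
    print("MSE of combination:", 1.0 / (1.0 / mse).sum())  # <= min(mse)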
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We use ensemble boosting learning techniques to combine different computational approaches for gene prioritization in order to improve the overall performance. In particular, we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene pairs from each prioritization result. We select 13 known prostate cancer genes in the OMIM database as the training set and protein-coding gene data in the HGNC database as the test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in the ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
Cue Reliance in L2 Written Production
ERIC Educational Resources Information Center
Wiechmann, Daniel; Kerz, Elma
2014-01-01
Second language learners reach expert levels in relative cue weighting only gradually. On the basis of ensemble machine learning models fit to naturalistic written productions of German advanced learners of English and expert writers, we set out to reverse engineer differences in the weighting of multiple cues in a clause linearization problem. We…
Path optimization method for the sign problem
NASA Astrophysics Data System (ADS)
Ohnishi, Akira; Mori, Yuto; Kashiwa, Kouji
2018-03-01
We propose a path optimization method (POM) to evade the sign problem in Monte Carlo calculations for complex actions. Among the many approaches to the sign problem, the Lefschetz-thimble path-integral method and the complex Langevin method are promising and extensively discussed. In these methods, real field variables are complexified and the integration manifold is determined by the flow equations or sampled stochastically. When the action has singular points or multiple critical points near the original integration surface, however, we risk encountering the residual and global sign problems or the singular drift term problem. One way to avoid the singular points is to optimize the integration path, designing it so as not to hit the singular points of the Boltzmann weight. By specifying the one-dimensional integration path as z = t + i f(t) (f ∈ R) and optimizing f(t) to enhance the average phase factor, we demonstrate that we can avoid the sign problem in a one-variable toy model for which the complex Langevin method is found to fail. In these proceedings, we propose POM and discuss how we can avoid the sign problem in a toy model. We also discuss the possibility of utilizing a neural network to optimize the path.
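A sketch of the optimization for the simplest one-parameter family f(t) = c (a constant shift, so the Jacobian is 1) applied to a Gaussian toy action with a purely imaginary linear term; the action and grid are illustrative, not the model studied in the proceedings:

    import numpy as np
    from scipy.optimize import minimize_scalar

    lam = 2.0
    S = lambda z: 0.5 * z**2 + 1j * lam * z       # toy complex action

    def avg_phase_factor(c):
        # Path z = t + i*c (constant shift, Jacobian dz/dt = 1):
        # |sum of complex weights| / sum of their moduli.
        t = np.linspace(-8.0, 8.0, 4001)
        w = np.exp(-S(t + 1j * c))
        return np.abs(w.sum()) / np.abs(w).sum()

    res = minimize_scalar(lambda c: -avg_phase_factor(c),
                          bounds=(-5.0, 5.0), method="bounded")
    print(f"optimal shift c = {res.x:.3f} (stationary phase: {-lam})")
    print(f"average phase factor, optimized path: {avg_phase_factor(res.x):.6f}")
    print(f"average phase factor, real axis:      {avg_phase_factor(0.0):.6f}")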
Rauscher, Sarah; Neale, Chris; Pomès, Régis
2009-10-13
Generalized-ensemble algorithms in temperature space have become popular tools to enhance conformational sampling in biomolecular simulations. A random walk in temperature leads to a corresponding random walk in potential energy, which can be used to cross over energetic barriers and overcome the problem of quasi-nonergodicity. In this paper, we introduce two novel methods: simulated tempering distributed replica sampling (STDR) and virtual replica exchange (VREX). These methods are designed to address the practical issues inherent in the replica exchange (RE), simulated tempering (ST), and serial replica exchange (SREM) algorithms. RE requires a large, dedicated, and homogeneous cluster of CPUs to function efficiently when applied to complex systems. ST and SREM both have the drawback of requiring extensive initial simulations, possibly adaptive, for the calculation of weight factors or potential energy distribution functions. STDR and VREX alleviate the need for lengthy initial simulations, and for synchronization and extensive communication between replicas. Both methods are therefore suitable for distributed or heterogeneous computing platforms. We perform an objective comparison of all five algorithms in terms of both implementation issues and sampling efficiency. We use disordered peptides in explicit water as test systems, for a total simulation time of over 42 μs. Efficiency is defined in terms of both structural convergence and temperature diffusion, and we show that these definitions of efficiency are in fact correlated. Importantly, we find that ST-based methods exhibit faster temperature diffusion and correspondingly faster convergence of structural properties compared to RE-based methods. Within the RE-based methods, VREX is superior to both SREM and RE. On the basis of our observations, we conclude that ST is ideal for simple systems, while STDR is well-suited for complex systems.
Ideas for a pattern-oriented approach towards a VERA analysis ensemble
NASA Astrophysics Data System (ADS)
Gorgas, T.; Dorninger, M.
2010-09-01
For many applications in meteorology, and especially for verification purposes, it is important to have some information about the uncertainties of observation and analysis data. A high quality of these "reference data" is an absolute necessity, as the uncertainties are reflected in verification measures. The VERA (Vienna Enhanced Resolution Analysis) scheme includes a sophisticated quality control tool which accounts for the correction of observational data and provides an estimation of the observation uncertainty. It is crucial for meteorologically and physically reliable analysis fields. VERA is based on a variational principle and does not need any first-guess fields. It is therefore NWP-model independent and can also be used as an unbiased reference for real-time model verification. For downscaling purposes VERA uses a priori knowledge of small-scale physical processes over complex terrain, the so-called "fingerprint technique", which transfers information from data-rich to data-sparse regions. The enhanced joint D-PHASE and COPS data set forms the data base for the analysis ensemble study. For the WWRP projects D-PHASE and COPS a joint activity has been started to collect GTS and non-GTS data from the national and regional meteorological services in Central Europe for 2007. Data from more than 11,000 stations are available for high-resolution analyses. The usage of random numbers as perturbations for ensemble experiments is a common approach in meteorology. In most implementations, as for NWP-model ensemble systems, the focus lies on error growth and propagation on the spatial and temporal scale. When defining errors in analysis fields we have to consider the fact that analyses are not time dependent and that no perturbation method aimed at temporal evolution is possible. Further, the method applied should respect two major sources of analysis errors: observation errors and analysis (interpolation) errors. With the concept of an analysis ensemble we hope to get a more detailed view of both sources of analysis errors. For the computation of the VERA ensemble members a sample of Gaussian random perturbations is produced for each station and parameter. The spread of the perturbations is based on the correction proposals of the VERA QC scheme, which provide "natural" limits for the ensemble. In order to put more emphasis on the weather situation we aim to integrate the main synoptic field structures as weighting factors for the perturbations. Two well-established approaches are used for the definition of these main field structures: principal component analysis and a 2D discrete wavelet transform. The results of tests concerning the implementation of this pattern-supported analysis ensemble system and a comparison of the different approaches are given in the presentation.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech-based data in the classification of Parkinson disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithms that were examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
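A simplified sketch of the MENN-plus-ensemble idea using scikit-learn (synthetic data with label noise; the menn_select helper is a plain multi-edit loop, not the authors' exact algorithm):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    def menn_select(X, y, k=3, max_iter=20):
        # Repeatedly drop training samples misclassified by a k-NN vote of
        # their neighbours (self excluded), keeping well-separated samples.
        keep = np.arange(len(y))
        for _ in range(max_iter):
            knn = KNeighborsClassifier(n_neighbors=k).fit(X[keep], y[keep])
            _, idx = knn.kneighbors(X[keep], n_neighbors=k + 1)
            votes = y[keep][idx[:, 1:]]            # exclude the sample itself
            pred = np.array([np.bincount(v).argmax() for v in votes])
            ok = pred == y[keep]
            if ok.all():
                break
            keep = keep[ok]
        return keep

    X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1,
                               random_state=3)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=3)
    keep = menn_select(Xtr, ytr)
    print(f"kept {len(keep)}/{len(ytr)} training samples")
    print("RF, all samples:  ",
          RandomForestClassifier(random_state=3).fit(Xtr, ytr).score(Xte, yte))
    print("RF, MENN-selected:",
          RandomForestClassifier(random_state=3).fit(Xtr[keep], ytr[keep]).score(Xte, yte))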
Enhanced Sampling in the Well-Tempered Ensemble
NASA Astrophysics Data System (ADS)
Bonomi, M.; Parrinello, M.
2010-05-01
We introduce the well-tempered ensemble (WTE) which is the biased ensemble sampled by well-tempered metadynamics when the energy is used as collective variable. WTE can be designed so as to have approximately the same average energy as the canonical ensemble but much larger fluctuations. These two properties lead to an extremely fast exploration of phase space. An even greater efficiency is obtained when WTE is combined with parallel tempering. Unbiased Boltzmann averages are computed on the fly by a recently developed reweighting method [M. Bonomi, J. Comput. Chem. 30, 1615 (2009); doi:10.1002/jcc.21305]. We apply WTE and its parallel tempering variant to the 2d Ising model and to a Gō model of HIV protease, demonstrating in these two representative cases that convergence is accelerated by orders of magnitude.
NASA Astrophysics Data System (ADS)
Weathers, T. S.; Ginn, T. R.; Spycher, N.; Barkouki, T. H.; Fujita, Y.; Smith, R. W.
2009-12-01
Subsurface contamination is often mitigated with an injection/extraction well system. An understanding of heterogeneities within this radial flowfield is critical for modeling, prediction, and remediation of the subsurface. We address this using a Lagrangian approach: instead of depicting spatial extents of solutes in the subsurface we focus on their arrival distribution at the control well(s). A well-to-well treatment system that incorporates in situ microbially-mediated ureolysis to induce calcite precipitation for the immobilization of strontium-90 has been explored at the Vadose Zone Research Park (VZRP) near Idaho Falls, Idaho. PHREEQC2 is utilized to model the kinetically-controlled ureolysis and consequent calcite precipitation. PHREEQC2 provides a one-dimensional advective-dispersive transport option that can be and has been used in streamtube ensemble models. Traditionally, each streamtube maintains uniform velocity; however in radial flow in homogeneous media, the velocity within any given streamtube is variable in space, being highest at the input and output wells and approaching a minimum at the midpoint between the wells. This idealized velocity variability is of significance if kinetic reactions are present with multiple components, if kinetic reaction rates vary in space, if the reactions involve multiple phases (e.g. heterogeneous reactions), and/or if they impact physical characteristics (porosity/permeability), as does ureolytically driven calcite precipitation. Streamtube velocity patterns for any particular configuration of injection and withdrawal wells are available as explicit calculations from potential theory, and also from particle tracking programs. To approximate the actual spatial distribution of velocity along streamtubes, we assume idealized non-uniform velocity associated with homogeneous media. This is implemented in PHREEQC2 via a non-uniform spatial discretization within each streamtube that honors both the streamtube’s travel time and the idealized “fast-slow-fast” nonuniform velocity along the streamline. Breakthrough curves produced by each simulation are weighted by the path-respective flux fractions (obtained by deconvolution of tracer tests conducted at the VZRP) to obtain the flux-average of flow contributions to the observation well. Breakthrough data from urea injection experiments performed at the VZRP are compared to the model results from the PHREEQC2 variable velocity ensemble.
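The non-uniform discretization can be sketched as follows: given an assumed "fast-slow-fast" velocity profile along a streamline (a hypothetical parabolic profile, not the potential-theory solution), cell boundaries are placed at equal increments of cumulative travel time, which makes cells short where the flow is slow:

    import numpy as np

    L, n_cells = 10.0, 20                        # streamline length, cell count
    s = np.linspace(0.0, L, 2001)
    v = 0.5 + 4.5 * (2.0 * s / L - 1.0) ** 2     # fast at wells, slow mid-path
    t = np.concatenate([[0.0], np.cumsum(np.diff(s) / v[1:])])  # travel time
    # place cell boundaries at equal increments of cumulative travel time
    bounds = np.interp(np.linspace(0.0, t[-1], n_cells + 1), t, s)
    print("total travel time:", round(t[-1], 3))
    print("cell lengths:", np.diff(bounds).round(2))  # short where flow slow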
NASA Astrophysics Data System (ADS)
Elsberry, Russell L.; Jordan, Mary S.; Vitart, Frederic
2010-05-01
The objective of this study is to provide evidence of predictability on intraseasonal time scales (10-30 days) for western North Pacific tropical cyclone formation and subsequent tracks using the 51-member ECMWF 32-day forecasts made once a week from 5 June through 25 December 2008. Ensemble storms are defined by grouping ensemble member vortices whose positions are within a specified separation distance that is equal to 180 n mi at the initial forecast time, increases linearly to 420 n mi at day 14, and is constant thereafter. The 12-h track segments are calculated with a Weighted-Mean Vector Motion technique in which the weighting factor is inversely proportional to the distance from the endpoint of the previous 12-h motion vector. Seventy-six percent of the ensemble storms had five or fewer member vortices. On average, the ensemble storms begin 2.5 days before the first entry of the Joint Typhoon Warning Center (JTWC) best-track file, tend to translate too slowly in the deep tropics, and persist for longer periods over land. A strict objective matching technique with the JTWC storms is combined with a second, subjective procedure that is then applied to identify nearby ensemble storms that would indicate a greater likelihood of a tropical cyclone developing in that region with that track orientation. The ensemble storms identified in the ECMWF 32-day forecasts provided guidance on intraseasonal timescales of the formations and tracks of the three strongest typhoons and two other typhoons, but not for two early-season typhoons and the late-season Dolphin. Four strong tropical storms were predicted consistently over Week-1 through Week-4, as was one weak tropical storm. Two other weak tropical storms, three tropical cyclones that developed from precursor baroclinic systems, and three other tropical depressions were not predicted on intraseasonal timescales. At least for the strongest tropical cyclones during the peak season, the ECMWF 32-day ensemble provides guidance on formation and tracks on 10-30 day timescales.
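The grouping radius and the weighted-mean vector motion step lend themselves to a compact sketch (coordinates in nautical miles; data invented for illustration):

    import numpy as np

    def separation_threshold(tau_days):
        # 180 n mi at t = 0, growing linearly to 420 n mi at day 14,
        # constant thereafter.
        return np.minimum(180.0 + (420.0 - 180.0) * tau_days / 14.0, 420.0)

    def weighted_mean_motion(endpoints, vectors, ref_point):
        # Weight each member's 12-h motion vector inversely by its distance
        # from the endpoint of the previous 12-h segment.
        d = np.linalg.norm(endpoints - ref_point, axis=1)
        w = 1.0 / np.maximum(d, 1e-6)
        return (w[:, None] * vectors).sum(axis=0) / w.sum()

    endpoints = np.array([[0.0, 0.0], [50.0, 10.0], [120.0, -30.0]])
    vectors = np.array([[60.0, 20.0], [55.0, 25.0], [80.0, 5.0]])
    print("grouping radius at day 7:", separation_threshold(7.0), "n mi")
    print("weighted 12-h motion:",
          weighted_mean_motion(endpoints, vectors, np.array([10.0, 0.0])))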
Scanless nonlinear optical microscope for image reconstruction and space-time correlation analysis
NASA Astrophysics Data System (ADS)
Ceffa, N. G.; Radaelli, F.; Pozzi, P.; Collini, M.; Sironi, L.; D'alfonso, L.; Chirico, G.
2017-06-01
Optical microscopy has been applied to life science since its birth and has reached widespread application due to its major advantages: limited perturbation of the biological tissue and the easy accessibility of the light sources. However, as the spatial and time resolution requirements and the time stability of the microscopes increase, researchers are struggling against some of its limitations: the limited transparency and the refractivity of living tissue to light, and the field perturbations induced by the path through the tissue. We have developed a compact, stand-alone, completely scan-less optical setup that allows one to acquire non-linear excitation images and to measure the sample dynamics simultaneously on an ensemble of arbitrarily chosen regions of interest. The image is obtained by shining onto the sample a square array of spots generated by a spatial light modulator and by shifting it across the sample (10 ms refresh time). The final image is computed from the superposition of 100-1000 such images. Filtering procedures can be applied to the raw images of the excitation array before building the image. We discuss results that show how this setup can be used for the correction of wave-front aberrations induced by turbid samples (such as living tissues) and for the computation of space-time cross-correlations in complex networks.
Finite-size anomalies of the Drude weight: Role of symmetries and ensembles
NASA Astrophysics Data System (ADS)
Sánchez, R. J.; Varma, V. K.
2017-12-01
We revisit the numerical problem of computing the high-temperature spin stiffness, or Drude weight, D of the spin-1/2 XXZ chain using exact diagonalization to systematically analyze its dependence on system symmetries and ensemble. Within the canonical ensemble and for states with zero total magnetization, we find D vanishes exactly due to spin-inversion symmetry for all but the anisotropies Δ̃_{M/N} = cos(πM/N), with N, M ∈ Z+ coprime and N > M, provided system sizes L ≥ 2N, for which states with different spin-inversion signature become degenerate due to the underlying sl_2 loop algebra symmetry. All these loop-algebra degenerate states carry finite currents, which we conjecture [based on data from the system sizes and anisotropies Δ̃_{M/N} (with N ...
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-05-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (entry-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments assimilating simulated observations into the bivariate Lorenz 95 model.
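The Schur-product localization itself is compact to write down; the sketch below tapers a noisy sample covariance from a 10-member ensemble with the Gaspari-Cohn compactly supported correlation function on a 1-D grid (grid, half-width and seed are illustrative):

    import numpy as np

    def gaspari_cohn(r):
        # Gaspari-Cohn compactly supported correlation; r is distance in
        # units of the localization half-width (zero for r >= 2).
        r = np.abs(r)
        f = np.zeros_like(r)
        m = r < 1
        f[m] = (1 - 5/3 * r[m]**2 + 5/8 * r[m]**3
                + 1/2 * r[m]**4 - 1/4 * r[m]**5)
        m = (r >= 1) & (r < 2)
        f[m] = (4 - 5 * r[m] + 5/3 * r[m]**2 + 5/8 * r[m]**3
                - 1/2 * r[m]**4 + 1/12 * r[m]**5 - 2 / (3 * r[m]))
        return f

    n_grid, n_ens, half_width = 40, 10, 5.0
    rng = np.random.default_rng(4)
    ensemble = rng.normal(size=(n_ens, n_grid))
    B_sample = np.cov(ensemble, rowvar=False)    # noisy sample covariance
    dist = np.abs(np.subtract.outer(np.arange(n_grid), np.arange(n_grid)))
    B_loc = B_sample * gaspari_cohn(dist / half_width)   # Schur product
    print("far-field entry before/after localization:",
          round(B_sample[0, -1], 3), round(B_loc[0, -1], 3))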
A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles
NASA Astrophysics Data System (ADS)
Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.
2016-12-01
Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean as they subscribe to model democracy ("one model, one vote"). Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus, wrong estimation of uncertainty and of the effective number of independent models. Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly. Using a model-as-truth experiment we demonstrate that those regional biases persist into the future and that we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, where all the model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size. Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
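The subset-selection idea can be illustrated with a greedy search that, at each step, adds the run that most reduces the RMSE of the subset's climatological mean against observations (synthetic runs with individual biases; a sketch, not the authors' algorithm):

    import numpy as np

    def select_subset(runs, obs, size):
        # Greedily add the run that most reduces the RMSE of the subset mean.
        chosen = []
        for _ in range(size):
            best, best_rmse = None, np.inf
            for k in range(len(runs)):
                if k in chosen:
                    continue
                rmse = np.sqrt(((runs[chosen + [k]].mean(axis=0) - obs) ** 2).mean())
                if rmse < best_rmse:
                    best, best_rmse = k, rmse
            chosen.append(best)
        return chosen, best_rmse

    rng = np.random.default_rng(5)
    obs = rng.normal(0.0, 1.0, 500)                 # observation product
    bias = rng.normal(0.0, 1.0, 20)                 # per-model systematic error
    runs = obs + bias[:, None] + rng.normal(0.0, 0.3, (20, 500))
    subset, sub_rmse = select_subset(runs, obs, size=5)
    full_rmse = np.sqrt(((runs.mean(axis=0) - obs) ** 2).mean())
    print("selected runs:", subset)
    print(f"RMSE: subset mean {sub_rmse:.3f} vs full-ensemble mean {full_rmse:.3f}")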
Giuliani, Alessandro; Tomita, Masaru
2010-01-01
Cell fate decision remarkably generates a specific cell differentiation path among the multiple possibilities that can arise through the complex interplay of high-dimensional genome activities. The coordinated action of thousands of genes to switch cell fate decision has indicated the existence of stable attractors guiding the process. However, the origins of the intracellular mechanisms that create the “cellular attractor” still remain unknown. Here, we examined the collective behavior of genome-wide expression for neutrophil differentiation through two different stimuli, dimethyl sulfoxide (DMSO) and all-trans-retinoic acid (atRA). To overcome the difficulties of dealing with single-gene expression noise, we grouped genes into ensembles and analyzed their expression dynamics in a correlation space defined by Pearson correlation and mutual information. The standard deviation of the correlation distributions of gene ensembles decreases as the ensemble size is increased, following an inverse-square-root law, both for ensembles chosen randomly from the whole genome and for ensembles ranked according to expression variances across time. Choosing an ensemble size of 200 genes, we show that the two probability distributions of correlations of randomly selected genes for the atRA and DMSO responses overlapped after 48 hours, defining the neutrophil attractor. Next, tracking the ranked ensembles' trajectories, we noticed that only certain ensembles, not all, fall into the attractor in a fractal-like manner. The removal of these genome elements from the whole genome, for both the atRA and DMSO responses, destroys the attractor, providing evidence for the existence of specific genome elements (named the “genome vehicle”) responsible for the neutrophil attractor. Notably, within the genome vehicles, genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components for the creation of the neutrophil attractor. Further investigations along with our findings might provide a comprehensive mechanistic view of cell fate decision. PMID:20725638
Quantum chaos inside black holes
NASA Astrophysics Data System (ADS)
Addazi, Andrea
2017-06-01
We show how semiclassical black holes can be reinterpreted as an effective geometry composed of a large ensemble of horizonless naked singularities (possibly smoothed at the Planck scale). We call these new items frizzy-balls, which can be rigorously defined by a Euclidean path integral approach. This leads to interesting implications for information paradoxes. We demonstrate that infalling information will chaotically propagate inside this system before reaching the full quantum gravity regime (Planck scale).
Path statistics, memory, and coarse-graining of continuous-time random walks on networks
Kion-Crosby, Willow; Morozov, Alexandre V.
2015-01-01
Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868
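PathMAN computes such statistics by recursion rather than by sampling; as a point of comparison, here is a minimal direct simulation of a first-passage CTRW on a small chain with heavy-tailed (hence non-exponential) waiting times. The network, transition probabilities, and Pareto waiting-time law are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# illustrative 4-state chain: 0 -> ... -> 3 (absorbing target)
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

def first_passage(start=0, target=3, alpha=1.5):
    """Sample path length and first-passage time of a CTRW whose
    waiting times are Pareto(alpha), i.e. non-exponential."""
    state, steps, t = start, 0, 0.0
    while state != target:
        t += rng.pareto(alpha) + 1.0          # heavy-tailed waiting time
        state = rng.choice(4, p=P[state])
        steps += 1
    return steps, t

samples = np.array([first_passage() for _ in range(20000)])
print("mean/var of path length:", samples[:, 0].mean(), samples[:, 0].var())
print("mean first-passage time:", samples[:, 1].mean())  # converges slowly: heavy tails
```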
Ensemble coding of face identity is present but weaker in congenital prosopagnosia.
Robson, Matthew K; Palermo, Romina; Jeffery, Linda; Neumann, Markus F
2018-03-01
Individuals with congenital prosopagnosia (CP) are impaired at identifying individual faces but do not appear to show impairments in extracting the average identity from a group of faces (known as ensemble coding). However, possible deficits in ensemble coding in a previous study (n = 4 CPs) may have been masked because the CPs relied on pictorial (image) cues rather than identity cues. Here we asked whether a larger sample of CPs (n = 11) would show intact ensemble coding of identity when the availability of image cues was minimised. Participants viewed a "set" of four faces and then judged whether a subsequent individual test face, either an exemplar or a "set average", was in the preceding set. Ensemble coding occurred when matching (vs. mismatching) averages were mistakenly endorsed as set members. We assessed both image- and identity-based ensemble coding by varying whether test faces were the same or different images of the identities in the set. CPs showed significant ensemble coding in both tasks, indicating that their performance was independent of image cues. As a group, CPs' ensemble coding was weaker than controls' in both tasks, consistent with evidence that perceptual processing of face identity is disrupted in CP. This effect was driven by CPs (n = 3) who, in addition to having impaired face memory, also performed particularly poorly on a measure of face perception (CFPT). Future research, using larger samples, should examine whether deficits in ensemble coding may be restricted to CPs who also have substantial face perception deficits. Copyright © 2018 Elsevier Ltd. All rights reserved.
Visualization and classification of physiological failure modes in ensemble hemorrhage simulation
NASA Astrophysics Data System (ADS)
Zhang, Song; Pruett, William Andrew; Hester, Robert
2015-01-01
In an emergency situation such as hemorrhage, doctors need to predict which patients need immediate treatment and care. This task is difficult because of the diverse responses to hemorrhage across the human population. Ensemble physiological simulations provide a means to sample a diverse range of subjects and may have a better chance of containing the correct solution. However, revealing the patterns and trends in an ensemble simulation is a challenging task. We have developed a visualization framework for ensemble physiological simulations. The visualization helps users identify trends among ensemble members, classify ensemble members into subpopulations for analysis, and predict future events by matching a new patient's data to existing ensembles. We demonstrated the effectiveness of the visualization on simulated physiological data. The lessons learned here can be applied to clinically collected physiological data in the future.
Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Rodriguez, Luke R.; Lin, Andy
2015-09-30
We characterize the commercial behavior of a group of companies in a common line of business using a small ensemble of classifiers on a stream of records containing commercial activity information. This approach is able to effectively find a subset of classifiers that can be used to predict company labels with reasonable accuracy. Performance of the ensemble, its error rate under stable conditions, can be characterized using an exponentially weighted moving average (EWMA) statistic. The behavior of the EWMA statistic can be used to monitor a record stream from the commercial network and determine when significant changes have occurred. Results indicate that larger classification ensembles may not necessarily be optimal, pointing to the need to search the combinatorial classifier space in a systematic way. Results also show that current and past performance of an ensemble can be used to detect when statistically significant changes in the activity of the network have occurred. The dataset used in this work contains tens of thousands of high-level commercial activity records with continuous and categorical variables and hundreds of labels, making classification challenging.
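A minimal sketch of the monitoring idea, assuming a 0/1 stream of per-record ensemble errors: an EWMA of the error rate is tracked against control limits derived from a stable baseline, and excursions flag process drift. The chart constants and the synthetic drift are illustrative, not the paper's values.

```python
import numpy as np

def ewma_monitor(errors, lam=0.1, L=3.0, baseline=200):
    """Flag steps where an EWMA of the ensemble's 0/1 error stream
    leaves its control band (constants are illustrative)."""
    p0 = np.mean(errors[:baseline])                   # stable-regime error rate
    sigma = np.sqrt(p0 * (1 - p0))
    limit = L * sigma * np.sqrt(lam / (2 - lam))      # asymptotic EWMA control limit
    z, alarms = p0, []
    for t, e in enumerate(errors):
        z = lam * e + (1 - lam) * z                   # EWMA update
        if abs(z - p0) > limit:
            alarms.append(t)
    return alarms

rng = np.random.default_rng(3)
stream = np.concatenate([rng.random(500) < 0.10,      # stable regime
                         rng.random(500) < 0.40])     # drifted regime
alarms = ewma_monitor(stream.astype(float))
print("alarms before/after drift:", sum(t < 500 for t in alarms),
      sum(t >= 500 for t in alarms))
```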
Designing boosting ensemble of relational fuzzy systems.
Scherer, Rafał
2010-10-01
A method frequently used in classification systems to improve classification accuracy is to combine the outputs of several classifiers. Among the various types of classifiers, fuzzy ones are attractive because they use intelligible fuzzy if-then rules. In this paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bind input and output fuzzy linguistic values by a binary relation; thus, fuzzy rules carry additional weights compared with traditional fuzzy systems - the elements of the fuzzy relation matrix. This makes the system more adjustable to the data during learning. The problem is that such an ensemble contains separate rule bases which cannot be directly merged: since the systems are separate, fuzzy rules coming from different systems cannot be treated as rules of the same (single) system. In this paper the problem is addressed by a novel design of the fuzzy systems constituting the ensemble, which results in the normalization of the individual rule bases during learning. The method is tested on several known benchmarks and compared with other machine learning solutions from the literature.
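Relational neuro-fuzzy classifiers are not available in common libraries, so the sketch below only illustrates the boosting wrapper: AdaBoost re-weights training samples so that each new ensemble member concentrates on the examples its predecessors misclassified. Decision stumps stand in for the paper's fuzzy systems; the dataset is a generic benchmark.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 'estimator' is named 'base_estimator' in older scikit-learn versions
ens = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=50)
print(cross_val_score(ens, X, y, cv=5).mean())
```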
The Limits of Coding with Joint Constraints on Detected and Undetected Error Rates
NASA Technical Reports Server (NTRS)
Dolinar, Sam; Andrews, Kenneth; Pollara, Fabrizio; Divsalar, Dariush
2008-01-01
We develop a remarkably tight upper bound on the performance of a parameterized family of bounded angle maximum-likelihood (BA-ML) incomplete decoders. The new bound for this class of incomplete decoders is calculated from the code's weight enumerator, and is an extension of Poltyrev-type bounds developed for complete ML decoders. This bound can also be applied to bound the average performance of random code ensembles in terms of an ensemble average weight enumerator. We also formulate conditions defining a parameterized family of optimal incomplete decoders, defined to minimize both the total codeword error probability and the undetected error probability for any fixed capability of the decoder to detect errors. We illustrate the gap between optimal and BA-ML incomplete decoding via simulation of a small code.
A stochastic Markov chain model to describe lung cancer growth and metastasis.
Newton, Paul K; Mason, Jeremy; Bethel, Kelly; Bazhenova, Lyudmila A; Nieva, Jorge; Kuhn, Peter
2012-01-01
A stochastic Markov chain model for metastatic progression is developed for primary lung cancer based on a network construction of metastatic sites with dynamics modeled as an ensemble of random walkers on the network. We calculate a transition matrix, with entries (transition probabilities) interpreted as random variables, and use it to construct a circular bi-directional network of primary and metastatic locations based on postmortem tissue analysis of 3827 autopsies on untreated patients documenting all primary tumor locations and metastatic sites from this population. The resulting 50 potential metastatic sites are connected by directed edges with distributed weightings, where the site connections and weightings are obtained by calculating the entries of an ensemble of transition matrices so that the steady-state distribution obtained from the long-time limit of the Markov chain dynamical system corresponds to the ensemble metastatic distribution obtained from the autopsy data set. We condition our search for a transition matrix on an initial distribution of metastatic tumors obtained from the data set. Through an iterative numerical search procedure, we adjust the entries of a sequence of approximations until a transition matrix with the correct steady-state is found (up to a numerical threshold). Since this constrained linear optimization problem is underdetermined, we characterize the statistical variance of the ensemble of transition matrices calculated using the means and variances of their singular value distributions as a diagnostic tool. We interpret the ensemble averaged transition probabilities as (approximately) normally distributed random variables. The model allows us to simulate and quantify disease progression pathways and timescales of progression from the lung position to other sites and we highlight several key findings based on the model.
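A minimal sketch of the consistency requirement at the heart of the model: the steady state of a candidate transition matrix, obtained here by power iteration, must match the target metastatic distribution from the autopsy data. The iterative adjustment of matrix entries and the 50-site network are not shown; the 3-site example and target distribution are illustrative.

```python
import numpy as np

def steady_state(P, tol=1e-12, max_iter=100000):
    """Long-time distribution of the Markov chain with row-stochastic P."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi

# toy 3-site network standing in for the 50-site lung-cancer network
P = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.4, 0.3],
              [0.5, 0.1, 0.4]])
target = np.array([0.35, 0.30, 0.35])   # illustrative autopsy distribution
pi = steady_state(P)
print(pi.round(3), "mismatch:", np.abs(pi - target).max().round(3))
```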
NASA Astrophysics Data System (ADS)
Mekonnen, Z. T.; Gebremichael, M.
2017-12-01
In a basin like the Nile, where millions of people depend on rainfed agriculture and surface water resources for their livelihoods, changes in precipitation will have tremendous social and economic consequences. General circulation models (GCMs) have been associated with high uncertainty in their projections of future precipitation for the Nile basin. Some studies have compared the performance of different GCMs in multi-model comparisons for the region. Many indicated that there is no single model that gives the "best estimate" of precipitation for a basin as complex and large as the Nile. In this study, we used a combination of satellite and long-term rain gauge precipitation measurements (TRMM and CenTrends) to evaluate the performance of 10 GCMs from the 5th Coupled Model Intercomparison Project (CMIP5) at different spatial and seasonal scales and to produce a weighted ensemble projection. Our results confirm that no single model gives the best estimate over the region; hence the approach of creating an ensemble weighted by how each model performs in specific areas and seasons resulted in an improved estimate of precipitation compared with observed values. Following the same approach, we created an ensemble of future precipitation projections for different time periods (2000-2024, 2025-2049 and 2050-2100). The analysis showed that all the major sub-basins of the Nile will receive more precipitation with time, even though the distribution within each sub-basin may differ. Overall, the analysis showed a 15% increase (125 mm/year) by the end of the century, averaged over the area up to the Aswan dam. KEY WORDS: Climate Change, CMIP5, Nile, East Africa, CenTrends, Precipitation, Weighted Ensembles
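The weighting scheme is described only qualitatively in the abstract; the sketch below shows one plausible reading, with each model weighted per grid cell in proportion to its inverse historical RMSE (per-season weighting would add one more axis). All shapes and data are synthetic.

```python
import numpy as np

def skill_weighted_ensemble(models, obs):
    """Weight each GCM per grid cell by inverse RMSE against observations,
    then form the weighted ensemble mean.
    models: (n_models, n_times, n_cells); obs: (n_times, n_cells)."""
    rmse = np.sqrt(np.mean((models - obs) ** 2, axis=1))  # (n_models, n_cells)
    w = 1.0 / (rmse + 1e-12)
    w /= w.sum(axis=0, keepdims=True)                     # normalize per cell
    return np.einsum('mc,mtc->tc', w, models)

rng = np.random.default_rng(4)
obs = rng.normal(size=(120, 50))                          # e.g. 10 years x 50 cells
models = obs + rng.normal(scale=rng.uniform(0.5, 2.0, size=(10, 1, 1)),
                          size=(10, 120, 50))
print(skill_weighted_ensemble(models, obs).shape)
```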
Mapping chemicals in air using an environmental CAT scanning system: evaluation of algorithms
NASA Astrophysics Data System (ADS)
Samanta, A.; Todd, L. A.
A new technique is being developed which creates near real-time maps of chemical concentrations in air for environmental and occupational applications. This technique, which we call Environmental CAT Scanning, combines the real-time measuring technique of open-path Fourier transform infrared spectroscopy with the mapping capabilities of computed tomography to produce two-dimensional concentration maps. With this system, a network of open-path measurements is obtained over an area; measurements are then processed using a tomographic algorithm to reconstruct the concentrations. This research focused on the process of evaluating and selecting appropriate reconstruction algorithms, for use in the field, by using test concentration data from both computer simulation and laboratory chamber studies. Four algorithms were tested using three types of data: (1) experimental open-path data from studies that used a prototype open-path Fourier transform/computed tomography system in an exposure chamber; (2) synthetic open-path data generated from maps created by kriging point samples taken in the chamber studies (in 1); and (3) synthetic open-path data generated using a chemical dispersion model to create time series maps. The iterative algorithms used to reconstruct the concentration data were: Algebraic Reconstruction Technique without Weights (ART1), Algebraic Reconstruction Technique with Weights (ARTW), Maximum Likelihood with Expectation Maximization (MLEM) and Multiplicative Algebraic Reconstruction Technique (MART). Maps were evaluated quantitatively and qualitatively. In general, MART and MLEM performed best, followed by ARTW and ART1. However, algorithm performance varied under different contaminant scenarios. This study showed the importance of using a variety of maps, particularly those generated using dispersion models. The time series maps provided a more rigorous test of the algorithms and allowed distinctions to be made among the algorithms. A comprehensive evaluation of algorithms for the environmental application of tomography requires the use of a battery of test concentration data, before field implementation, which models reality and tests the limits of the algorithms.
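Of the four tested algorithms, MLEM has the most compact statement. Below is a generic MLEM sketch for the linear tomography problem y ≈ Ax, where each row of A holds the pixel weights of one open path; the 2x2 map and beam geometry are illustrative, not the chamber configuration used in the study.

```python
import numpy as np

def mlem(A, y, n_iter=200):
    """Maximum-likelihood expectation maximization for y ~ A @ x,
    with nonnegative pixel concentrations x and path-weight matrix A."""
    x = np.ones(A.shape[1])
    norm = A.sum(axis=0)                 # total weight hitting each pixel
    for _ in range(n_iter):
        proj = A @ x
        proj[proj == 0] = 1e-12          # guard against division by zero
        x *= (A.T @ (y / proj)) / norm
    return x

# toy 2x2 pixel map probed by four beam paths (rows: top, bottom, left, right)
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
x_true = np.array([0.0, 2.0, 1.0, 3.0])
# recovers a nonnegative map consistent with all path integrals
print(mlem(A, A @ x_true).round(2))
```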
A Prototype Cesium Clock Ensemble for The Loran-C Radionavigation System
2008-12-01
ability to discipline using all-in-view GNSS and Two-Way Satellite Time and Frequency Transfer (TWSTFT). I. INTRODUCTION In the mid-1990s, the Coast... the clock weighting to favor the "best" oscillator(s) or switch the AOG discipline source to use an external source of timing such as GPS or TWSTFT... cesium trio ensemble; however, it may also use external sources such as GPS or TWSTFT. Control: The field in the lower right corner of the GUI
Waldispühl, Jérôme; Ponty, Yann
2011-11-01
The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and, reciprocally, how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from sequence data alone, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed), computes the Boltzmann-weighted ensembles of mutants with exactly k mutations, and samples mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend on the free energy of the structures, RNAmutants has difficulty sampling mutant sequences with low G+C-contents. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence-structure maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong influence on the size and shape of the evolutionarily accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the appearance of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents significantly reduce the size of the evolutionarily accessible mutational landscapes.
NASA Astrophysics Data System (ADS)
Courdent, Vianney; Grum, Morten; Mikkelsen, Peter Steen
2018-01-01
Precipitation constitutes a major contribution to the flow in urban storm- and wastewater systems. Forecasts of the anticipated runoff flows, created from radar extrapolation and/or numerical weather predictions, can potentially be used to optimize operation in both wet and dry weather periods. However, flow forecasts are inevitably uncertain and their use will ultimately require a trade-off between the value of knowing what will happen in the future and the probability and consequence of being wrong. In this study we examine how ensemble forecasts from the HIRLAM-DMI-S05 numerical weather prediction (NWP) model subject to three different ensemble post-processing approaches can be used to forecast flow exceedance in a combined sewer for a wide range of ratios between the probability of detection (POD) and the probability of false detection (POFD). We use a hydrological rainfall-runoff model to transform the forecasted rainfall into forecasted flow series and evaluate three different approaches to establishing the relative operating characteristics (ROC) diagram of the forecast, which is a plot of POD against POFD for each fraction of concordant ensemble members and can be used to select the weight of evidence that matches the desired trade-off between POD and POFD. In the first approach, the rainfall input to the model is calculated for each of 25 ensemble members as a weighted average of rainfall from the NWP cells over the catchment where the weights are proportional to the areal intersection between the catchment and the NWP cells. In the second approach, a total of 2825 flow ensembles are generated using rainfall input from the neighbouring NWP cells up to approximately 6 cells in all directions from the catchment. In the third approach, the first approach is extended spatially by successively increasing the area covered and for each spatial increase and each time step selecting only the cell with the highest intensity resulting in a total of 175 ensemble members. While the first and second approaches have the disadvantage of not covering the full range of the ROC diagram and being computationally heavy, respectively, the third approach leads to both a broad coverage of the ROC diagram range at a relatively low computational cost. A broad coverage of the ROC diagram offers a larger selection of prediction skill to choose from to best match to the prediction purpose. The study distinguishes itself from earlier research in being the first application to urban hydrology, with fast runoff and small catchments that are highly sensitive to local extremes. Furthermore, no earlier reference has been found on the highly efficient third approach using only neighbouring cells with the highest threat to expand the range of the ROC diagram. This study provides an efficient and robust approach to using ensemble rainfall forecasts affected by bias and misplacement errors for predicting flow threshold exceedance in urban drainage systems.
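A minimal sketch of how a ROC diagram is built from ensemble flow forecasts: each possible fraction of concordant members acts as a decision threshold, yielding one (POFD, POD) point. The flow arrays, member count, and exceedance threshold are illustrative, not taken from the study.

```python
import numpy as np

def roc_points(ensemble_flows, obs_flows, threshold):
    """One (POFD, POD) pair per required fraction of ensemble members
    exceeding the flow threshold. ensemble_flows: (n_members, n_times)."""
    exceed_frac = (ensemble_flows > threshold).mean(axis=0)
    obs_exceed = obs_flows > threshold
    points = []
    for frac in np.linspace(0, 1, ensemble_flows.shape[0] + 1):
        warn = exceed_frac >= frac
        pod = (warn & obs_exceed).sum() / max(obs_exceed.sum(), 1)
        pofd = (warn & ~obs_exceed).sum() / max((~obs_exceed).sum(), 1)
        points.append((pofd, pod))
    return points

rng = np.random.default_rng(5)
obs = rng.gamma(2.0, 1.0, size=1000)
ens = obs + rng.normal(scale=0.8, size=(25, 1000))   # 25-member forecast
for pofd, pod in roc_points(ens, obs, threshold=4.0)[::6]:
    print(f"POFD={pofd:.2f}  POD={pod:.2f}")
```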
Regional patterns of future runoff changes from Earth system models constrained by observation
NASA Astrophysics Data System (ADS)
Yang, Hui; Zhou, Feng; Piao, Shilong; Huang, Mengtian; Chen, Anping; Ciais, Philippe; Li, Yue; Lian, Xu; Peng, Shushi; Zeng, Zhenzhong
2017-06-01
In the recent Intergovernmental Panel on Climate Change assessment, multimodel ensembles (arithmetic model averaging, AMA) were constructed with equal weights given to Earth system models, without considering the performance of each model at reproducing current conditions. Here we use Bayesian model averaging (BMA) to construct a weighted model ensemble for runoff projections. Higher weights are given to models with better performance in estimating historical decadal mean runoff. Using the BMA method, we find that by the end of this century, the increase of global runoff (9.8 ± 1.5%) under Representative Concentration Pathway 8.5 is significantly lower than estimated from AMA (12.2 ± 1.3%). BMA presents a less severe runoff increase than AMA at northern high latitudes and a more severe decrease in Amazonia. The runoff decrease in Amazonia is larger than the intermodel spread. The intermodel difference in runoff changes is caused not only by precipitation differences among models but also, at the high northern latitudes, by evapotranspiration differences.
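A minimal sketch of the weighting step under a simplifying Gaussian-likelihood assumption with equal priors (full BMA also estimates within-model variances, omitted here); all data are synthetic stand-ins.

```python
import numpy as np

def bma_weights(model_hist, obs_hist, sigma=1.0):
    """Posterior model weights from a Gaussian likelihood of historical
    decadal-mean runoff; a simplified stand-in for full BMA."""
    loglik = -0.5 * np.sum((model_hist - obs_hist) ** 2, axis=1) / sigma**2
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

rng = np.random.default_rng(6)
obs_hist = rng.normal(size=8)                     # 8 historical decadal means
model_hist = obs_hist + rng.normal(scale=rng.uniform(0.2, 1.5, (12, 1)),
                                   size=(12, 8))
proj = rng.normal(10.0, 2.0, size=12)             # each model's 2100 projection
w = bma_weights(model_hist, obs_hist)
print("AMA:", proj.mean(), " BMA:", w @ proj)
```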
Multi-model ensembles for assessment of flood losses and associated uncertainty
NASA Astrophysics Data System (ADS)
Figueiredo, Rui; Schröter, Kai; Weiss-Motz, Alexander; Martina, Mario L. V.; Kreibich, Heidi
2018-05-01
Flood loss modelling is a crucial part of risk assessments. However, it is subject to large uncertainty that is often neglected. Most models available in the literature are deterministic, providing only single point estimates of flood loss, and large disparities tend to exist among them. Adopting any one such model in a risk assessment context is likely to lead to inaccurate loss estimates and sub-optimal decision-making. In this paper, we propose the use of multi-model ensembles to address these issues. This approach, which has been applied successfully in other scientific fields, is based on the combination of different model outputs with the aim of improving the skill and usefulness of predictions. We first propose a model rating framework to support ensemble construction, based on a probability tree of model properties, which establishes relative degrees of belief between candidate models. Using 20 flood loss models in two test cases, we then construct numerous multi-model ensembles, based both on the rating framework and on a stochastic method, differing in terms of participating members, ensemble size and model weights. We evaluate the performance of ensemble means, as well as their probabilistic skill and reliability. Our results demonstrate that well-designed multi-model ensembles represent a pragmatic approach to consistently obtain more accurate flood loss estimates and reliable probability distributions of model uncertainty.
Accelerated weight histogram method for exploring free energy landscapes
NASA Astrophysics Data System (ADS)
Lindahl, V.; Lidmar, J.; Hess, B.
2014-07-01
Calculating free energies is an important and notoriously difficult task for molecular simulations. The rapid increase in computational power has made it possible to probe increasingly complex systems, yet extracting accurate free energies from these simulations remains a major challenge. Fully exploring the free energy landscape of, say, a biological macromolecule typically requires sampling large conformational changes and slow transitions. Often, the only feasible way to study such a system is to simulate it using an enhanced sampling method. The accelerated weight histogram (AWH) method is a new, efficient extended ensemble sampling technique which adaptively biases the simulation to promote exploration of the free energy landscape. The AWH method uses a probability weight histogram which allows for efficient free energy updates and results in an easy discretization procedure. A major advantage of the method is its general formulation, making it a powerful platform for developing further extensions and analyzing its relation to already existing methods. Here, we demonstrate its efficiency and general applicability by calculating the potential of mean force along a reaction coordinate for both a single dimension and multiple dimensions. We make use of a non-uniform, free energy dependent target distribution in reaction coordinate space so that computational efforts are not wasted on physically irrelevant regions. We present numerical results for molecular dynamics simulations of lithium acetate in solution and chignolin, a 10-residue long peptide that folds into a β-hairpin. We further present practical guidelines for setting up and running an AWH simulation.
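AWH's actual update and target-distribution machinery are more involved than an abstract can convey; the Wang-Landau-flavored sketch below only illustrates the shared core idea of an adaptively grown bias that flattens sampling along a discretized reaction coordinate. The double-well landscape (standing in for the energies an MD engine would supply) and the update constants are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(-2, 2, 61)
F = 4.0 * (x**2 - 1.0) ** 2          # "true" free energy (kT units), double well

F_est = np.zeros_like(F)             # adaptive bias = current free energy estimate
hist = np.zeros_like(F)
step = 1.0
i = len(x) // 2
for _ in range(400000):
    j = min(max(i + rng.choice([-1, 1]), 0), len(x) - 1)
    # Metropolis step on the biased landscape F - F_est
    if rng.random() < np.exp(-(F[j] - F_est[j]) + (F[i] - F_est[i])):
        i = j
    F_est[i] += step                 # raise the estimate where we currently are
    hist[i] += 1
    if hist.min() > 0.8 * hist.mean():   # crude flatness check: refine the update
        step *= 0.5
        hist[:] = 0

# F_est approaches F up to an additive constant as the update step shrinks
print(np.abs((F_est - F_est.mean()) - (F - F.mean())).max())
```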
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve the diagnosis of cardiac arrhythmia. Random Forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification-performance point of view. In general, multiclass datasets with an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset, with multiple classes of small sample size, and it is therefore a suitable test of our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have fewer than 15 samples. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to assess the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is a high diagnostic performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrate the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
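A minimal sklearn sketch of the two-part strategy, with SelectKBest standing in for the paper's correlation-based feature selection and class-rebalanced trees standing in for its simple-random-sampling step; the imbalanced toy data mimic (but are not) the arrhythmia set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# imbalanced multiclass toy data standing in for the arrhythmia dataset
X, y = make_classification(n_samples=450, n_features=60, n_informative=15,
                           n_classes=5, n_clusters_per_class=1,
                           weights=[0.5, 0.25, 0.15, 0.07, 0.03],
                           random_state=0)

clf = make_pipeline(
    SelectKBest(f_classif, k=20),   # stand-in for correlation-based selection
    RandomForestClassifier(n_estimators=200,
                           class_weight='balanced_subsample',  # rebalancing step
                           random_state=0))
print(cross_val_score(clf, X, y, cv=5).mean())
```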
Liu, Wei; Du, Peijun; Wang, Dongchen
2015-01-01
One important method for obtaining continuous surfaces of soil properties from point samples is spatial interpolation. In this paper, we propose a method that combines ensemble learning with ancillary environmental information for improved interpolation of soil properties (hereafter, EL-SP). First, we calculated the trend value for soil potassium content at the Qinghai Lake region in China based on measured values. Then, based on soil type, geology type, land use type, and slope data, the remaining residual was modelled with the ensemble learning model. Next, the EL-SP method was applied to interpolate soil potassium content at the study site. To evaluate the utility of the EL-SP method, we compared its performance with other interpolation methods, including universal kriging, inverse distance weighting, ordinary kriging, and ordinary kriging combined with geographic information. Results show that EL-SP had a lower mean absolute error and root mean square error than the other models tested in this paper. Notably, the EL-SP maps describe more locally detailed information and more accurate spatial patterns for soil potassium content than the other methods because of the combined use of different types of environmental information; these maps are capable of showing abrupt boundary information for soil potassium content. Furthermore, the EL-SP method not only reduces prediction errors, but also complements other environmental information, which makes the spatial interpolation of soil potassium content more reasonable and useful.
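A minimal sketch of the trend-plus-residual construction, with a linear spatial trend and a generic ensemble learner for the residual; the coordinates, covariates, and coefficients are synthetic stand-ins (the paper's categorical covariates would need encoding first).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)

# point samples: coordinates, ancillary covariates, and soil potassium
coords = rng.uniform(0, 10, size=(300, 2))
covars = rng.normal(size=(300, 4))   # stand-ins for soil/geology/land-use/slope
k_obs = (0.5 * coords[:, 0] + covars @ np.array([1.0, -0.5, 0.3, 0.2])
         + rng.normal(scale=0.3, size=300))

# step 1: large-scale trend from the coordinates
trend = LinearRegression().fit(coords, k_obs)
resid = k_obs - trend.predict(coords)

# step 2: model the residual with an ensemble learner on ancillary data
ens = GradientBoostingRegressor(random_state=0).fit(covars, resid)

# prediction at a new location = trend + ensemble-modelled residual
new_coord, new_covar = np.array([[5.0, 5.0]]), rng.normal(size=(1, 4))
print(trend.predict(new_coord) + ens.predict(new_covar))
```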
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate protein function. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposes a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature-selection-based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that high-performance ensemble algorithms are usually composed of predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. We proposed a method for the rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of the individual predictors compared to other ensemble algorithms. The results also suggest that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
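A minimal sketch of the design pattern, assuming each protein comes with a vector of scores from individual predictors: a filter selects a small predictor subset, and logistic regression combines the survivors. SelectKBest with mutual information stands in for the paper's contribution-score analysis; the data are synthetic.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)

# columns = scores emitted by individual localization predictors
n_proteins, n_predictors = 2000, 9
base_scores = rng.random((n_proteins, n_predictors))
labels = (base_scores[:, :3].mean(axis=1) > 0.5).astype(int)  # 3 informative columns

# filter down to a minimal predictor subset, then combine with LR
meta = make_pipeline(SelectKBest(mutual_info_classif, k=3),
                     LogisticRegression())
print(cross_val_score(meta, base_scores, labels, cv=5).mean())
```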
On the generation of climate model ensembles
NASA Astrophysics Data System (ADS)
Haughton, Ned; Abramowitz, Gab; Pitman, Andy; Phipps, Steven J.
2014-10-01
Climate model ensembles are used to estimate uncertainty in future projections, typically by interpreting the ensemble distribution for a particular variable probabilistically. There are, however, different ways to produce climate model ensembles that yield different results, and therefore different probabilities for a future change in a variable. Perhaps equally importantly, there are different approaches to interpreting the ensemble distribution that lead to different conclusions. Here we use a reduced-resolution climate system model to compare three common ways to generate ensembles: initial conditions perturbation, physical parameter perturbation, and structural changes. Despite these three approaches conceptually representing very different categories of uncertainty within a modelling system, when comparing simulations to observations of surface air temperature they can be very difficult to separate. Using the twentieth century CMIP5 ensemble for comparison, we show that initial conditions ensembles, in theory representing internal variability, significantly underestimate observed variance. Structural ensembles, perhaps less surprisingly, exhibit over-dispersion in simulated variance. We argue that future climate model ensembles may need to include parameter or structural perturbation members in addition to perturbed initial conditions members to ensure that they sample uncertainty due to internal variability more completely. We note that where ensembles are over- or under-dispersive, such as for the CMIP5 ensemble, estimates of uncertainty need to be treated with care.
Differential-Evolution Control Parameter Optimization for Unmanned Aerial Vehicle Path Planning
Kok, Kai Yit; Rajendran, Parvathy
2016-01-01
The differential evolution algorithm has been widely applied to unmanned aerial vehicle (UAV) path planning. At present, four random tuning parameters exist for the differential evolution algorithm, namely population size, differential weight, crossover, and generation number. These tuning parameters must be set, together with the user-specified weighting between path quality and computational cost. However, the optimum settings of these tuning parameters vary by application. Instead of trial and error, this paper presents a method for optimizing the tuning parameters of the differential evolution algorithm for UAV path planning. The parameters this research focuses on are population size, differential weight, crossover, and generation number. The developed algorithm enables the user to simply define the desired weighting between path quality and computational cost and to converge within the minimum number of generations required. In conclusion, the proposed optimization of the tuning parameters of the differential evolution algorithm for UAV path planning expedites convergence and improves the final output path and computational cost. PMID:26943630
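A minimal sketch, under assumptions of our own, of what such meta-tuning looks like with SciPy's differential evolution on a toy waypoint-planning objective: each tuning configuration is scored by a user-weighted sum of final path cost and function evaluations, and the cheapest configuration wins. A GA or DE could replace the crude grid used for the outer search; all objectives and constants are illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution

# toy 2-D path-planning objective: waypoint path length plus obstacle penalty
def path_cost(p):
    wp = p.reshape(-1, 2)
    pts = np.vstack([[0, 0], wp, [10, 10]])
    length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    penalty = np.sum(np.exp(-np.sum((wp - [5, 5]) ** 2, axis=1)))  # obstacle at (5,5)
    return length + 10.0 * penalty

def score(popsize, mutation, recombination, maxiter, w_path=0.7, w_comp=0.3):
    """User-weighted path-quality / compute-cost score for one DE tuning."""
    res = differential_evolution(path_cost, [(0, 10)] * 6, popsize=popsize,
                                 mutation=mutation, recombination=recombination,
                                 maxiter=maxiter, tol=0, seed=0, polish=False)
    return w_path * res.fun + w_comp * (res.nfev / 1000.0)

# crude meta-search over the four DE tuning parameters
configs = [(ps, mu, cr, gi) for ps in (10, 20) for mu in (0.5, 0.9)
           for cr in (0.5, 0.9) for gi in (50, 150)]
best = min(configs, key=lambda c: score(*c))
print("best (popsize, mutation, crossover, generations):", best)
```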
Program for narrow-band analysis of aircraft flyover noise using ensemble averaging techniques
NASA Technical Reports Server (NTRS)
Gridley, D.
1982-01-01
A package of computer programs was developed for analyzing acoustic data from an aircraft flyover. The package assumes the aircraft is flying at constant altitude and constant velocity in a fixed attitude over a linear array of ground microphones. Aircraft position is provided by radar and an option exists for including the effects of the aircraft's rigid-body attitude relative to the flight path. Time synchronization between radar and acoustic recording stations permits ensemble averaging techniques to be applied to the acoustic data thereby increasing the statistical accuracy of the acoustic results. Measured layered meteorological data obtained during the flyovers are used to compute propagation effects through the atmosphere. Final results are narrow-band spectra and directivities corrected for the flight environment to an equivalent static condition at a specified radius.
A new approach to human microRNA target prediction using ensemble pruning and rotation forest.
Mousavi, Reza; Eftekhari, Mahdi; Haghighi, Mehdi Ghezelbash
2015-12-01
MicroRNAs (miRNAs) are small non-coding RNAs that have important functions in gene regulation. Since finding miRNA targets experimentally is costly and time-consuming, the use of machine learning methods for miRNA target prediction is a growing research area. In this paper, a new approach is proposed that uses two popular ensemble strategies, Ensemble Pruning and Rotation Forest (EP-RTF), to predict human miRNA targets. For EP, the approach utilizes a genetic algorithm (GA): a subset of classifiers from the heterogeneous ensemble is first selected by the GA. Next, the selected classifiers are trained based on the RTF method and then combined using weighted majority voting. In addition to seeking a better subset of classifiers, the parameter of RTF is also optimized by the GA. Findings of the present study confirm that the newly developed EP-RTF outperforms (in terms of classification accuracy, sensitivity, and specificity) previously applied methods on four datasets in the field of human miRNA targets. Diversity-error diagrams reveal that the proposed ensemble approach constructs individual classifiers that are more accurate, and usually more diverse, than those of the other ensemble approaches. Given these experimental results, we highly recommend EP-RTF for improving the performance of miRNA target prediction.
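A minimal sketch of the prune-then-vote pattern with generic classifiers: a random subset search stands in for the paper's GA, validation accuracies supply the voting weights, and the Rotation Forest training of members is omitted. Everything here is a simplified analogue, not EP-RTF itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

pool = [LogisticRegression(max_iter=1000), GaussianNB(), SVC(),
        KNeighborsClassifier(), DecisionTreeClassifier(random_state=0),
        RandomForestClassifier(random_state=0)]
pool = [m.fit(X_tr, y_tr) for m in pool]
acc = np.array([m.score(X_val, y_val) for m in pool])

def vote(mask):
    """Accuracy of the accuracy-weighted majority vote of kept members."""
    votes = np.array([m.predict(X_val) for m, keep in zip(pool, mask) if keep])
    w = acc[np.array(mask, bool)]
    agg = (w[:, None] * votes).sum(axis=0) / w.sum()
    return ((agg > 0.5).astype(int) == y_val).mean()

# random subset search standing in for the paper's GA-based pruning
rng = np.random.default_rng(0)
masks = [tuple(rng.integers(0, 2, len(pool))) for _ in range(40)]
masks = [m for m in masks if any(m)]
best = max(masks, key=vote)
print("kept members:", best, "ensemble accuracy:", round(vote(best), 3))
```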
NASA Astrophysics Data System (ADS)
Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.
2014-11-01
Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth through empirical crop coefficients that adapt evapotranspiration over the vegetation period. We investigate the importance of model structural vs. model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural model uncertainty is far more important than model parametric uncertainty for estimating irrigation water requirements. Using the Reliability Ensemble Averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a given threshold, e.g. an irrigation water limit of 400 mm set by water rights, would be exceeded less frequently under the REA ensemble average (45%) than under the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.
NASA Astrophysics Data System (ADS)
Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping
2017-11-01
Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting. In this study, a hydrological ensemble flood forecasting system comprising an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel-programmed ε-NSGA II multi-objective algorithm. Based on the solutions found by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. A simple yet effective modular approach is then proposed to combine the daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II provides an objective determination of the parameter estimates, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than the other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single-model ensembles, and the multimodel methods weighted by members and skill scores outperform the other methods. Furthermore, the overall performance at the three stations can be satisfactory up to ten days; however, hydrological errors can degrade the skill score by approximately 2 days, and their influence persists, with a weakening trend, out to a lead time of 10 days. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimates, indicating that the ensemble mean brings overall improvement in forecasting flows, but that for peak values it is more appropriate to take the flood forecasts of each individual member into account.
NASA Astrophysics Data System (ADS)
Clark, E.; Wood, A.; Nijssen, B.; Newman, A. J.; Mendoza, P. A.
2016-12-01
The System for Hydrometeorological Applications, Research and Prediction (SHARP), developed at the National Center for Atmospheric Research (NCAR), University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation, is a fully automated ensemble prediction system for short-term to seasonal applications. It incorporates uncertainty in initial hydrologic conditions (IHCs) and in hydrometeorological predictions. In this implementation, IHC uncertainty is estimated by propagating an ensemble of 100 plausible temperature and precipitation time series through the Sacramento/Snow-17 model. The forcing ensemble explicitly accounts for measurement and interpolation uncertainties in the development of gridded meteorological forcing time series. The resulting ensemble of derived IHCs exhibits a broad range of possible soil moisture and snow water equivalent (SWE) states. To select the IHCs that are most consistent with the observations, we employ a particle filter (PF) that weights IHC ensemble members based on observations of streamflow and SWE. These particles are then used to initialize ensemble precipitation and temperature forecasts downscaled from the Global Ensemble Forecast System (GEFS), generating a streamflow forecast ensemble. We test this method in two basins in the Pacific Northwest that are important for water resources management: 1) the Green River upstream of Howard Hanson Dam, and 2) the South Fork Flathead River upstream of Hungry Horse Dam. The first of these is characterized by mixed snow and rain, while the second is snow-dominated. The PF-based forecasts are compared to forecasts based on 1) a single IHC (corresponding to median streamflow) paired with the full GEFS ensemble, and 2) the full IHC ensemble, without filtering, paired with the full GEFS ensemble. In addition to assessing improvements in the spread of IHCs, we perform a hindcast experiment to evaluate the utility of PF-based data assimilation on streamflow forecasts at 1- to 7-day lead times.
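A minimal sketch of the particle-filter step described above, assuming each IHC member carries simulated streamflow/SWE to compare against observations: Gaussian log-likelihoods update the weights, and systematic resampling kicks in when the effective sample size collapses. All names, sizes, and noise levels are illustrative.

```python
import numpy as np

def particle_filter_update(weights, sim_obs, obs, sigma=1.0):
    """Re-weight IHC ensemble members by a Gaussian likelihood of their
    simulated streamflow/SWE against observations; resample if degenerate."""
    loglik = -0.5 * np.sum((sim_obs - obs) ** 2, axis=1) / sigma**2
    w = weights * np.exp(loglik - loglik.max())
    w /= w.sum()
    n_eff = 1.0 / np.sum(w**2)                 # effective sample size
    if n_eff < 0.5 * len(w):                   # systematic resampling
        u = (np.arange(len(w)) + np.random.rand()) / len(w)
        idx = np.searchsorted(np.cumsum(w), u)
        return idx, np.full(len(w), 1.0 / len(w))
    return np.arange(len(w)), w

rng = np.random.default_rng(10)
truth = rng.normal(size=12)                    # observed flow + SWE record
sims = truth + rng.normal(scale=rng.uniform(0.1, 2.0, (100, 1)), size=(100, 12))
idx, w = particle_filter_update(np.full(100, 0.01), sims, truth)
print("members kept after resampling:", len(np.unique(idx)))
```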
Quasi-most unstable modes: a window to 'À la carte' ensemble diversity?
NASA Astrophysics Data System (ADS)
Homar Santaner, Victor; Stensrud, David J.
2010-05-01
The atmospheric scientific community is nowadays facing the ambitious challenge of providing useful forecasts of atmospheric events that produce high societal impact. The low level of social resilience to false alarms creates tremendous pressure on forecasting offices to issue accurate, timely and reliable warnings. Currently, no operational numerical forecasting system is able to respond to the societal demand for high-resolution (in time and space) predictions in the 12-72h time span. The main reasons for such deficiencies are the lack of adequate observations and the high non-linearity of the numerical models that are currently used. The whole weather forecasting problem is intrinsically probabilistic and current methods aim at coping with the various sources of uncertainties and the error propagation throughout the forecasting system. This probabilistic perspective is often created by generating ensembles of deterministic predictions that are aimed at sampling the most important sources of uncertainty in the forecasting system. The ensemble generation/sampling strategy is a crucial aspect of their performance and various methods have been proposed. Although global forecasting offices have been using ensembles of perturbed initial conditions for medium-range operational forecasts since 1994, no consensus exists regarding the optimum sampling strategy for high resolution short-range ensemble forecasts. Bred vectors, however, have been hypothesized to better capture the growing modes in the highly nonlinear mesoscale dynamics of severe episodes than singular vectors or observation perturbations. Yet even this technique is not able to produce enough diversity in the ensembles to accurately and routinely predict extreme phenomena such as severe weather. Thus, we propose a new method to generate ensembles of initial conditions perturbations that is based on the breeding technique. Given a standard bred mode, a set of customized perturbations is derived with specified amplitudes and horizontal scales. This allows the ensemble to excite growing modes across a wider range of scales. Results show that this approach produces significantly more spread in the ensemble prediction than standard bred modes alone. Several examples that illustrate the benefits from this approach for severe weather forecasts will be provided.
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research adopts single clustering algorithms to perform tumor clustering from biomolecular data, which lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on the subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV combine the characteristics of both: HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates them in a concurrent way. The HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results. Experiments on real datasets from the UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real datasets, especially biomolecular data, and 2) the proposed approaches provide more robust, stable, and accurate results than state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
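The HFCEF variants combine components (AP, fuzzy c-means, Ncut) that do not ship together in one library; the sketch below shows only the generic cluster-ensemble skeleton they share: many base clusterings on random feature subspaces are averaged into a co-association (soft consensus) matrix, which a graph-cut consensus function then partitions. This is a simplified analogue, not HFCEF itself, and the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs

# toy samples standing in for tumor gene-expression profiles
X, _ = make_blobs(n_samples=120, centers=3, n_features=6, random_state=0)

rng = np.random.default_rng(0)
co = np.zeros((len(X), len(X)))
n_runs = 30
for r in range(n_runs):
    feats = rng.choice(X.shape[1], size=3, replace=False)   # random subspace
    labels = KMeans(n_clusters=3, n_init=5, random_state=r).fit_predict(X[:, feats])
    co += labels[:, None] == labels[None, :]    # accumulate co-association
co /= n_runs                                    # soft consensus matrix in [0, 1]

# consensus function: Ncut-style graph cut on the averaged matrix
final = SpectralClustering(n_clusters=3, affinity='precomputed',
                           random_state=0).fit_predict(co)
print(np.bincount(final))
```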
Using simulation to interpret experimental data in terms of protein conformational ensembles.
Allison, Jane R
2017-04-01
In their biological environment, proteins are dynamic molecules, necessitating an ensemble structural description. Molecular dynamics simulations and solution-state experiments provide complementary information, in the form of atomically detailed coordinates and of averages or distributions of structural properties or related quantities. Recently, increases in the temporal and spatial scale of conformational sampling, and comparison of the more diverse conformational ensembles thus generated, have revealed the importance of sampling rare events. Excitingly, new methods based on maximum entropy and Bayesian inference promise to provide a statistically sound mechanism for combining experimental data with molecular dynamics simulations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Georgieva, Elka R.; Roy, Aritro S.; Grigoryants, Vladimir M.; Borbat, Petr P.; Earle, Keith A.; Scholes, Charles P.; Freed, Jack H.
2012-01-01
Pulsed dipolar ESR spectroscopy, DEER and DQC, require frozen samples. An important issue in the biological application of these techniques is how the freezing rate and the concentration of cryoprotectant could affect the conformation of the biomacromolecule and/or spin label. We studied in detail the effect of these experimental variables on the distance distributions obtained by DEER from a series of doubly spin-labeled T4 lysozyme mutants. We found that the rate of sample freezing affects mainly the ensemble of spin-label rotamers, while the distance maxima remain essentially unchanged. This suggests that proteins frozen in a regular manner in liquid nitrogen faithfully maintain their distance-dependent structural properties in solution. We compared the results from rapidly freeze-quenched (≤100 μs) samples to those from commonly shock-frozen (slow freeze, 1 s or longer) samples. For all the mutants studied, we obtained inter-spin distance distributions that were broader for rapidly frozen samples than for slowly frozen ones. We infer that rapid freezing trapped a larger ensemble of spin-label rotamers, whereas on the time scale of slower freezing the protein and spin label settle into a population with fewer low-energy conformers. We used glycerol as a cryoprotectant at concentrations of 10% and 30% by weight. With 10% glycerol and slow freezing, we observed an increased slope of the background signals, which in DEER indicates increased local spin concentration, in this case due to insufficient solvent vitrification and therefore protein aggregation. This effect was considerably suppressed in slowly frozen samples containing 30% glycerol and in rapidly frozen samples containing 10% glycerol. The assignment of bimodal distributions to tether rotamers, as opposed to protein conformations, is aided by comparing results using MTSL and 4-Bromo-MTSL spin labels; the latter usually produce narrower distance distributions. PMID:22341208
Efficient sampling of complex network with modified random walk strategies
NASA Astrophysics Data System (ADS)
Xie, Yunya; Chang, Shuhua; Zhang, Zhipeng; Zhang, Mi; Yang, Lei
2018-02-01
We present two novel random walk strategies: the choosing-seed-node (CSN) random walk and the no-retracing (NR) random walk. Unlike classical random walk sampling, the CSN and NR strategies focus on the influence of the seed-node choice and of path overlap, respectively. The three random walk samplings are applied to the Erdös-Rényi (ER), Barabási-Albert (BA), and Watts-Strogatz (WS) networks and to the weighted USAir network. We then study the major properties of the sampled subnets, such as sampling efficiency, degree distribution, average degree, and average clustering coefficient. Similar conclusions are reached with all three random walk strategies. First, networks of small scale and simple structure are conducive to sampling. Second, the average degree and the average clustering coefficient of the sampled subnet tend toward the corresponding values of the original networks within a limited number of steps. Third, all the degree distributions of the subnets are slightly biased toward the high-degree side. However, the NR strategy performs better for the average clustering coefficient of the subnet. In the real weighted USAir network, salient characteristics such as the larger clustering coefficient and the fluctuations of the degree distribution are reproduced well by these random walk strategies.
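A minimal sketch of the NR rule under our reading of the abstract (forbid stepping straight back along the edge just traversed, except at dead ends); the graph is illustrative.

```python
import random

def nr_random_walk(adj, start, steps):
    """No-retracing (NR) random walk: never step straight back along
    the edge just traversed, unless the walker is at a dead end."""
    walk, prev = [start], None
    for _ in range(steps):
        cur = walk[-1]
        choices = [v for v in adj[cur] if v != prev] or adj[cur]
        prev = cur
        walk.append(random.choice(choices))
    return walk

# tiny illustrative graph (adjacency lists)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(nr_random_walk(adj, start=0, steps=10))
```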
Prediction of drug synergy in cancer using ensemble-based machine learning techniques
NASA Astrophysics Data System (ADS)
Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder
2018-04-01
Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Different drug-drug interactions can be examined via a drug synergy score, which calls for efficient regression-based machine learning approaches to minimize prediction errors. Numerous machine learning techniques, such as neural networks, support vector machines, random forests, LASSO, and Elastic Nets, have been used in the past to meet this requirement. However, these techniques individually do not provide significant accuracy for drug synergy scores. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques were implemented on the drug synergy data. Based on the accuracy of each model, the four most accurate techniques were selected to develop the ensemble-based machine learning model: Random Forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning (GFS.GCCL), the Adaptive-Network-Based Fuzzy Inference System (ANFIS), and the Dynamic Evolving Neural-Fuzzy Inference System (DENFIS). Ensembling is achieved through a biased weighted aggregation of the selected models' predictions (i.e., models with higher prediction scores receive larger weights). The proposed and existing machine learning techniques were evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms the others in terms of accuracy, root mean square error, and coefficient of correlation.
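The aggregation step is compact enough to sketch. The fragment below assumes predictions are stacked per model; array shapes and names are illustrative, since the paper publishes no code.

```python
import numpy as np

def biased_weighted_aggregation(preds, val_scores):
    """Combine regressor outputs so that models with higher validation
    scores contribute more. preds: (n_models, n_samples) predicted
    synergy scores; val_scores: one positive score per model."""
    w = np.asarray(val_scores, dtype=float)
    w = w / w.sum()                  # normalise to a convex combination
    return w @ np.asarray(preds)     # weighted mean prediction per sample
```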
Multiple-path model of spectral reflectance of a dyed fabric.
Rogers, Geoffrey; Dalloz, Nicolas; Fournel, Thierry; Hebert, Mathieu
2017-05-01
Experimental results are presented for the spectral reflectance of a dyed fabric, as analyzed by a multiple-path model of reflection. The multiple-path model provides simple analytic expressions for the reflection and transmission of turbid media by applying the Beer-Lambert law to each path through the medium and summing over all paths, each weighted by its probability. The path-length probability is determined by a random-walk analysis. The experimental results presented here show excellent agreement with the model's predictions.
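In symbols (notation chosen here for illustration rather than taken from the paper), the model sums Beer-Lambert attenuation over the path-length distribution:

```latex
R(\lambda) \;=\; \sum_{k} p_k \,
  \exp\!\big(-\varepsilon(\lambda)\, c\, \ell_k\big),
\qquad \sum_{k} p_k = 1,
```

where \ell_k is the length of path k, p_k its random-walk probability, c the dye concentration, and \varepsilon(\lambda) the spectral absorption coefficient.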
Expansion of effective wet bulb globe temperature for vapor impermeable protective clothing.
Sakoi, Tomonori; Mochida, Tohru; Kurazumi, Yoshihito; Sawada, Shin-Ichi; Horiba, Yosuke; Kuwabara, Kohei
2018-01-01
The wet bulb globe temperature (WBGT) is an effective measure for risk screening to prevent heat disorders. However, a heat risk evaluation by WBGT requires adjustments depending on the clothing. In this study, we proposed a new effective WBGT (WBGT_eff*) for general vapor-permeable clothing ensembles and vapor-impermeable protective clothing that is applicable to occupants engaged in moderate-intensity work with a metabolic heat production of around 174 W/m². WBGT_eff* enables the conversion of heat stress into the scale experienced by an occupant dressed in the basic clothing ensemble (work clothes), based on the heat balance of the human body. We confirmed that WBGT_eff* was effective for expressing the critical thermal environments of the prescriptive zones for occupants wearing vapor-impermeable protective clothing. Based on WBGT_eff*, we clarified how the weights for natural wet bulb, globe, and air temperatures and the intercept change with clothing properties and the surrounding environmental factors when heat stress is expressed as a weighted sum of natural wet bulb, globe, and air temperatures plus an intercept. The weight of the environmental temperatures (globe and air) for WBGT_eff* for vapor-impermeable protective clothing increased compared with that for general vapor-permeable clothing, whereas that of the natural wet bulb temperature decreased. For WBGT_eff* in outdoor conditions with a solar load, the weighting ratio of globe temperature increased and that of air temperature decreased with air velocity. Approximation equations of WBGT_eff* are proposed for both general vapor-permeable clothing ensembles and vapor-impermeable protective clothing. Copyright © 2017 Elsevier Ltd. All rights reserved.
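The weighted-sum form is simple to sketch. The defaults below are the standard outdoor WBGT weights (0.7, 0.2, 0.1); the clothing-specific weights and intercept the paper derives would be substituted for them, so treat the non-default arguments as placeholders.

```python
def effective_wbgt(t_nwb, t_globe, t_air,
                   weights=(0.7, 0.2, 0.1), intercept=0.0):
    """Heat stress as a weighted sum of natural wet bulb, globe, and
    air temperatures plus an intercept. The defaults reproduce the
    standard outdoor WBGT; clothing-specific values give WBGT_eff*."""
    w_nwb, w_globe, w_air = weights
    return w_nwb * t_nwb + w_globe * t_globe + w_air * t_air + intercept
```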
Micro Unmanned Surface Vehicle for Shallow Littoral Data Sampling
NASA Astrophysics Data System (ADS)
Murphy, R. R.; Wilde, G.
2016-02-01
This paper describes the creation of an autonomous air boat that can be carried by one person, called a micro unmanned surface vehicle (USV), for sensor sampling in shallow littoral areas such as inlets and creeks. A USV offers advantages over other types of unmanned marine vehicles. Unlike an autonomous underwater vehicle, the Challenge 1.0 air boat can operate in water less than 15 cm deep while maintaining network connectivity for control and data sampling. A USV does not require a tether, unlike a remotely operated marine vehicle (ROV), which would limit its range and mobility. However, a USV operating in shallow littoral areas poses several challenges. Navigation is a challenge since rivers and bays may have semi-submerged obstacles and no depth maps; the approach taken in the Challenge 1.0 project is to let the operator specify a safe area of the water by visual inspection, after which the USV autonomously creates a path to optimally sample the collision-free area. Navigation is also a challenge because of platform dynamics: the USV we describe is non-holonomic, and this paper therefore explores spiral paths rather than boustrophedon paths. Another challenge is the quality of sensing. Water-based sensing is noisy, so a reading at a single point may not reflect the overall value. In practice, areas are sampled rather than single points, but the noise in the point values within a sampled area produces a survey of widely varying numbers that is difficult for humans to interpret. This paper implements an inverse distance weighting interpolation algorithm to produce a visual "heatmap" that reliably portrays the smoothed data.
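Inverse distance weighting itself is a few lines of NumPy; the sketch below shows the interpolation step that produces the heatmap, with parameter names and grid layout as illustrative assumptions rather than the project's code.

```python
import numpy as np

def idw_heatmap(sample_xy, values, grid_x, grid_y, power=2.0, eps=1e-12):
    """Interpolate noisy point samples onto a regular grid by inverse
    distance weighting: each grid cell is a weighted average of all
    samples, with weights 1/d**power so nearer samples dominate."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    d = np.linalg.norm(cells[:, None, :] - sample_xy[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)
    z = (w @ values) / w.sum(axis=1)   # normalised weighted average
    return z.reshape(gx.shape)
```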
Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T
2012-12-08
Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
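The two aggregation rules reduce to a few lines; in the sketch below, probs holds one row of rejection probabilities per classifier, and the vote rule's minimum-vote count is an illustrative assumption.

```python
import numpy as np

def average_probability(probs, cutoff=0.5):
    """Call acute rejection when the mean of the per-classifier
    probabilities exceeds the cutoff. probs: (n_classifiers, n_samples)."""
    return probs.mean(axis=0) >= cutoff

def vote_threshold(probs, cutoff=0.5, min_votes=3):
    """Call acute rejection when at least min_votes classifiers
    individually exceed the cutoff (the paper's exact rule may differ)."""
    return (probs >= cutoff).sum(axis=0) >= min_votes
```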
Model Independence in Downscaled Climate Projections: a Case Study in the Southeast United States
NASA Astrophysics Data System (ADS)
Gray, G. M. E.; Boyles, R.
2016-12-01
Downscaled climate projections are used to deduce how the climate will change in future decades at local and regional scales. It is important to use multiple models to characterize part of the future uncertainty, given the impact on adaptation decision making. This is traditionally done through an equally weighted ensemble of multiple GCMs downscaled using one technique. Newer practices include several downscaling techniques in an effort to increase the ensemble's representation of future uncertainty. However, this practice may add statistically dependent models to the ensemble. Previous research has shown a dependence problem within multiple generations of the GCM ensemble, but this has not been shown for the downscaled ensemble. In this case study, seven downscaled climate projections on the daily time scale are considered: CLAREnCE10, SERAP, BCCA (CMIP5 and CMIP3 versions), Hostetler, CCR, and MACA-LIVNEH. These data represent 83 ensemble members, 44 GCMs, and two generations of GCMs. Baseline periods are compared against the University of Idaho's METDATA gridded observation dataset. Hierarchical agglomerative clustering is applied to the correlated errors to determine dependent clusters. Redundant GCMs across different downscaling techniques show the most dependence, while smaller dependence signals are detected within downscaling datasets and across generations of GCMs. These results indicate that using additional downscaled projections to increase the ensemble size must be done with care to avoid redundant GCMs, and that the process of downscaling may itself increase the dependence of the downscaled GCMs. The two climate model generations do not appear dissimilar enough to be treated as separate statistical populations for ensemble building at local and regional scales.
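The dependence analysis can be sketched as follows: correlate each member's errors, turn correlation into a distance, and cut an agglomerative tree into clusters. Details such as the linkage method and array shapes are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def dependence_clusters(errors, n_clusters=5):
    """Group ensemble members whose errors against observations are
    highly correlated (i.e., statistically dependent members).
    errors: (n_members, n_times) array of member-minus-observation."""
    dist = 1.0 - np.corrcoef(errors)        # correlation -> distance
    iu = np.triu_indices_from(dist, k=1)    # condensed form for linkage
    tree = linkage(dist[iu], method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```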
Molecular dynamics simulations using temperature-enhanced essential dynamics replica exchange.
Kubitzki, Marcus B; de Groot, Bert L
2007-06-15
Today's standard molecular dynamics simulations of moderately sized biomolecular systems at full atomic resolution are typically limited to the nanosecond timescale and therefore suffer from limited conformational sampling. Efficient ensemble-preserving algorithms like replica exchange (REX) may alleviate this problem somewhat but are still computationally prohibitive due to the large number of degrees of freedom involved. Aiming at increased sampling efficiency, we present a novel simulation method combining the ideas of essential dynamics and REX. Unlike standard REX, in each replica only a selection of essential collective modes of a subsystem of interest (essential subspace) is coupled to a higher temperature, with the remainder of the system staying at a reference temperature, T(0). This selective excitation along with the replica framework permits efficient approximate ensemble-preserving conformational sampling and allows much larger temperature differences between replicas, thereby considerably enhancing sampling efficiency. Ensemble properties and sampling performance of the method are discussed using dialanine and guanylin test systems, with multi-microsecond molecular dynamics simulations of these test systems serving as references.
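For reference, the standard REX building block that the method modifies is the Metropolis swap between two replicas; a minimal sketch:

```python
import math
import random

def rex_swap_accept(beta_i, beta_j, e_i, e_j, rng=random):
    """Accept or reject a configuration swap between replicas at
    inverse temperatures beta_i and beta_j with potential energies
    e_i and e_j: p = min(1, exp[(beta_i - beta_j) * (e_i - e_j)])."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0 or rng.random() < math.exp(delta)
```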
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
Discriminative learning of weights in an otherwise generatively learned naive Bayes classifier: WANBIA-C is very competitive with Logistic Regression but much more… Keywords: classifier, generative learning, discriminative learning, naïve Bayes, feature selection, logistic regression, higher-order attribute independence.
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-12-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in assimilation experiments with simulated observations in the bivariate Lorenz 95 model.
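For a single state variable, the Schur-product construction is easy to sketch. The compactly supported Gaspari-Cohn function below is the usual choice of distance-dependent correlation; its use here is an illustration, not a detail taken from the paper.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn fifth-order compactly supported correlation
    function of scaled distance r = d / c (zero beyond r = 2)."""
    r = np.abs(r)
    gc = np.zeros_like(r)
    m = r <= 1
    gc[m] = (((-0.25 * r[m] + 0.5) * r[m] + 0.625) * r[m] - 5 / 3) * r[m] ** 2 + 1
    m = (r > 1) & (r < 2)
    gc[m] = ((((r[m] / 12 - 0.5) * r[m] + 0.625) * r[m] + 5 / 3) * r[m]
             - 5) * r[m] + 4 - 2 / (3 * r[m])
    return gc

def localize(sample_cov, dist, c):
    """Schur (element-wise) product of the ensemble sample covariance
    with the compactly supported correlation matrix."""
    return sample_cov * gaspari_cohn(dist / c)
```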
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach to enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimal combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, uses 10-fold cross-validation on training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used to classify class-imbalanced datasets. Additionally, when a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmark datasets from the UCI Machine Learning Repository, one Alzheimer's disease dataset, and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect the proposed GA-EoC to perform consistently in other cases.
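The core of the search is the fitness evaluation of a candidate ensemble, sketched below with a single validation split for brevity (the paper scores candidates by 10-fold cross-validation on training data; names and shapes are illustrative).

```python
import numpy as np

def majority_vote(preds):
    """Combine base-classifier 0/1 labels, shape (n_classifiers,
    n_samples), by simple majority voting."""
    return (preds.mean(axis=0) > 0.5).astype(int)

def ensemble_fitness(mask, preds, y_true):
    """GA fitness of a candidate ensemble: accuracy of the majority
    vote over the base classifiers selected by the boolean chromosome
    `mask`. Empty selections score zero."""
    if not mask.any():
        return 0.0
    return float((majority_vote(preds[mask]) == y_true).mean())
```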
NASA Technical Reports Server (NTRS)
Oliva-Buisson, Yvette J. (Compiler)
2014-01-01
The overall objective for this project is to evaluate two candidate alternatives to the existing Propellant Handler's Ensemble (PHE) escape ventilator. The new candidate ventilators use newer technology to supply similar quantities of air at approximately half the weight of the current ventilator. Ventilators are typically used to ingress/egress a hazardous work area when hard-line air is provided at the work area but the hose is not long enough to get the operator to and from the staging area. The intent of this test is to verify that the new ventilators perform as well as or better than the current ventilators in maintaining proper oxygen (O2) and carbon dioxide (CO2) levels in the PHE during typical use for the rated time period (10 minutes). We will evaluate two new units, comparing them to the existing unit. Subjects will wear the Category I version of the Propellant Handler's Ensemble with the rear suit pouch snapped.
Algorithms that Defy the Gravity of Learning Curve
2017-04-28
Three nearest neighbour-based anomaly detectors, i.e., an ensemble of nearest neighbours, a recent nearest neighbour-based ensemble method called iNNE… streams. Note that the change in sample size does not alter the geometrical data characteristics discussed here. Experimental methodology: … need to be answered. Comparison with conventional ensemble methods: given the theoretical results, the third aim of this project (i.e., identify the…
Model dependence and its effect on ensemble projections in CMIP5
NASA Astrophysics Data System (ADS)
Abramowitz, G.; Bishop, C.
2013-12-01
Conceptually, the notion of model dependence within climate model ensembles is relatively simple: modelling groups share a literature base, parametrisations, data sets, and even model code, so the potential for dependence in sampling different climate futures is clear. How, though, can this conceptual problem inform a practical solution that demonstrably improves the ensemble mean and ensemble variance as estimates of system uncertainty? While some research has already focused on error correlation or error covariance as a candidate for improving ensemble mean estimates, a complete definition of independence must at least implicitly subscribe to an ensemble interpretation paradigm, such as the 'truth-plus-error', 'indistinguishable', or, more recently, 'replicate Earth' paradigm. Using a definition of model dependence based on error covariance within the replicate Earth paradigm, this presentation will show that accounting for dependence in surface air temperature gives cooler projections in CMIP5, by as much as 20% globally in some RCPs, although results differ significantly for each RCP, especially regionally. That accounting for dependence changes the projections of different RCPs by different amounts is not an inconsistent result: different numbers of submissions to each RCP by different modelling groups mean that differences in projections between RCPs are not entirely about the RCP forcing conditions; they also reflect different sampling strategies.
Rethinking the Default Construction of Multimodel Climate Ensembles
Rauser, Florian; Gleckler, Peter; Marotzke, Jochem
2015-07-21
Here, we discuss the current code of practice in the climate sciences to routinely create climate model ensembles as ensembles of opportunity from the newest phase of the Coupled Model Intercomparison Project (CMIP). We give a two-step argument to rethink this process. First, the differences between generations of ensembles corresponding to different CMIP phases in key climate quantities are not large enough to warrant an automatic separation into generational ensembles for CMIP3 and CMIP5. Second, we suggest that climate model ensembles cannot continue to be mere ensembles of opportunity but should always be based on a transparent scientific decision process. If ensembles can be constrained by observation, then they should be constructed as target ensembles that are specifically tailored to a physical question. If model ensembles cannot be constrained by observation, then they should be constructed as cross-generational ensembles, including all available model data to enhance structural model diversity and to better sample the underlying uncertainties. To facilitate this, CMIP should guide the necessarily ongoing process of updating experimental protocols for the evaluation and documentation of coupled models. Finally, with an emphasis on easy access to model data and facilitating the filtering of climate model data across all CMIP generations and experiments, our community could return to the underlying idea of using model data ensembles to improve uncertainty quantification, evaluation, and cross-institutional exchange.
Evaluation and Applications of the Prediction of Intensity Model Error (PRIME) Model
NASA Astrophysics Data System (ADS)
Bhatia, K. T.; Nolan, D. S.; Demaria, M.; Schumacher, A.
2015-12-01
Forecasters and end users of tropical cyclone (TC) intensity forecasts would greatly benefit from a reliable expectation of model error to counteract the lack of consistency in TC intensity forecast performance. As a first step towards producing error predictions to accompany each TC intensity forecast, Bhatia and Nolan (2013) studied the relationship between synoptic parameters, TC attributes, and forecast errors. In this study, we build on previous results of Bhatia and Nolan (2013) by testing the ability of the Prediction of Intensity Model Error (PRIME) model to forecast the absolute error and bias of four leading intensity models available for guidance in the Atlantic basin. PRIME forecasts are independently evaluated at each 12-hour interval from 12 to 120 hours during the 2007-2014 Atlantic hurricane seasons. The absolute error and bias predictions of PRIME are compared to their respective climatologies to determine their skill. In addition to these results, we will present the performance of the operational version of PRIME run during the 2015 hurricane season. PRIME verification results show that it can reliably anticipate situations where particular models excel, and therefore could lead to a more informed protocol for hurricane evacuations and storm preparations. These positive conclusions suggest that PRIME forecasts also have the potential to lower the error in the original intensity forecasts of each model. As a result, two techniques are proposed to develop a post-processing procedure for a multimodel ensemble based on PRIME. The first approach is to inverse-weight models using PRIME absolute error predictions (higher predicted absolute error corresponds to lower weights). The second multimodel ensemble applies PRIME bias predictions to each model's intensity forecast and the mean of the corrected models is evaluated. The forecasts of both of these experimental ensembles are compared to those of the equal-weight ICON ensemble, which currently provides the most reliable forecasts in the Atlantic basin.
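Both proposed post-processing ensembles reduce to a few lines; the sketch below uses illustrative variable names, not the operational code.

```python
import numpy as np

def inverse_error_consensus(forecasts, predicted_abs_err, eps=1e-6):
    """Weight each member inversely by PRIME's predicted absolute
    error, so models expected to do poorly receive small weights."""
    w = 1.0 / (np.asarray(predicted_abs_err, dtype=float) + eps)
    w = w / w.sum()
    return float(w @ np.asarray(forecasts, dtype=float))

def bias_corrected_consensus(forecasts, predicted_bias):
    """Subtract each member's PRIME-predicted bias, then take the
    equal-weight mean of the corrected forecasts."""
    corrected = (np.asarray(forecasts, dtype=float)
                 - np.asarray(predicted_bias, dtype=float))
    return float(corrected.mean())
```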
Methodology for Augmenting Existing Paths with Additional Parallel Transects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, John E.
2013-09-30
Visual Sample Plan (VSP) is sample planning software that is used, among other purposes, to plan transect sampling paths to detect areas that were potentially used for munition training. This module was developed for application on a large site where existing roads and trails were to be used as primary sampling paths. Gap areas between these primary paths needed to be found and covered with parallel transect paths. These gap areas represent areas on the site that are more than a specified distance from a primary path. The added parallel paths needed to optionally be connected together into a single path, the shortest path possible. The paths also needed to optionally be attached to existing primary paths, again with the shortest possible path. Finally, the process must be repeatable and predictable, so that the same inputs (primary paths, specified distance, and path options) will result in the same set of new paths every time. This methodology was developed to meet those specifications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ullrich, C. A.; Kohn, W.
An electron density distribution n(r) which can be represented by that of a single-determinant ground state of noninteracting electrons in an external potential v(r) is called pure-state v-representable (P-VR). Most physical electronic systems are P-VR. Systems which require a weighted sum of several such determinants to represent their density are called ensemble v-representable (E-VR). This paper develops formal Kohn-Sham equations for E-VR physical systems, using the appropriate coupling-constant integration. It also derives local-density and generalized-gradient approximations, and conditions and corrections specific to ensembles.
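In symbols (notation chosen here for illustration), an E-VR density is a convex combination of single-determinant densities in a common potential v(r):

```latex
n(\mathbf{r}) \;=\; \sum_{k} w_k\, n_k(\mathbf{r}),
\qquad w_k \ge 0, \qquad \sum_{k} w_k = 1,
```

with P-VR recovered as the special case of a single nonzero weight.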
NASA Astrophysics Data System (ADS)
Yuvchenko, S. A.; Ushakova, E. V.; Pavlova, M. V.; Alonova, M. V.; Zimnyakov, D. A.
2018-04-01
We consider the practical realization of a new optical probe method for random media, defined as reference-free path-length interferometry with intensity-moments analysis. A peculiarity in the statistics of the spectrally selected fluorescence radiation in a laser-pumped dye-doped random medium is discussed. Previously established correlations between the second- and third-order moments of the intensity fluctuations in the random interference patterns, the coherence function of the probe radiation, and the path-difference probability density for the interfering partial waves in the medium are confirmed. The correlations were verified using statistical analysis of the spectrally selected fluorescence radiation emitted by a laser-pumped dye-doped random medium. An aqueous solution of Rhodamine 6G was used as the fluorescent dopant for ensembles of densely packed silica grains, which were pumped by the 532 nm radiation of a solid-state laser. The spectrum of the mean path length for the random medium was reconstructed.
Scheid, Anika; Nebel, Markus E
2012-07-09
Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-01-01
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
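The stacking wiring can be illustrated with scikit-learn. This is a loose sketch: scikit-learn offers no HMM or RBF-network classifiers, so an MLP stands in for them, and the point is only how base-classifier outputs feed a meta-learner.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Base classifiers' probability outputs become features for the
# meta-learner, mirroring the stacking strategy described above.
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("mlp", MLPClassifier(max_iter=500))],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba")
# Usage: stack.fit(harmonic_features, wear_states); stack.predict(new_features)
```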
The NRL relocatable ocean/acoustic ensemble forecast system
NASA Astrophysics Data System (ADS)
Rowley, C.; Martin, P.; Cummings, J.; Jacobs, G.; Coelho, E.; Bishop, C.; Hong, X.; Peggion, G.; Fabre, J.
2009-04-01
A globally relocatable regional ocean nowcast/forecast system has been developed to support rapid implementation of new regional forecast domains. The system is in operational use at the Naval Oceanographic Office for a growing number of regional and coastal implementations. The new system is the basis for an ocean acoustic ensemble forecast and adaptive sampling capability. We present an overview of the forecast system and the ocean ensemble and adaptive sampling methods. The forecast system consists of core ocean data analysis and forecast modules, software for domain configuration, surface and boundary condition forcing processing, and job control, and global databases for ocean climatology, bathymetry, tides, and river locations and transports. The analysis component is the Navy Coupled Ocean Data Assimilation (NCODA) system, a 3D multivariate optimum interpolation system that produces simultaneous analyses of temperature, salinity, geopotential, and vector velocity using remotely-sensed SST, SSH, and sea ice concentration, plus in situ observations of temperature, salinity, and currents from ships, buoys, XBTs, CTDs, profiling floats, and autonomous gliders. The forecast component is the Navy Coastal Ocean Model (NCOM). The system supports one-way nesting and multiple assimilation methods. The ensemble system uses the ensemble transform technique with error variance estimates from the NCODA analysis to represent initial condition error. Perturbed surface forcing or an atmospheric ensemble is used to represent errors in surface forcing. The ensemble transform Kalman filter is used to assess the impact of adaptive observations on future analysis and forecast uncertainty for both ocean and acoustic properties.
Abramyan, Tigran M.; Hyde-Volpe, David L.; Stuart, Steven J.; Latour, Robert A.
2017-01-01
The use of standard molecular dynamics simulation methods to predict the interactions of a protein with a material surface has the inherent limitations of being unable to determine the most likely conformations and orientations of the adsorbed protein on the surface or the level of convergence attained by the simulation. In addition, standard mixing rules are typically applied, without validation, to combine the nonbonded force field parameters of the solution and solid phases of the system to represent interfacial behavior. As a means to circumvent these problems, the authors demonstrate the application of an efficient advanced sampling method (TIGER2A) for the simulation of the adsorption of hen egg-white lysozyme on a crystalline (110) high-density polyethylene surface plane. Simulations are conducted to generate a Boltzmann-weighted ensemble of sampled states using force field parameters that were validated to represent interfacial behavior for this system. The resulting ensembles of sampled states were then analyzed using an in-house-developed cluster analysis method to predict the most probable orientations and conformations of the protein on the surface based on the amount of sampling performed, from which free energy differences between the adsorbed states could be calculated. In addition, by conducting two independent sets of TIGER2A simulations combined with cluster analyses, the authors demonstrate a method to estimate the degree of convergence achieved for a given amount of sampling. The results from these simulations demonstrate that these methods enable the most probable orientations and conformations of an adsorbed protein to be predicted, and that the use of our validated interfacial force field parameter set provides closer agreement to available experimental results than standard CHARMM force field parameterization of molecular behavior at the interface. PMID:28514864
Calibration of neural networks using genetic algorithms, with application to optimal path planning
NASA Technical Reports Server (NTRS)
Smith, Terence R.; Pitney, Gilbert A.; Greenwood, Daniel
1987-01-01
Genetic algorithms (GA) are used to search the synaptic weight space of artificial neural systems (ANS) for weight vectors that optimize some network performance function. GAs do not suffer from some of the architectural constraints involved with other techniques and it is straightforward to incorporate terms into the performance function concerning the metastructure of the ANS. Hence GAs offer a remarkably general approach to calibrating ANS. GAs are applied to the problem of calibrating an ANS that finds optimal paths over a given surface. This problem involves training an ANS on a relatively small set of paths and then examining whether the calibrated ANS is able to find good paths between arbitrary start and end points on the surface.
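A minimal GA-style calibration of a tiny fixed-architecture network can be sketched as follows. For brevity this uses mutation-only selection (no crossover), so it is strictly an evolutionary sketch rather than a full GA; the architecture, population scheme, and performance function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(w, x, shape=(2, 8, 1)):
    """Tiny two-layer network; w is the flat synaptic weight vector
    the search operates on (24 weights for the default shape)."""
    n_in, n_hid, n_out = shape
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    W2 = w[n_in * n_hid:].reshape(n_hid, n_out)
    return np.tanh(np.tanh(x @ W1) @ W2)

def calibrate(performance, n_weights, pop=50, gens=200, sigma=0.1):
    """Keep the best half of the population, refill with mutated
    copies, and return the best weight vector found. `performance`
    maps a weight vector to a scalar to maximise, e.g. the negative
    cost of the paths the network proposes (hypothetical here)."""
    P = rng.normal(size=(pop, n_weights))
    for _ in range(gens):
        P = P[np.argsort([-performance(w) for w in P])]  # best first
        P[pop // 2:] = P[:pop // 2] + sigma * rng.normal(size=P[pop // 2:].shape)
    return P[0]
```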
Zerbino, Daniel R.; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P.; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul
2016-01-01
New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907
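For orientation, regulation data can be pulled programmatically from the public REST service mentioned above. The endpoint and field names below follow the public documentation as best understood and should be checked against the current API before use.

```python
import requests

# Query regulatory features overlapping a human region via the public
# Ensembl REST API (rest.ensembl.org). Region and fields are examples;
# returned keys may differ between releases, hence .get() below.
url = "https://rest.ensembl.org/overlap/region/human/1:1000000-1100000"
resp = requests.get(url, params={"feature": "regulatory"},
                    headers={"Content-Type": "application/json"})
resp.raise_for_status()
for feat in resp.json():
    print(feat.get("ID"), feat.get("feature_type"), feat.get("start"))
```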
Quality of service routing in wireless ad hoc networks
NASA Astrophysics Data System (ADS)
Sane, Sachin J.; Patcha, Animesh; Mishra, Amitabh
2003-08-01
An efficient routing protocol is essential to guarantee application-level quality of service in wireless ad hoc networks. In this paper we propose a novel routing algorithm that computes a path between a source and a destination by considering several important constraints, such as path life and the availability of sufficient energy and buffer space in each of the nodes on the path. The algorithm chooses the best path from among the multiple paths that it computes between the two endpoints. We consider the use of control packets that run at a higher priority than data packets in determining the multiple paths. The paper also examines the impact of different schedulers, such as weighted fair queuing and weighted random early detection, in preserving QoS-level guarantees. Our extensive simulation results indicate that the algorithm improves the overall lifetime of a network, reduces the number of dropped packets, and decreases the end-to-end delay for a real-time voice application.
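The constraint-then-select idea can be sketched compactly. Attribute names such as energy, buffer, and lifetime are assumptions rather than the paper's notation, and exhaustive simple-path enumeration stands in for the paper's route computation.

```python
import networkx as nx

def qos_paths(G, src, dst, min_life, min_energy, min_buffer, k=3):
    """Prune nodes lacking energy or buffer space, keep simple paths
    whose weakest link still meets the path-life constraint, and rank
    the survivors by lifetime (best path first)."""
    ok = [n for n in G
          if G.nodes[n].get("energy", 0) >= min_energy
          and G.nodes[n].get("buffer", 0) >= min_buffer]
    H = G.subgraph(ok)
    if src not in H or dst not in H:
        return []
    ranked = []
    for p in nx.all_simple_paths(H, src, dst):
        life = min(H.edges[u, v].get("lifetime", 0)
                   for u, v in zip(p, p[1:]))
        if life >= min_life:
            ranked.append((life, p))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked[:k]
```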
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-01-01
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance. PMID:29701668
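A loose PyTorch sketch of the multi-path convolutional-recurrent idea follows; layer sizes, kernel choices, and the form of the residual connection are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiPathConvGRU(nn.Module):
    """Parallel 1-D conv branches with different kernel sizes over the
    EEG channels, features concatenated, then a bidirectional GRU with
    a residual connection from its (projected) input."""
    def __init__(self, n_ch=64, hidden=128, n_classes=2):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Conv1d(n_ch, 32, k, padding=k // 2) for k in (3, 7, 15))
        self.gru = nn.GRU(96, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(96, 2 * hidden)   # match residual dimensions
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, channels, time)
        f = torch.cat([p(x) for p in self.paths], dim=1)  # (B, 96, T)
        f = f.transpose(1, 2)                             # (B, T, 96)
        out, _ = self.gru(f)
        out = out + self.proj(f)                          # residual add
        return self.head(out[:, -1])            # estimate at last time step
```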
Genetic Programming Based Ensemble System for Microarray Data Classification
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
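The three combination operators are one-liners over the base trees' probability outputs; shapes are assumed for illustration.

```python
import numpy as np

OPERATORS = {"min": lambda p: p.min(axis=0),
             "max": lambda p: p.max(axis=0),
             "avg": lambda p: p.mean(axis=0)}

def combine(tree_probs, op):
    """Apply one GPES-style operator (Min, Max, or Average) to base
    decision-tree outputs of shape (n_trees, n_samples)."""
    return OPERATORS[op](np.asarray(tree_probs))
```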
Short paths in expander graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleinberg, J.; Rubinfeld, R.
Graph expansion has proved to be a powerful general tool for analyzing the behavior of routing algorithms and the interconnection networks on which they run. We develop new routing algorithms and structural results for bounded-degree expander graphs. Our results are unified by the fact that they are all based upon, and extend, a body of work asserting that expanders are rich in short, disjoint paths. In particular, our work has consequences for the disjoint paths problem, multicommodity flow, and graph minor containment. We show: (i) A greedy algorithm for approximating the maximum disjoint paths problem achieves a polylogarithmic approximation ratio in bounded-degree expanders. Although our algorithm is both deterministic and on-line, its performance guarantee is an improvement over previous bounds in expanders. (ii) For a multicommodity flow problem with arbitrary demands on a bounded-degree expander, there is a (1 + ε)-optimal solution using only flow paths of polylogarithmic length. It follows that the multicommodity flow algorithm of Awerbuch and Leighton runs in nearly linear time per commodity in expanders. Our analysis is based on establishing the following: given edge weights on an expander G, one can increase some of the weights very slightly so the resulting shortest-path metric is smooth - the min-weight path between any pair of nodes uses a polylogarithmic number of edges. (iii) Every bounded-degree expander on n nodes contains every graph with O(n/log^{O(1)} n) nodes and edges as a minor.
NASA Astrophysics Data System (ADS)
Saleh, F.; Ramaswamy, V.; Georgas, N.; Blumberg, A. F.; Wang, Y.
2016-12-01
Advances in computational resources and modeling techniques are opening the path to effectively integrating existing complex models. In the context of flood prediction, recent extreme events have demonstrated the importance of integrating components of the hydrosystem to better represent the interactions amongst different physical processes and phenomena. As such, there is a pressing need to develop holistic and cross-disciplinary modeling frameworks that effectively integrate existing models and better represent the operative dynamics. This work presents a novel Hydrologic-Hydraulic-Hydrodynamic Ensemble (H3E) flood prediction framework that operationally integrates existing predictive models representing coastal (New York Harbor Observing and Prediction System, NYHOPS), hydrologic (US Army Corps of Engineers Hydrologic Modeling System, HEC-HMS) and hydraulic (2-dimensional River Analysis System, HEC-RAS) components. The state-of-the-art framework is forced with 125 ensemble meteorological inputs from numerical weather prediction models including the Global Ensemble Forecast System, the European Centre for Medium-Range Weather Forecasts (ECMWF), the Canadian Meteorological Centre (CMC), the Short Range Ensemble Forecast (SREF) and the North American Mesoscale Forecast System (NAM). The framework produces, within a 96-hour forecast horizon, on-the-fly Google Earth flood maps that provide critical information for decision makers and emergency preparedness managers. The utility of the framework was demonstrated by retrospectively forecasting an extreme flood event, Hurricane Sandy, in the Passaic and Hackensack watersheds (New Jersey, USA). Hurricane Sandy caused significant damage to a number of critical facilities in this area, including the New Jersey Transit's main storage and maintenance facility. The results of this work demonstrate that ensemble-based frameworks provide improved flood predictions and useful information about the associated uncertainties, thus improving the assessment of risks when compared to a deterministic forecast. The work offers perspectives for short-term flood forecasts, flood mitigation strategies, and best management practices for climate change scenarios.
Quantum storage of orbital angular momentum entanglement in an atomic ensemble.
Ding, Dong-Sheng; Zhang, Wei; Zhou, Zhi-Yuan; Shi, Shuai; Xiang, Guo-Yong; Wang, Xi-Shi; Jiang, Yun-Kun; Shi, Bao-Sen; Guo, Guang-Can
2015-02-06
Constructing a quantum memory for photonic entanglement is vital for realizing quantum communication and networks. Because of the inherent infinite dimension of orbital angular momentum (OAM), the photon's OAM has the potential for encoding a photon in a high-dimensional space, enabling the realization of high channel capacity communication. Photons entangled in orthogonal polarizations or optical paths have previously been stored in different systems, but there have been no reports on the storage of a photon pair entangled in OAM space. Here, we report the first experimental realization of storing an entangled OAM state through the Raman protocol in a cold atomic ensemble. We reconstruct the density matrix of an OAM entangled state with a fidelity of 90.3%±0.8% and obtain a Clauser-Horne-Shimony-Holt inequality parameter S of 2.41±0.06 after a programmed storage time. All results clearly show the preservation of entanglement during the storage.
NASA Astrophysics Data System (ADS)
Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.
2015-04-01
Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth by empirical crop coefficients to adapt evapotranspiration throughout the vegetation period. We investigate the importance of model structural versus model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that the structural uncertainty among the reference evapotranspiration models is far more important than the parametric uncertainty introduced by the crop coefficients, which are used to estimate irrigation water requirement following the single crop coefficient approach. Using the reliability ensemble averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a certain threshold, e.g. an irrigation water limit of 400 mm due to water rights, would be exceeded less frequently for the REA ensemble average (45%) than for the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.
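The REA step can be illustrated with a toy computation. The actual REA weights of Giorgi and Mearns combine performance (bias) and convergence criteria; the sketch below, with invented names and array shapes, uses inverse absolute bias only, and also shows the exceedance-probability calculation behind the 400 mm example.

    import numpy as np

    def rea_weights(preds, obs):
        """Toy reliability weights: inverse absolute bias vs. observations.
        preds: (n_models, n_years); obs: (n_years,)."""
        bias = np.abs(preds.mean(axis=1) - obs.mean())
        w = 1.0 / np.maximum(bias, 1e-9)
        return w / w.sum()

    def exceedance_probability(values, threshold):
        """Fraction of ensemble values above a threshold (e.g. 400 mm)."""
        return float(np.mean(np.asarray(values) > threshold))

    # reliability-weighted ensemble mean time series: rea_weights(preds, obs) @ preds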
Multidimensional generalized-ensemble algorithms for complex systems.
Mitsutake, Ayori; Okamoto, Yuko
2009-06-07
We give general formulations of the multidimensional multicanonical algorithm, simulated tempering, and replica-exchange method. We generalize the original potential energy function E(0) by adding any physical quantity V of interest as a new energy term. These multidimensional generalized-ensemble algorithms then perform a random walk not only in E(0) space but also in V space. Among the three algorithms, the replica-exchange method is the easiest to perform because the weight factor is just a product of regular Boltzmann-like factors, while the weight factors for the multicanonical algorithm and simulated tempering are not a priori known. We give a simple procedure for obtaining the weight factors for these two latter algorithms, which uses a short replica-exchange simulation and the multiple-histogram reweighting techniques. As an example of applications of these algorithms, we have performed a two-dimensional replica-exchange simulation and a two-dimensional simulated-tempering simulation using an alpha-helical peptide system. From these simulations, we study the helix-coil transitions of the peptide in gas phase and in aqueous solution.
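As a worked detail of why replica exchange is the easiest of the three: the swap test between neighbouring replicas needs only regular Boltzmann-like factors. A minimal sketch for the temperature-only case follows; in the multidimensional generalization described above, the energy E is replaced by E(0) + lambda*V with each replica's own coupling lambda.

    import math, random

    def swap_accepted(beta_i, beta_j, E_i, E_j):
        """Metropolis criterion for exchanging two neighbouring replicas:
        accept with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)])."""
        delta = (beta_i - beta_j) * (E_i - E_j)
        return delta >= 0 or random.random() < math.exp(delta)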
Wu, Xiongwu; Damjanovic, Ana; Brooks, Bernard R.
2013-01-01
This review provides a comprehensive description of the self-guided Langevin dynamics (SGLD) and the self-guided molecular dynamics (SGMD) methods and their applications. Example systems are included to provide guidance on optimal application of these methods in simulation studies. SGMD/SGLD have an enhanced ability to overcome energy barriers and to accelerate rare events to affordable time scales. It has been demonstrated that with moderate parameters, SGLD can routinely cross energy barriers of 20 kT at the rate at which molecular dynamics (MD) or Langevin dynamics (LD) cross 10 kT barriers. The core of these methods is the use of local averages of forces and momenta in a direct manner that can preserve the canonical ensemble. The use of such local averages results in methods where low frequency motion “borrows” energy from high frequency degrees of freedom when a barrier is approached and then returns that excess energy after the barrier is crossed. This self-guiding effect also results in accelerated diffusion, which enhances conformational sampling efficiency. The resulting SGLD ensemble deviates slightly from the canonical ensemble, and that deviation can be corrected with either an on-the-fly or a post-processing reweighting procedure that provides an excellent canonical ensemble for systems with a limited number of accelerated degrees of freedom. Since reweighting procedures are generally not size extensive, a newer method, SGLDfp, uses local averages of both momenta and forces to preserve the ensemble without reweighting. The SGLDfp approach is size extensive and can be used to accelerate low frequency motion in large systems, or in systems with explicit solvent where solvent diffusion is also to be enhanced. Since these methods are direct and straightforward, they can be used in conjunction with many other sampling methods or free energy methods by simply replacing the integration of degrees of freedom that are normally sampled by MD or LD. PMID:23913991
Two Upper Bounds for the Weighted Path Length of Binary Trees. Report No. UIUCDCS-R-73-565.
ERIC Educational Resources Information Center
Pradels, Jean Louis
Rooted binary trees with weighted nodes are structures encountered in many areas, such as coding theory, searching and sorting, information storage and retrieval. The path length is a meaningful quantity which gives indications about the expected time of a search or the length of a code, for example. In this paper, two sharp bounds for the total…
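For reference, the quantity being bounded is the weighted path length, the sum over nodes of weight times depth. A minimal sketch follows; note this version weights every node, whereas some treatments weight only the leaves (external path length).

    class Node:
        def __init__(self, weight, left=None, right=None):
            self.weight, self.left, self.right = weight, left, right

    def weighted_path_length(node, depth=0):
        """Sum of node weight times node depth over the whole tree."""
        if node is None:
            return 0
        return (node.weight * depth
                + weighted_path_length(node.left, depth + 1)
                + weighted_path_length(node.right, depth + 1))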
Enzymatic reaction paths as determined by transition path sampling
NASA Astrophysics Data System (ADS)
Masterson, Jean Emily
Enzymes are biological catalysts capable of enhancing the rates of chemical reactions by many orders of magnitude as compared to solution chemistry. Since the catalytic power of enzymes routinely exceeds that of the best artificial catalysts available, there is much interest in understanding the complete nature of chemical barrier crossing in enzymatic reactions. Two specific questions pertaining to the source of enzymatic rate enhancements are investigated in this work. The first is the issue of how fast protein motions of an enzyme contribute to chemical barrier crossing. Our group has previously identified sub-picosecond protein motions, termed promoting vibrations (PVs), that dynamically modulate chemical transformation in several enzymes. In the case of human heart lactate dehydrogenase (hhLDH), prior studies have shown that a specific axis of residues undergoes a compressional fluctuation towards the active site, decreasing a hydride and a proton donor--acceptor distance on a sub-picosecond timescale to promote particle transfer. To more thoroughly understand the contribution of this dynamic motion to the enzymatic reaction coordinate of hhLDH, we conducted transition path sampling (TPS) using four versions of the enzymatic system: a wild type enzyme with natural isotopic abundance; a heavy enzyme where all the carbons, nitrogens, and non-exchangeable hydrogens were replaced with heavy isotopes; and two versions of the enzyme with mutations in the axis of PV residues. We generated four separate ensembles of reaction paths and analyzed each in terms of the reaction mechanism, time of barrier crossing, dynamics of the PV, and residues involved in the enzymatic reaction coordinate. We found that heavy isotopic substitution of hhLDH altered the sub-picosecond dynamics of the PV, changed the favored reaction mechanism, dramatically increased the time of barrier crossing, but did not have an effect on the specific residues involved in the PV. In the mutant systems, we observed changes in the reaction mechanism and altered contributions of the mutated residues to the enzymatic reaction coordinate, but we did not detect a substantial change in the time of barrier crossing. These results confirm the importance of maintaining the dynamics and structural scaffolding of the hhLDH PV in order to facilitate facile barrier passage. We also utilized TPS to investigate the possible role of fast protein dynamics in the enzymatic reaction coordinate of human dihydrofolate reductase (hsDHFR). We found that sub-picosecond dynamics of hsDHFR do contribute to the reaction coordinate, whereas this is not the case in the E. coli version of the enzyme. This result indicates a shift in the DHFR family to a more dynamic version of catalysis. The second inquiry we addressed in this thesis regarding enzymatic barrier passage concerns the variability of paths through reactive phase space for a given enzymatic reaction. We further investigated the hhLDH-catalyzed reaction using a high-perturbation TPS algorithm. Though we saw that alternate reaction paths were possible, the dominant reaction path we observed corresponded to that previously elucidated in prior hhLDH TPS studies. Since the additional reaction paths we observed were likely high-energy, these results indicate that only the dominant reaction path contributes significantly to the overall reaction rate. 
In conclusion, we show that the enzymes hhLDH and hsDHFR exhibit paths through reactive phase space where fast protein motions are involved in the enzymatic reaction coordinate and exhibit a non-negligible contribution to chemical barrier crossing.
NASA Astrophysics Data System (ADS)
Colorado, G.; Salinas, J. A.; Cavazos, T.; de Grau, P.
2013-05-01
Precipitation simulations from 15 CMIP5 GCMs were combined in a weighted ensemble using the Reliability Ensemble Averaging (REA) method, which yields a weight for each model. This was done for a historical period (1961-2000) and for future emissions based on low (RCP4.5) and high (RCP8.5) radiative forcing for the period 2075-2099. The annual cycles of the simple ensemble average of the historical GCM simulations, the historical REA average, and the Climatic Research Unit (CRU TS3.1) database were compared in four zones of México. For precipitation, the improvement from using the REA method is clear especially in the two northern zones of México, where the REA average is closer to the observations (CRU) than the simple average. In the southern zones there is also an improvement, but it is not as large as in the north; in particular, in the southeast the REA average reproduces the annual cycle with the mid-summer drought only qualitatively and greatly underestimates it. The main reason is that precipitation is underestimated by all the models and the mid-summer drought does not even exist in some of them. In the REA average of the future scenarios, as expected, the most drastic decrease in precipitation was simulated under RCP8.5, especially in the monsoon area and in the south of México in summer and in winter. In the center and south of México, however, the same scenario simulates an increase of precipitation in autumn.
Nodal distances for rooted phylogenetic trees.
Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente, Gabriel
2010-08-01
Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path-length matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics on the spaces M_n(R) of real-valued n x n matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using L^p metrics on M_n(R), for any real p > 0.
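The final comparison step is simple to sketch: given the taxa-by-taxa path-length matrices of two trees (or the split variants introduced in the paper), the dissimilarity is an L^p metric applied to the matrix difference. Names and the toy matrices below are purely illustrative.

    import numpy as np

    def nodal_distance(D1, D2, p=2.0):
        """L^p distance between two taxa-by-taxa path-length matrices."""
        D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
        return float((np.abs(D1 - D2) ** p).sum() ** (1.0 / p))

    D1 = [[0, 2, 3], [2, 0, 3], [3, 3, 0]]
    D2 = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
    print(nodal_distance(D1, D2))   # 2.0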
Polarized ensembles of random pure states
NASA Astrophysics Data System (ADS)
Deelan Cunden, Fabio; Facchi, Paolo; Florio, Giuseppe
2013-08-01
A new family of polarized ensembles of random pure states is presented. These ensembles are obtained by linear superposition of two random pure states with suitable distributions, and are quite manageable. We will use the obtained results for two purposes: on the one hand we will be able to derive an efficient strategy for sampling states from isopurity manifolds. On the other, we will characterize the deviation of a pure quantum state from separability under the influence of noise.
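A minimal sketch of the construction, assuming Haar-random pure states drawn as normalised complex Gaussian vectors; the fixed superposition weight alpha stands in for the suitable weight distributions used in the paper.

    import numpy as np

    def haar_state(d, rng):
        """Haar-random pure state: normalised complex Gaussian vector."""
        v = rng.normal(size=d) + 1j * rng.normal(size=d)
        return v / np.linalg.norm(v)

    def polarized_state(d, alpha, rng):
        """Normalised superposition of two independent random pure states."""
        psi = alpha * haar_state(d, rng) + (1 - alpha) * haar_state(d, rng)
        return psi / np.linalg.norm(psi)

    rng = np.random.default_rng(0)
    psi = polarized_state(16, 0.7, rng)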
NASA Astrophysics Data System (ADS)
Feng, Xiangbo; Haines, Keith
2017-04-01
ECMWF has produced its first ensemble ocean-atmosphere coupled reanalysis, the 20th century Coupled ECMWF ReAnalysis (CERA-20C), with 10 ensemble members at 3-hour resolution. Here the analysis uncertainties (ensemble spread) of lower atmospheric variables and sea surface temperature (SST), and their correlations, are quantified on diurnal, seasonal and longer timescales. The 2-m air temperature (T2m) spread is always larger than the SST spread at high frequencies, but smaller on monthly timescales, except in deep convection areas, indicating increasing SST control at longer timescales. Spatially, the T2m-SST ensemble correlations are strongest where ocean mixed layers are shallow and can respond to atmospheric variability. Where atmospheric convection is strong with a deep precipitating boundary layer, T2m-SST correlations are greatly reduced. As the 20th century progresses more observations become available, and ensemble spreads decline at all variability timescales. The T2m-SST correlations increase through the 20th century, except in the tropics. As winds become better constrained over the oceans with less spread, T2m and SST become more correlated. In the tropics, strong ENSO-related inter-annual variability is found in the correlations, as atmospheric convection centres move. These ensemble spreads have been used to provide background errors for the assimilation throughout the reanalysis, have implications for the weights given to observations, and are a general measure of the uncertainties in the analysed product. Although cross-boundary covariances are not currently used, they offer considerable potential for strengthening the ocean-atmosphere coupling in future reanalyses.
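The two diagnostics used throughout, ensemble spread and cross-variable ensemble correlation, can be computed at a single grid point as sketched below; array shapes and names are illustrative, not CERA-20C conventions.

    import numpy as np

    def spread_and_correlation(t2m, sst):
        """t2m, sst: arrays of shape (n_members, n_times) at one point."""
        spread_t2m = t2m.std(axis=0, ddof=1)    # ensemble spread per time
        spread_sst = sst.std(axis=0, ddof=1)
        anom_t = t2m - t2m.mean(axis=0)         # deviations from ensemble mean
        anom_s = sst - sst.mean(axis=0)
        corr = np.corrcoef(anom_t.ravel(), anom_s.ravel())[0, 1]
        return spread_t2m, spread_sst, corr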
Understanding the Structural Ensembles of a Highly Extended Disordered Protein
Daughdrill, Gary W.; Kashtanov, Stepan; Stancik, Amber; Hill, Shannon E.; Helms, Gregory; Muschol, Martin
2013-01-01
Developing a comprehensive description of the equilibrium structural ensembles for intrinsically disordered proteins (IDPs) is essential to understanding their function. The p53 transactivation domain (p53TAD) is an IDP that interacts with multiple protein partners and contains numerous phosphorylation sites. Multiple techniques were used to investigate the equilibrium structural ensemble of p53TAD in its native and chemically unfolded states. The results from these experiments show that the native state of p53TAD has dimensions similar to a classical random coil while the chemically unfolded state is more extended. To investigate the molecular properties responsible for this behavior, a novel algorithm that generates diverse and unbiased structural ensembles of IDPs was developed. This algorithm was used to generate a large pool of plausible p53TAD structures that were reweighted to identify a subset of structures with the best fit to small angle X-ray scattering data. High weight structures in the native state ensemble show features that are localized to protein binding sites and regions with high proline content. The features localized to the protein binding sites are mostly eliminated in the chemically unfolded ensemble, while the regions with high proline content remain relatively unaffected. Data from NMR experiments support these results, showing that residues from the protein binding sites experience larger environmental changes upon unfolding by urea than regions with high proline content. This behavior is consistent with the urea-induced exposure of nonpolar and aromatic side-chains in the protein binding sites that are partially excluded from solvent in the native state ensemble. PMID:21979461
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
2017-02-14
We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.
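The Wang-Landau half of the combined scheme reduces to a simple rule: accept a move with probability g(E_old)/g(E_new) and raise the running density-of-states estimate at the visited energy bin. The sketch below shows one such step; the reaction-ensemble half, which additionally proposes explicit (de)protonation moves, is omitted, and all callables are placeholders.

    import math, random

    def wl_step(state, E, ln_g, hist, propose, energy, bin_of, ln_f):
        """One Wang-Landau step on a binned energy axis."""
        cand = propose(state)
        E_cand = energy(cand)
        d = ln_g[bin_of(E)] - ln_g[bin_of(E_cand)]
        if d >= 0 or random.random() < math.exp(d):
            state, E = cand, E_cand
        b = bin_of(E)
        ln_g[b] += ln_f    # modification factor, shrunk once hist is flat
        hist[b] += 1
        return state, E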
Flight-Path Characteristics for Decelerating From Supercircular Speed
NASA Technical Reports Server (NTRS)
Luidens, Roger W.
1961-01-01
Characteristics of the following flight paths for decelerating from a supercircular speed are developed in closed form: constant angle of attack, constant net acceleration, constant altitude, constant free-stream Reynolds number, and "modulated roll." The vehicles were required to remain in or near the atmosphere, and to stay within the aerodynamic capabilities of a vehicle with a maximum lift-drag ratio of 1.0 and within a maximum net acceleration G of 10 g's. The local Reynolds number for all the flight paths for a vehicle with a gross weight of 10,000 pounds and a 60° swept wing was found to be about 0.7 x 10^6. With the assumption of a laminar boundary layer, the heating of the vehicle is studied as a function of type of flight path, initial G load, and initial velocity. The following heating parameters were considered: the distribution of the heating rate over the vehicle, the distribution of the heat per square foot over the vehicle, and the total heat input to the vehicle. The constant G load path at limiting G was found to give the lowest total heat input for a given initial velocity. For a vehicle with a maximum lift-drag ratio of 1.0 and a flight path with a maximum G of 10 g's, entry velocities of twice circular appear thermodynamically feasible, and entries at velocities of 2.8 times circular are aerodynamically possible. The predominant heating (about 85 percent) occurs at the leading edge of the vehicle. The total ablated weight for a 10,000-pound-gross-weight vehicle decelerating from an initial velocity of twice circular velocity is estimated to be 5 percent of gross weight. Modifying the constant G load flight path by a constant-angle-of-attack segment through a flight-to-circular-velocity ratio of 1.0 gives essentially a "point landing" capability but also results in an increased total heat input to the vehicle.
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
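The combination rule itself is a one-liner; a sketch assuming scikit-learn-style base classifiers (names and interfaces are assumptions):

    from collections import Counter

    def majority_vote(labels):
        """Combine base-classifier labels for one sample by majority vote."""
        return Counter(labels).most_common(1)[0][0]

    # e.g. majority_vote([clf.predict(X_sample)[0] for clf in ensemble])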
Equilibrium energy spectrum of point vortex motion with remarks on ensemble choice and ergodicity
NASA Astrophysics Data System (ADS)
Esler, J. G.
2017-01-01
The dynamics and statistical mechanics of N chaotically evolving point vortices in the doubly periodic domain are revisited. The selection of the correct microcanonical ensemble for the system is first investigated. The numerical results of Weiss and McWilliams [Phys. Fluids A 3, 835 (1991), 10.1063/1.858014], who argued that the point vortex system with N = 6 is nonergodic because of an apparent discrepancy between ensemble averages and dynamical time averages, are shown to be due to an incorrect ensemble definition. When the correct microcanonical ensemble is sampled, accounting for the vortex momentum constraint, time averages obtained from direct numerical simulation agree with ensemble averages within the sampling error of each calculation, i.e., there is no numerical evidence for nonergodicity. Further, in the N → ∞ limit it is shown that the vortex momentum no longer constrains the long-time dynamics and therefore that the correct microcanonical ensemble for statistical mechanics is that associated with the entire constant energy hypersurface in phase space. Next, a recently developed technique is used to generate an explicit formula for the density of states function for the system, including for arbitrary distributions of vortex circulations. Exact formulas for the equilibrium energy spectrum, and for the probability density function of the energy in each Fourier mode, are then obtained. Results are compared with a series of direct numerical simulations with N = 50 and excellent agreement is found, confirming the relevance of the results for interpretation of quantum and classical two-dimensional turbulence.
Shallow cumuli ensemble statistics for development of a stochastic parameterization
NASA Astrophysics Data System (ADS)
Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs
2014-05-01
According to a conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large scale forcing produces a unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection: as is known from empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to a given large scale forcing. Moreover, this distribution gets broader with increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the spread and shape of the sub-grid state distribution on the model horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study the shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation model (LES) and a cloud tracking algorithm, followed by a conditional sampling of clouds at the cloud base level, to retrieve the information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of a shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on the empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory. The stochastic model simulates a compound random process, with the number of convective elements drawn from a Poisson distribution, and cloud properties sub-sampled from a generalized ensemble distribution. We study the role of the different cloud subtypes in a shallow convective ensemble and how the diverse cloud properties and cloud lifetimes affect the system macro-state. To what extent does the cloud-base mass flux distribution deviate from the simple Boltzmann distribution, and how does this affect the results from the stochastic model? Is the memory provided by the finite lifetime of individual clouds of importance for the ensemble statistics? We also test for the minimal information, given as input to the stochastic model, that is able to reproduce the ensemble mean statistics and the variability in a convective ensemble. An important property of the resulting distribution of the sub-grid convective states is its scale-adaptivity - the smaller the grid-size, the broader the compound distribution of the sub-grid states.
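The compound random process at the core of the stochastic model is easy to sketch: draw the number of convective elements from a Poisson distribution and each cloud-base mass flux from an exponential (Boltzmann) distribution, the latter standing in for the generalized exponential found for shallow cumuli. All parameter values below are invented for illustration.

    import numpy as np

    def grid_box_mass_flux(mean_n_clouds, mean_flux, rng):
        """Total sub-grid cloud-base mass flux for one grid box."""
        n = rng.poisson(mean_n_clouds)               # number of clouds
        return rng.exponential(mean_flux, size=n).sum()

    rng = np.random.default_rng(1)
    samples = [grid_box_mass_flux(100, 0.05, rng) for _ in range(10_000)]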
Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.
Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold
2016-12-07
Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
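A greedy, accuracy-driven pruning strategy of the kind compared here can be sketched as follows, scoring candidate sub-ensembles by validation AUC; the array layout and names are illustrative, not the study's implementation.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def greedy_prune(tree_probs, y_val, k):
        """Greedily grow a k-tree sub-ensemble maximising validation AUC.
        tree_probs: (n_trees, n_val) class-1 probabilities per tree."""
        tree_probs = np.asarray(tree_probs)
        chosen = []
        for _ in range(k):
            remaining = [t for t in range(len(tree_probs)) if t not in chosen]
            scores = [roc_auc_score(y_val,
                                    tree_probs[chosen + [t]].mean(axis=0))
                      for t in remaining]
            chosen.append(remaining[int(np.argmax(scores))])
        return chosen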
Monte Carlo replica-exchange based ensemble docking of protein conformations.
Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin
2017-05-01
A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in translation and orientation of binding partners, possible conformational changes upon binding are included based on Monte Carlo selection of protein conformations stored as ordered pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure, with a range spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling is performed to select appropriate partner conformations on the fly, the approach is not limited by the number of conformations in the ensemble, in contrast to docking each conformer pair separately in ensemble cross docking. Although only a fraction of generated conformers was in closer agreement with the bound structure, the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2017 Wiley Periodicals, Inc.
Modality-Driven Classification and Visualization of Ensemble Variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bensema, Kevin; Gosink, Luke; Obermaier, Harald
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.
Liquid Water from First Principles: Validation of Different Sampling Approaches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mundy, C J; Kuo, W; Siepmann, J
2004-05-20
A series of first principles molecular dynamics and Monte Carlo simulations were carried out for liquid water to assess the validity and reproducibility of different sampling approaches. These simulations include Car-Parrinello molecular dynamics simulations using the program CPMD with different values of the fictitious electron mass in the microcanonical and canonical ensembles, Born-Oppenheimer molecular dynamics using the programs CPMD and CP2K in the microcanonical ensemble, and Metropolis Monte Carlo using CP2K in the canonical ensemble. With the exception of one simulation for 128 water molecules, all other simulations were carried out for systems consisting of 64 molecules. It is found that the structural and thermodynamic properties of these simulations are in excellent agreement with each other as long as adiabatic sampling is maintained in the Car-Parrinello molecular dynamics simulations, either by choosing a sufficiently small fictitious mass in the microcanonical ensemble or by Nosé-Hoover thermostats in the canonical ensemble. Using the Becke-Lee-Yang-Parr exchange and correlation energy functionals and norm-conserving Troullier-Martins or Goedecker-Teter-Hutter pseudopotentials, simulations at a fixed density of 1.0 g/cm^3 and a temperature close to 315 K yield a height of the first peak in the oxygen-oxygen radial distribution function of about 3.0, a classical constant-volume heat capacity of about 70 J K^-1 mol^-1, and a self-diffusion constant of about 0.1 Å^2/ps.
Self-averaging and weak ergodicity breaking of diffusion in heterogeneous media
NASA Astrophysics Data System (ADS)
Russian, Anna; Dentz, Marco; Gouze, Philippe
2017-08-01
Diffusion in natural and engineered media is quantified in terms of stochastic models for the heterogeneity-induced fluctuations of particle motion. However, fundamental properties such as ergodicity and self-averaging and their dependence on the disorder distribution are often not known. Here, we investigate these questions for diffusion in quenched disordered media characterized by spatially varying retardation properties, which account for particle retention due to physical or chemical interactions with the medium. We link self-averaging and ergodicity to the disorder sampling efficiency Rn, which quantifies the number of disorder realizations a noise ensemble may sample in a single disorder realization. Diffusion for disorder scenarios characterized by a finite mean transition time is ergodic and self-averaging for any dimension. The strength of the sample to sample fluctuations decreases with increasing spatial dimension. For an infinite mean transition time, particle motion is weakly ergodicity breaking in any dimension because single particles cannot sample the heterogeneity spectrum in finite time. However, even though the noise ensemble is not representative of the single-particle time statistics, subdiffusive motion in q ≥ 2 dimensions is self-averaging, which means that the noise ensemble in a single realization samples a representative part of the heterogeneity spectrum.
Locci, Antonio Mario; Cincotti, Alberto; Todde, Sara; Orrù, Roberto; Cao, Giacomo
2010-01-01
A novel methodology is proposed for investigating the effect of the pulsed electric current during the spark plasma sintering (SPS) of electrically conductive powders without potential misinterpretation of experimental results. First, ensemble configurations (geometry, size and material of the powder sample, die, plunger and spacers) are identified where the electric current is forced to flow only through either the sample or the die, so that the sample is heated either through the Joule effect or by thermal conduction, respectively. These ensemble configurations are selected using a recently proposed mathematical model of an SPS apparatus, which, once suitably modified, makes it possible to carry out detailed electrical and thermal analysis. Next, SPS experiments are conducted using the ensemble configurations theoretically identified. Using aluminum powders as a case study, we find that the temporal profiles of sample shrinkage, which indicate densification behavior, as well as the final density of the sample are clearly different when the electric current flows only through the sample or through the die containing it, whereas the temperature cycle and mechanical load are the same in both cases. PMID:27877354
Peculiar spectral statistics of ensembles of trees and star-like graphs
NASA Astrophysics Data System (ADS)
Kovaleva, V.; Maximov, Yu; Nechaev, S.; Valba, O.
2017-07-01
In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that the spectral densities of the corresponding adjacency matrices demonstrate a peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the ‘Lifshitz singularity’ emerging in one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of the spectra of adjacency matrices is interpreted as sets of resonance frequencies that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but rather has a number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of the spectral densities of an individual dendrimer and of an ensemble of linear chains with exponentially distributed lengths demonstrates that dendrimers could serve as simple disorder-free toy models of one-dimensional systems with quenched disorder.
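The spectral densities in question are easy to reproduce numerically for a single graph: build the adjacency matrix of a full binary tree and histogram its eigenvalues (the paper's ensembles additionally weight trees exponentially, which this sketch omits).

    import numpy as np

    def full_binary_tree_adjacency(depth):
        """Adjacency matrix of a full binary tree with 2**depth - 1 nodes."""
        n = 2 ** depth - 1
        A = np.zeros((n, n))
        for i in range(n):
            for c in (2 * i + 1, 2 * i + 2):   # children in heap indexing
                if c < n:
                    A[i, c] = A[c, i] = 1.0
        return A

    eigs = np.linalg.eigvalsh(full_binary_tree_adjacency(8))
    density, edges = np.histogram(eigs, bins=60, density=True)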
A Statistical Simulation Approach to Safe Life Fatigue Analysis of Redundant Metallic Components
NASA Technical Reports Server (NTRS)
Matthews, William T.; Neal, Donald M.
1997-01-01
This paper introduces a dual active load path fail-safe fatigue design concept analyzed by Monte Carlo simulation. The concept utilizes the inherent fatigue life differences between selected pairs of components for an active dual path system, enhanced by a stress level bias in one component. The design is applied to a baseline design, a safe-life fatigue problem studied in an American Helicopter Society (AHS) round robin. The dual active path design is compared with a two-element standby fail-safe system and the baseline design for life at specified reliability levels and for weight. The sensitivity of life estimates for both the baseline and fail-safe designs was examined by considering normal and Weibull distribution laws and coefficient of variation levels. Results showed that the biased dual path system lifetimes, for both the first element failure and residual life, were much greater than for standby systems. The sensitivity of the residual life-weight relationship was not excessive at reliability levels up to R = 0.9999 and the weight penalty was small. The sensitivity of life estimates increases dramatically at higher reliability levels.
Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan
2015-06-01
Conventional clinical decision support systems are based on individual classifiers or simple combinations of these classifiers, which tend to show moderate performance. This research paper presents a novel classifier ensemble framework based on an enhanced bagging approach with a multi-objective weighted voting scheme for prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional approaches by utilizing an ensemble of five heterogeneous classifiers: Naïve Bayes, linear regression, quadratic discriminant analysis, instance based learner and support vector machines. Five different datasets are used for experimentation, evaluation and validation. The datasets are obtained from publicly available data repositories. Effectiveness of the proposed ensemble is investigated by comparison of results with several classifiers. Prediction results of the proposed ensemble model are assessed by ten-fold cross-validation and ANOVA statistics. The experimental evaluation shows that the proposed framework deals with all types of attributes and achieved high diagnosis accuracy of 84.16 %, 93.29 % sensitivity, 96.70 % specificity, and 82.15 % f-measure. The f-ratio higher than f-critical and p value less than 0.05 for a 95 % confidence interval indicate that the results are extremely statistically significant for most of the datasets.
NASA Astrophysics Data System (ADS)
Sun, Hongyue; Luo, Shuai; Jin, Ran; He, Zhen
2017-07-01
Mathematical modeling is an important tool for investigating the performance of microbial fuel cells (MFCs) and optimizing their design. To overcome the shortcomings of traditional MFC models, an ensemble model is developed in this study by integrating an engineering model with statistical analytics for extrapolation scenarios. Such an ensemble model can reduce the labor of parameter calibration and requires fewer measurement data to achieve accuracy comparable to a traditional statistical model in both the normal and extreme operation regions. Based on different weightings of current generation and organic removal efficiency, the ensemble model can recommend input factor settings that achieve the best current generation and organic removal efficiency. The model predicts a set of optimal design factors for the present tubular MFCs, including an anode flow rate of 3.47 mL min^-1, an organic concentration of 0.71 g L^-1, and a catholyte pumping flow rate of 14.74 mL min^-1, to achieve a peak current of 39.2 mA. To maintain 100% organic removal efficiency, the anode flow rate and organic concentration should be kept below 1.04 mL min^-1 and 0.22 g L^-1, respectively. The developed ensemble model can potentially be adapted to model other types of MFCs or bioelectrochemical systems.
Büttner, Kathrin; Krieter, Joachim
2018-08-01
The analysis of trade networks as well as of the spread of diseases within these systems focuses mainly on pure animal movements between farms. Additional data included as edge weights can complement the informational content of the network analysis; however, the inclusion of edge weights can also alter the outcome of the analysis. Thus, the aim of the study was to compare unweighted and weighted network analyses of a pork supply chain in Northern Germany and to evaluate the impact on the centrality parameters. Five different weighted network versions were constructed by adding the following edge weights: number of trade contacts, number of delivered livestock, average number of delivered livestock per trade contact, geographical distance and reciprocal geographical distance. Additionally, two different edge weight standardizations were used. The network observed from 2013 to 2014 contained 678 farms which were connected by 1,018 edges. General network characteristics including the shortest path structure (e.g. identical shortest paths, shortest path lengths) as well as centrality parameters for each network version were calculated. Furthermore, the targeted and the random removal of farms were performed in order to evaluate the structural changes in the networks. All network versions and edge weight standardizations revealed the same number of shortest paths (1,935). Between 94.4 and 98.9% of the shortest paths of the unweighted network and the weighted network versions were identical. Furthermore, depending on the calculated centrality parameters and the edge weight standardization used, it could be shown that the weighted network versions either differed from the unweighted network (e.g. for the centrality parameters based on ingoing trade contacts) or did not differ (e.g. for the centrality parameters based on outgoing trade contacts) with regard to the Spearman rank correlation and the targeted removal of farms. The choice of standardization method as well as the inclusion or exclusion of specific farm types (e.g. abattoirs) can alter the results significantly. These facts have to be considered when centrality parameters are to be used for the implementation of prevention and control strategies in the case of an epidemic. Copyright © 2018 Elsevier B.V. All rights reserved.
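A sketch of the unweighted-versus-weighted comparison using networkx; reciprocal weights mirror the study's reciprocal geographical distance, since shortest-path centralities treat edge weights as distances (larger trade volume should mean "closer"). Farm names and weights are invented for illustration.

    import networkx as nx

    G = nx.DiGraph()
    # edge weight: e.g. number of trade contacts between two farms
    G.add_weighted_edges_from([("A", "B", 12), ("B", "C", 3), ("A", "C", 7)])

    unweighted = nx.betweenness_centrality(G)   # pure movement structure

    H = G.copy()
    for u, v, d in H.edges(data=True):
        d["distance"] = 1.0 / d["weight"]       # reciprocal edge weight
    weighted = nx.betweenness_centrality(H, weight="distance")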
Frictional behaviour of sandstone: A sample-size dependent triaxial investigation
NASA Astrophysics Data System (ADS)
Roshan, Hamid; Masoumi, Hossein; Regenauer-Lieb, Klaus
2017-01-01
Frictional behaviour of rocks, from the initial stage of loading to final shear displacement along the formed shear plane, has been widely investigated in the past. However, the effect of sample size on such frictional behaviour has not attracted much attention. This is mainly due to limitations in rock testing facilities as well as the complex mechanisms involved in the sample-size dependent frictional behaviour of rocks. In this study, a suite of advanced triaxial experiments was performed on Gosford sandstone samples of different sizes and at different confining pressures. The post-peak response of the rock along the formed shear plane has been captured for the analysis, with particular interest in sample-size dependency. Several important phenomena have been observed from the results of this study: a) the rate of transition from brittleness to ductility in rock is sample-size dependent, with relatively smaller samples showing a faster transition toward ductility at any confining pressure; b) the sample size influences the angle of the formed shear band; and c) the friction coefficient of the formed shear plane is sample-size dependent, with relatively smaller samples exhibiting a lower friction coefficient than larger samples. We interpret our results in terms of a thermodynamic approach in which the frictional properties for finite deformation are viewed as encompassing a multitude of ephemeral slipping surfaces prior to the formation of the through-going fracture. The final fracture itself is seen as a result of the self-organisation of a sufficiently large ensemble of micro-slip surfaces and is therefore consistent with the theory of thermodynamics. This assumption vindicates the use of classical rock mechanics experiments to constrain failure of pressure sensitive rocks, and the future imaging of these micro-slips opens an exciting path for research in rock failure mechanisms.
SAChES: Scalable Adaptive Chain-Ensemble Sampling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah
We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted at Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces the per-chain sampling burden and enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently hones in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors, which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.
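The differential-evolution proposal that lets loosely coupled chains share information is compact enough to sketch; gamma is commonly set near 2.38/sqrt(2d) for d parameters, and the names below are illustrative rather than taken from the SAChES code.

    import numpy as np

    def de_mc_proposal(chains, i, gamma, eps_scale, rng):
        """Propose for chain i: x_i + gamma*(x_a - x_b) + small jitter."""
        others = [j for j in range(len(chains)) if j != i]
        a, b = rng.choice(others, size=2, replace=False)
        eps = rng.normal(scale=eps_scale, size=chains[i].shape)
        return chains[i] + gamma * (chains[a] - chains[b]) + eps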
DOE Office of Scientific and Technical Information (OSTI.GOV)
Olsen, Seth, E-mail: seth.olsen@uq.edu.au
2015-01-28
This paper reviews basic results from a theory of the a priori classical probabilities (weights) in state-averaged complete active space self-consistent field (SA-CASSCF) models. It addresses how the classical probabilities limit the invariance of the self-consistency condition to transformations of the complete active space configuration interaction (CAS-CI) problem. Such transformations are of interest for choosing representations of the SA-CASSCF solution that are diabatic with respect to some interaction. I achieve the known result that a SA-CASSCF can be self-consistently transformed only within degenerate subspaces of the CAS-CI ensemble density matrix. For uniformly distributed ("microcanonical") SA-CASSCF ensembles, self-consistency is invariant to any unitary CAS-CI transformation that acts locally on the ensemble support. Most SA-CASSCF applications in current literature are microcanonical. A problem with microcanonical SA-CASSCF models for problems with "more diabatic than adiabatic" states is described. The problem is that not all diabatic energies and couplings are self-consistently resolvable. A canonical-ensemble SA-CASSCF strategy is proposed to solve the problem. For canonical-ensemble SA-CASSCF, the equilibrated ensemble is a Boltzmann density matrix parametrized by its own CAS-CI Hamiltonian and a Lagrange multiplier acting as an inverse "temperature," unrelated to the physical temperature. Like the convergence criterion for microcanonical-ensemble SA-CASSCF, the equilibration condition for canonical-ensemble SA-CASSCF is invariant to transformations that act locally on the ensemble CAS-CI density matrix. The advantage of a canonical-ensemble description is that more adiabatic states can be included in the support of the ensemble without running into convergence problems. The constraint on the dimensionality of the problem is relieved by the introduction of an energy constraint. The method is illustrated with a complete active space valence-bond (CASVB) analysis of the charge/bond resonance electronic structure of a monomethine cyanine: Michler's hydrol blue. The diabatic CASVB representation is shown to vary weakly for "temperatures" corresponding to visible photon energies. Canonical-ensemble SA-CASSCF enables the resolution of energies and couplings for all covalent and ionic CASVB structures contributing to the SA-CASSCF ensemble. The CASVB solution describes resonance of charge- and bond-localized electronic structures interacting via bridge resonance superexchange. The resonance couplings can be separated into channels associated with either covalent charge delocalization or chemical bonding interactions, with the latter significantly stronger than the former.
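The canonical-ensemble weights invoked above are ordinary Boltzmann weights over the CAS-CI state energies, with beta acting as a formal Lagrange multiplier rather than a physical inverse temperature. A numerically stable sketch:

    import numpy as np

    def boltzmann_weights(energies, beta):
        """w_i = exp(-beta * E_i) / Z, computed in a stable shifted form."""
        e = np.asarray(energies, float)
        e = e - e.min()                 # shift so the largest weight is 1
        w = np.exp(-beta * e)
        return w / w.sum()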
Exploring diversity in ensemble classification: Applications in large area land cover mapping
NASA Astrophysics Data System (ADS)
Mellor, Andrew; Boukir, Samia
2017-07-01
Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance: in a diverse ensemble, classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria, Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work lies in boosting diversity by emphasizing the contribution of lower-margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower-margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
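As a concrete illustration of the ensemble-margin concept discussed above, the following sketch (ours, not the authors' code; the synthetic dataset and class count are placeholders) computes per-sample margins from a scikit-learn random forest, from which the lowest-margin training instances could be targeted:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-tree class votes: shape (n_trees, n_samples)
votes = np.stack([tree.predict(X) for tree in forest.estimators_]).astype(int)

def ensemble_margin(votes, y_true, n_classes):
    """Margin = (votes for the true class - max votes for any other class) / n_trees."""
    n_trees, n_samples = votes.shape
    margins = np.empty(n_samples)
    for i in range(n_samples):
        counts = np.bincount(votes[:, i], minlength=n_classes)
        true_votes = counts[y_true[i]]
        counts[y_true[i]] = -1                    # mask the true class
        margins[i] = (true_votes - counts.max()) / n_trees
    return margins

margins = ensemble_margin(votes, y, n_classes=3)
hard_idx = np.argsort(margins)[:50]   # lowest-margin instances to emphasise
```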
Biodegradability of degradable plastic waste.
Agamuthu, P; Faizura, Putri Nadzrul
2005-04-01
Plastic waste constitutes the third largest waste volume in Malaysian municipal solid waste (MSW), next to putrescible waste and paper. The plastic component in MSW from Kuala Lumpur averages 24% (by weight), whereas the national mean is about 15%. The 144 waste dumps in the country receive about 95% of the MSW, including plastic waste. The useful life of the landfills is fast diminishing, as plastic waste stays un-degraded for more than 50 years. In this study the compostability of polyethylene and pro-oxidant additive-based environmentally degradable plastics (EDP) was investigated. Linear low-density polyethylene (LLDPE) samples exposed hydrolytically or oxidatively at 60 degrees C showed that the abiotic degradation path was oxidative rather than hydrolytic. There was a weight loss of 8%, and the plastic had been oxidized, as shown by the additional carbonyl group exhibited in the Fourier transform infrared (FTIR) spectrum. The oxidation rate seemed to be influenced by the amount of pro-oxidant additive, the chemical structure and morphology of the plastic samples, and the surface area. Composting studies during a 45-day experiment showed a percentage elongation reduction of 20% for McD samples [high-density polyethylene (HDPE) with 3% additive] and LL samples (LLDPE with 7% additive), and an 18% reduction for totally degradable plastic (TDP) samples (HDPE with 3% additive). Lastly, microbial experiments using Pseudomonas aeruginosa on carbon-free media, with degradable plastic samples as the sole carbon source, showed confirmatory results: positive bacterial growth and a weight loss of 2.2% for degraded polyethylene samples demonstrate that the degradable plastic is biodegradable.
Ensemble-based evaluation of extreme water levels for the eastern Baltic Sea
NASA Astrophysics Data System (ADS)
Eelsalu, Maris; Soomere, Tarmo
2016-04-01
The risks and damages from coastal flooding, which naturally grow with the magnitude of extreme storm surges, are one of the largest concerns of countries with extensive low-lying nearshore areas. The relevant risks are even more pronounced for semi-enclosed water bodies such as the Baltic Sea, where subtidal (weekly-scale) variations in the water volume of the sea substantially contribute to the water level and lead to a large spread in projections of future extreme water levels. We explore the options for using large ensembles of projections to more reliably evaluate return periods of extreme water levels. Individual projections of the ensemble are constructed by fitting several sets of block maxima with various extreme value distributions. The ensemble is based on two simulated data sets produced at the Swedish Meteorological and Hydrological Institute: a hindcast by the Rossby Centre Ocean model sampled with a resolution of 6 h, and a similar hindcast by the circulation model NEMO with a resolution of 1 h. As the annual maxima of water levels in the Baltic Sea are not always uncorrelated, we employ maxima for calendar years and for stormy seasons. As the shape parameter of the Generalised Extreme Value distribution changes its sign and substantially varies in magnitude along the eastern coast of the Baltic Sea, the use of a single distribution for the entire coast is inappropriate. The ensemble involves projections based on the Generalised Extreme Value, Gumbel and Weibull distributions. The parameters of these distributions are evaluated in three ways: the maximum likelihood method and the method of moments based on both biased and unbiased estimates. The total number of projections in the ensemble is 40. As some of the resulting estimates contain limited additional information, the members of pairs of projections that are highly correlated are assigned a weight of 0.6. A comparison of the ensemble-based projection of extreme water levels and their return periods with similar estimates derived from local observations reveals an interesting pattern of match and mismatch. The match is almost perfect in measurement sites where local effects (e.g., wave-induced set-up or local surge in very shallow areas that are not resolved by circulation models) do not contribute to the observed values of water level. There is, however, substantial mismatch between projected and observed extreme values for most of the Estonian coast. The mismatch is largest for sections that are open to high waves and for several bays that cut deep into the mainland but are open to the predominant strong wind directions. Detailed quantification of this mismatch eventually makes it possible to develop substantially improved estimates of extreme water levels in sections where local effects considerably contribute to the total water level.
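The ensemble construction described above can be sketched in a few lines; the annual-maxima data, the 100-year return period, and the 0.6 down-weighting below are stand-ins, and only maximum-likelihood fits are shown:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
annual_maxima = rng.gumbel(loc=100.0, scale=25.0, size=40)   # stand-in data, cm

T = 100.0                        # return period in years
p = 1.0 - 1.0 / T                # non-exceedance probability

members = []
for dist in (stats.genextreme, stats.gumbel_r, stats.weibull_max):
    params = dist.fit(annual_maxima)           # maximum likelihood fit
    members.append(dist.ppf(p, *params))       # 100-year return level

weights = np.array([1.0, 0.6, 0.6])            # down-weight correlated members
ensemble_level = np.average(members, weights=weights)
print(f"100-year return level (ensemble): {ensemble_level:.1f} cm")
```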
NASA Astrophysics Data System (ADS)
Smerdon, J. E.; Büntgen, U.; Ljungqvist, F. C.; Esper, J.; Fernández-Donado, L.; Gonzalez-Rouco, F. J.; Luterbacher, J.; McCarroll, D.; Wagner, S.; Wahl, E. R.; Wanner, H.; Werner, J.; Zorita, E.
2012-12-01
A reconstruction of mean European summer (JJA) land temperatures from 138 B.C.E. to 2003 C.E. is presented and compared to 37 forced transient simulations of the last millennium from coupled General Circulation Models (CGCMs). Eleven annually resolved tree-ring and documentary records from ten European countries/regions were used for the reconstruction and compiled as part of the Euro_Med working group contribution to the PAGES 2k Regional Network. Records were selected based upon their summer temperature signal, annual resolution, and time-continuous sampling. All tree-ring data were detrended using the Regional Curve Standardization (RCS) method to retain low-frequency variance in the resulting mean chronologies. The calibration time series was the area-weighted JJA temperature computed from the CRUTEM4v dataset over a European land domain (35°-70°N, 10°W-40°E). A nested 'Composite-Plus-Scale' (CPS) reconstruction was derived using nine nests reflecting the availability of predictors back in time. Each nest was calculated by standardizing the available predictor series over the calibration interval, and subsequently calculating a weighted composite in which each proxy was multiplied by its correlation with the target index. The CPS methodology was implemented using a resampling scheme that uses 104 years for calibration. The initial calibration period extended from 1850-1953 C.E. and was incremented by one year until reaching the final period of 1900-2003 C.E., yielding a total of 51 reconstructions for each nest. Within each calibration step, the 50 years excluded from calibration were used for validation. Validation statistics across all reconstruction ensemble members within each nest indicate skillful reconstructions (RE: 0.42-0.64; CE: 0.26-0.54) and are all above the maximum validation statistics achieved in an ensemble of red-noise benchmarking experiments. Warm periods in the derived reconstruction during the 1st, 2nd, and 7th-12th centuries are comparable to the warm summer temperatures of the mid-20th century, although the 2003 summer remains the warmest single summer over the duration of the reconstruction. A relative period of cold summer temperatures is also noted from the 14th-19th centuries, consistent with the expected timing of the Little Ice Age. The nested CPS reconstruction is also compared to a 37-member ensemble of millennium-length forced transient simulations from CGCMs, including eleven simulations from the collection of CMIP5/PMIP3 last-millennium experiments. The simulations are separated based on their use of strong or weak scaling of total solar irradiance (TSI) forcing over the last millennium. Although both ensembles of simulated mean European temperatures compare well with the nested CPS reconstruction, there is some evidence of better agreement with the ensemble using strong TSI forcing.
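A minimal sketch of the 'Composite-Plus-Scale' step described above, under toy data (11 synthetic proxies, a 104-year calibration window; not the authors' code), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=104)                              # instrumental JJA index
proxies = target + rng.normal(scale=1.5, size=(11, 104))   # 11 noisy proxy records

# Standardize each proxy over the calibration window
z = (proxies - proxies.mean(axis=1, keepdims=True)) / proxies.std(axis=1, keepdims=True)

# Weight each proxy by its correlation with the target, then composite
w = np.array([np.corrcoef(p, target)[0, 1] for p in z])
composite = (w[:, None] * z).sum(axis=0) / w.sum()

# Scale the composite to the calibration target's mean and standard deviation
recon = (composite - composite.mean()) / composite.std()
recon = recon * target.std() + target.mean()
```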
Ensembles of satellite aerosol retrievals based on three AATSR algorithms within aerosol_cci
NASA Astrophysics Data System (ADS)
Kosmale, Miriam; Popp, Thomas
2016-04-01
Ensemble techniques are widely used in the modelling community to combine different modelling results in order to reduce uncertainties. This approach can also be adapted to satellite measurements. Aerosol_cci is an ESA-funded project in which most of the European aerosol retrieval groups work together. The different algorithms are homogenized as far as it makes sense, but remain essentially different. Datasets are compared with ground-based measurements and with each other. Three AATSR algorithms (the Swansea University aerosol retrieval, the ADV aerosol retrieval by FMI, and the Oxford aerosol retrieval ORAC) provide 17-year global aerosol records within this project. Each of these algorithms also provides uncertainty information at the pixel level. In the work presented here, an ensemble of the three AATSR algorithms is constructed. The advantage over each single algorithm is the higher spatial coverage due to more measurement pixels per grid box. Validation against ground-based AERONET measurements shows that the ensemble retains good correlation compared to the single algorithms. Annual mean maps show the global aerosol distribution based on a combination of the three aerosol algorithms. In addition, the pixel-level uncertainties of each algorithm are used to weight the contributions, in order to reduce the uncertainty of the ensemble. Results of different versions of the ensemble for aerosol optical depth are presented, discussed, and validated against ground-based AERONET measurements. Higher spatial coverage on a daily basis yields better results in annual mean maps. The benefit of using pixel-level uncertainties is analysed.
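The uncertainty-weighted combination mentioned above amounts to an inverse-variance mean; a hedged sketch for a single grid box (the AOD values and uncertainties are invented):

```python
import numpy as np

aod = np.array([0.21, 0.25, 0.18])        # AOD from the three algorithms
sigma = np.array([0.03, 0.05, 0.04])      # reported pixel-level uncertainties

w = 1.0 / sigma**2                        # inverse-variance weights
aod_ens = np.sum(w * aod) / np.sum(w)
sigma_ens = np.sqrt(1.0 / np.sum(w))      # reduced ensemble uncertainty
print(f"ensemble AOD = {aod_ens:.3f} +/- {sigma_ens:.3f}")
```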
Classroom Environment as Related to Contest Ratings among High School Performing Ensembles.
ERIC Educational Resources Information Center
Hamann, Donald L.; And Others
1990-01-01
Examines influence of classroom environments, measured by the Classroom Environment Scale, Form R (CESR), on vocal and instrumental ensembles' musical achievement at festival contests. Using random sample, reveals subjects with higher scores on CESR scales of involvement, affiliation, teacher support, and organization received better contest…
Tiered Evaluation in Large Ensemble Settings.
ERIC Educational Resources Information Center
Scott, David
1998-01-01
Discusses the use of a tiered evaluation system (TES) that allows students to work at different levels, enables teachers to assess progress objectively, and presents students with appropriate challenges in the music ensembles. Focuses on how TES works and its advantages, considers the challenges and flexibility of TES, and provides samples. (CMK)
An Effective Evolutionary Approach for Bicriteria Shortest Path Routing Problems
NASA Astrophysics Data System (ADS)
Lin, Lin; Gen, Mitsuo
Routing is one of the important research issues in communication networks. In this paper, we consider a bicriteria shortest path routing (bSPR) model dedicated to calculating nondominated paths for (1) the minimum total cost and (2) the minimum transmission delay. To solve this bSPR problem, we propose a new multiobjective genetic algorithm (moGA) with: (1) an efficient chromosome representation using a priority-based encoding method; (2) a new auto-tuning operator for the GA parameters, which adaptively regulates exploration and exploitation based on the change in the average fitness of parents and offspring at each generation; and (3) an interactive adaptive-weight fitness assignment mechanism that assigns weights to each objective and combines the weighted objectives into a single objective function. Numerical experiments with network design problems of various scales show the effectiveness and efficiency of our approach in comparison with recent studies.
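The bicriteria core of the problem, keeping only nondominated (cost, delay) paths, can be illustrated in a few lines (a sketch, not the paper's moGA; the candidate pairs are invented):

```python
def dominates(a, b):
    """Path a dominates b if it is no worse in both criteria and not equal."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def nondominated(paths):
    """Return the (cost, delay) pairs not dominated by any other candidate."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]

candidates = [(10, 5), (8, 7), (12, 3), (9, 6), (8, 8)]
print(nondominated(candidates))   # [(10, 5), (8, 7), (12, 3), (9, 6)]
```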
Features of the Correlation Structure of Price Indices
Gao, Xiangyun; An, Haizhong; Zhong, Weiqiong
2013-01-01
What are the features of the correlation structure of price indices? To answer this question, 5 types of price indices, comprising 195 specific price indices from 2003 to 2011, were selected as sample data. To build a weighted network of price indices, each price index is represented by a vertex, and a positive correlation between two price indices is represented by an edge. We studied the features of the weighted network structure by applying economic theory to the analysis of complex network parameters. By counting the weighted degrees of the nodes, we found that the frequency of the price indices follows a normal distribution, and we identified the price indices that have an important impact on the network's structure. We identified small groups in the weighted network using the k-core and k-plex methods. We discovered structural holes in the network by calculating the hierarchy of the nodes. Finally, we found that the weighted network of price indices has a small-world effect by calculating the shortest paths. These results provide a scientific basis for macroeconomic control policies. PMID:23593399
Analysis of explicit model predictive control for path-following control.
Lee, Junho; Chang, Hyuk-Jun
2018-01-01
In this paper, explicit Model Predictive Control (MPC) is employed for automated lane-keeping systems. MPC has been regarded as the key to handling such constrained systems. However, the massive computational complexity of MPC, which employs online optimization, has been a major drawback that limits the range of its target application to relatively small and/or slow problems. Explicit MPC can reduce this computational burden using a multi-parametric quadratic programming technique (mp-QP). The control objective is to derive an optimal front steering wheel angle at each sampling time so that autonomous vehicles travel along desired paths, including straight, circular, and clothoid parts, at high entry speeds. In terms of the design of the proposed controller, a method of choosing weighting matrices in an optimization problem and the range of horizons for path-following control are described through simulations. For the verification of the proposed controller, simulation results obtained using other control methods such as MPC, Linear-Quadratic Regulator (LQR), and a driver model are employed, and CarSim, which reflects the features of a vehicle more realistically than MATLAB/Simulink, is used for reliable demonstration. PMID:29534080
Efficient Simulation of Tropical Cyclone Pathways with Stochastic Perturbations
NASA Astrophysics Data System (ADS)
Webber, R.; Plotkin, D. A.; Abbot, D. S.; Weare, J.
2017-12-01
Global Climate Models (GCMs) are known to statistically underpredict intense tropical cyclones (TCs) because they fail to capture the rapid intensification and high wind speeds characteristic of the most destructive TCs. Stochastic parametrization schemes have the potential to improve the accuracy of GCMs. However, current analysis of these schemes through direct sampling is limited by the computational expense of simulating a rare weather event at fine spatial gridding. The present work introduces a stochastically perturbed parametrization tendency (SPPT) scheme to increase simulated intensity of TCs. We adapt the Weighted Ensemble algorithm to simulate the distribution of TCs at a fraction of the computational effort required in direct sampling. We illustrate the efficiency of the SPPT scheme by comparing simulations at different spatial resolutions and stochastic parameter regimes. Stochastic parametrization and rare event sampling strategies have great potential to improve TC prediction and aid understanding of tropical cyclogenesis. Since rising sea surface temperatures are postulated to increase the intensity of TCs, these strategies can also improve predictions about climate change-related weather patterns. The rare event sampling strategies used in the current work are not only a novel tool for studying TCs, but they may also be applied to sampling any range of extreme weather events.
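The weighted ensemble resampling underlying the approach can be sketched schematically; the target walker count and weights below are arbitrary, and the configuration bookkeeping of a real WE code is omitted:

```python
def resample_bin(weights, target=4):
    """Split/merge walker weights in one bin to 'target' walkers.
    Total weight is conserved; in a full WE code the surviving
    configuration of a merge is picked with probability w_i/(w_1+w_2)."""
    weights = sorted(weights)
    while len(weights) > target:              # merge the two lightest walkers
        w1, w2 = weights[0], weights[1]
        weights = sorted(weights[2:] + [w1 + w2])
    while len(weights) < target:              # split the heaviest walker
        w = weights[-1]
        weights = sorted(weights[:-1] + [w / 2.0, w / 2.0])
    return weights

print(resample_bin([0.5, 0.2, 0.1, 0.05, 0.05, 0.1]))   # total weight stays 1.0
```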
Chodera, John D; Shirts, Michael R
2011-11-21
The widespread popularity of replica exchange and expanded ensemble algorithms for simulating complex molecular systems in chemistry and biophysics has generated much interest in discovering new ways to enhance the phase space mixing of these protocols in order to improve sampling of uncorrelated configurations. Here, we demonstrate how both of these classes of algorithms can be considered as special cases of Gibbs sampling within a Markov chain Monte Carlo framework. Gibbs sampling is a well-studied scheme in the field of statistical inference in which different random variables are alternately updated from conditional distributions. While the update of the conformational degrees of freedom by Metropolis Monte Carlo or molecular dynamics unavoidably generates correlated samples, we show how judicious updating of the thermodynamic state indices--corresponding to thermodynamic parameters such as temperature or alchemical coupling variables--can substantially increase mixing while still sampling from the desired distributions. We show how state update methods in common use can lead to suboptimal mixing, and present some simple, inexpensive alternatives that can increase mixing of the overall Markov chain, reducing simulation times necessary to obtain estimates of the desired precision. These improved schemes are demonstrated for several common applications, including an alchemical expanded ensemble simulation, parallel tempering, and multidimensional replica exchange umbrella sampling.
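The key idea above, redrawing the thermodynamic state index from its full conditional rather than only swapping neighbours, can be illustrated as follows (a hedged sketch; the temperature ladder, log-weights g_k, and energy are placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
betas = 1.0 / np.linspace(0.5, 2.0, 8)     # inverse temperatures of the ladder
g = np.zeros(8)                            # log-weights (flat here for brevity)

def redraw_state(U_x):
    """Gibbs update of the state index: sample k from
    p(k | x) proportional to exp(g_k - beta_k * U(x))."""
    logp = g - betas * U_x
    p = np.exp(logp - logp.max())          # numerically stabilised softmax
    p /= p.sum()
    return rng.choice(len(betas), p=p)

k_new = redraw_state(U_x=-12.7)            # independence move over all states
```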
Ovis: A Framework for Visual Analysis of Ocean Forecast Ensembles.
Höllt, Thomas; Magdy, Ahmed; Zhan, Peng; Chen, Guoning; Gopalakrishnan, Ganesh; Hoteit, Ibrahim; Hansen, Charles D; Hadwiger, Markus
2014-08-01
We present a novel integrated visualization system that enables interactive visual analysis of ensemble simulations of the sea surface height that is used in ocean forecasting. The position of eddies can be derived directly from the sea surface height, and our visualization approach enables their interactive exploration and analysis. The behavior of eddies is important in different application settings, of which we present two in this paper. First, we show an application for interactive planning of placement as well as operation of off-shore structures using real-world ensemble simulation data of the Gulf of Mexico. Off-shore structures, such as those used for oil exploration, are vulnerable to hazards caused by eddies, and the oil and gas industry relies on ocean forecasts for efficient operations. We enable analysis of the spatial domain, as well as the temporal evolution, for planning the placement and operation of structures. Eddies are also important for marine life. They transport water over large distances and with it also heat and other physical properties as well as biological organisms. In the second application we present the usefulness of our tool, which could be used for planning the paths of autonomous underwater vehicles, so-called gliders, for marine scientists to study simulation data of the largely unexplored Red Sea.
Nanoscale Electronic Conditioning for Improvement of Nanowire Light-Emitting-Diode Efficiency.
May, Brelon J; Belz, Matthew R; Ahamed, Arshad; Sarwar, A T M G; Selcu, Camelia M; Myers, Roberto C
2018-04-24
Commercial III-Nitride LEDs and lasers spanning visible and ultraviolet wavelengths are based on epitaxial films. Alternatively, nanowire-based III-Nitride optoelectronics offer the advantage of strain compliance and high crystalline quality growth on a variety of inexpensive substrates. However, nanowire LEDs exhibit an inherent property distribution, resulting in uneven current spreading through macroscopic devices that consist of millions of individual nanowire diodes connected in parallel. Despite being electrically connected, only a small fraction of nanowires, sometimes <1%, contribute to the electroluminescence (EL). Here, we show that a population of electrical shorts exists in the devices, consisting of a subset of low-resistance nanowires that pass a large portion of the total current in the ensemble devices. Burn-in electronic conditioning is performed by applying a short-term overload voltage; the nanoshorts experience very high current density, sufficient to render them open circuits, thereby forcing a new current path through more nanowire LEDs in an ensemble device. Current-voltage measurements of individual nanowires are acquired using conductive atomic force microscopy to observe the removal of nanoshorts using burn-in. In macroscopic devices, this results in a 33× increase in peak EL and reduced leakage current. Burn-in conditioning of nanowire ensembles therefore provides a straightforward method to mitigate nonuniformities inherent to nanowire devices.
NASA Astrophysics Data System (ADS)
Murray, S.; Guerra, J. A.
2017-12-01
One essential component of operational space weather forecasting is the prediction of solar flares. Early flare forecasting work focused on statistical methods based on historical flaring rates, but more complex machine learning methods have been developed in recent years. A multitude of flare forecasting methods are now available, however it is still unclear which of these methods performs best, and none are substantially better than climatological forecasts. Current operational space weather centres cannot rely on automated methods, and generally use statistical forecasts with a little human intervention. Space weather researchers are increasingly looking towards methods used in terrestrial weather to improve current forecasting techniques. Ensemble forecasting has been used in numerical weather prediction for many years as a way to combine different predictions in order to obtain a more accurate result. It has proved useful in areas such as magnetospheric modelling and coronal mass ejection arrival analysis, however has not yet been implemented in operational flare forecasting. Here we construct ensemble forecasts for major solar flares by linearly combining the full-disk probabilistic forecasts from a group of operational forecasting methods (ASSA, ASAP, MAG4, MOSWOC, NOAA, and Solar Monitor). Forecasts from each method are weighted by a factor that accounts for the method's ability to predict previous events, and several performance metrics (both probabilistic and categorical) are considered. The results provide space weather forecasters with a set of parameters (combination weights, thresholds) that allow them to select the most appropriate values for constructing the 'best' ensemble forecast probability value, according to the performance metric of their choice. In this way different forecasts can be made to fit different end-user needs.
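A minimal sketch of such a performance-weighted linear combination, with invented probabilities and skill scores (in the scheme described above, the real weights would come from verification against past events):

```python
import numpy as np

methods = ["ASSA", "ASAP", "MAG4", "MOSWOC", "NOAA", "SolarMonitor"]
p = np.array([0.30, 0.45, 0.25, 0.40, 0.35, 0.50])       # today's flare forecasts
skill = np.array([0.12, 0.25, 0.08, 0.30, 0.20, 0.28])   # e.g. past Brier skill

w = np.clip(skill, 0.0, None)          # no negative weights
w = w / w.sum()                        # normalised performance-based weights
p_ensemble = float(np.clip(np.dot(w, p), 0.0, 1.0))
print(f"ensemble flare probability: {p_ensemble:.2f}")
```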
Evaluation of the North American Multi-Model Ensemble System for Monthly and Seasonal Prediction
NASA Astrophysics Data System (ADS)
Zhang, Q.
2014-12-01
Since August 2011, the real-time seasonal forecasts of the U.S. National Multi-Model Ensemble (NMME) have been made on the 8th of each month by the NCEP Climate Prediction Center (CPC). The participating models were NCEP/CFSv1&2, GFDL/CM2.2, NCAR/U.Miami/COLA/CCSM3, NASA/GEOS5, and IRI/ECHAM-a & ECHAM-f in the first year of the real-time NMME forecast. Two Canadian coupled models, CMC/CanCM3 and CM4, joined in, and CFSv1 and IRI's models dropped out in the second year. The NMME team at CPC collects monthly means of three variables (precipitation, temperature at 2 m, and sea surface temperature) from each modeling center on a 1x1 degree global grid, removes systematic errors, and forms the grand ensemble mean with equal weight for each model mean, as well as a probability forecast with equal weight for each member of each model. This provides the NMME forecast, locked into schedule, for the CPC operational seasonal and monthly outlook. The basic verification metrics of seasonal and monthly prediction of NMME are calculated as an evaluation of skill, including both deterministic and probabilistic forecasts for the 3-year real-time period (August 2011 - July 2014) and the 30-year retrospective forecast (1982-2011) of the individual models as well as the NMME ensemble. The motivation of this study is to provide skill benchmarks for future improvements of the NMME seasonal and monthly prediction system. We also want to establish whether the real-time and hindcast periods (used for bias correction in real time) are consistent. The experimental phase I of the project already supplies routine guidance to users of the NMME forecasts.
NASA Astrophysics Data System (ADS)
Vogel, Thomas; Perez, Danny; Junghans, Christoph
2014-03-01
We show direct formal relationships between the Wang-Landau iteration [PRL 86, 2050 (2001)], metadynamics [PNAS 99, 12562 (2002)] and statistical temperature molecular dynamics [PRL 97, 050601 (2006)], the major Monte Carlo and molecular dynamics workhorses for sampling from a generalized, multicanonical ensemble. We aim at helping to consolidate the developments in the different areas by indicating how methodological advancements can be transferred in a straightforward way, avoiding the parallel, largely independent development tracks observed in the past.
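For reference, here is a toy version of the first of the three schemes related above, a Wang-Landau iteration estimating the density of states of 20 independent two-state spins (a hedged sketch, not production code; the flatness criterion and update schedule are common but arbitrary choices):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(4)
N = 20
ln_g = np.zeros(N + 1)                  # running estimate of ln g(E)
spins = rng.integers(0, 2, N)           # 20 independent two-state spins
E = int(spins.sum())                    # E = number of up-spins, 0..N
ln_f = 1.0                              # modification factor, shrinks over time

while ln_f > 1e-3:
    hist = np.zeros(N + 1)
    for _ in range(20000):
        i = rng.integers(N)             # propose flipping one spin
        E_new = E + (1 - 2 * spins[i])
        # accept with min(1, g(E)/g(E_new)) to flatten the energy histogram
        if np.log(rng.random()) < ln_g[E] - ln_g[E_new]:
            spins[i] ^= 1
            E = E_new
        ln_g[E] += ln_f                 # penalise the energy just visited
        hist[E] += 1
    if hist.min() > 0.8 * hist.mean():  # histogram 'flat enough'
        ln_f /= 2.0                     # shrink f and start a new stage

# sanity check: ln_g - ln_g[0] should track ln C(N, E)
exact = np.log([comb(N, k) for k in range(N + 1)])
```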
Kinetics and reaction coordinates of the reassembly of protein fragments via forward flux sampling.
Borrero, Ernesto E; Contreras Martínez, Lydia M; DeLisa, Matthew P; Escobedo, Fernando A
2010-05-19
We studied the mechanism of the reassembly and folding process of two fragments of a split lattice protein by using forward flux sampling (FFS). Our results confirmed previous thermodynamics and kinetics analyses that suggested that the disruption of the critical core (of an unsplit protein that folds by a nucleation mechanism) plays a key role in the reassembly mechanism of the split system. For several split systems derived from a parent 48-mer model, we estimated the reaction coordinates in terms of collective variables by using the FFS least-square estimation method and found that the reassembly transition is best described by a combination of the total number of native contacts, the number of interchain native contacts, and the total conformational energy of the split system. We also analyzed the transition path ensemble obtained from FFS simulations using the estimated reaction coordinates as order parameters to identify the microscopic features that differentiate the reassembly of the different split systems studied. We found that in the fastest folding split system, a balanced distribution of the original-core amino acids (of the unsplit system) between protein fragments promotes interchain interactions at early stages of the folding process. Only this system exhibits a different reassembly mechanism from that of the unsplit protein, involving the formation of a different folding nucleus. In the slowest folding system, the concentration of the folding nucleus in one fragment causes its early prefolding, whereas the second fragment tends to remain as a detached random coil. We also show that the reassembly rate can be either increased or decreased by tuning interchain cooperativeness via the introduction of a single point mutation that either strengthens or weakens one of the native interchain contacts (prevalent in the transition state ensemble).
Yang, Qiang; Culbertson, Charles W.; Nielsen, Martha G.; Schalk, Charles W.; Johnson, Carole D.; Marvinney, Robert G.; Stute, Martin; Zheng, Yan
2014-01-01
To understand the hydrogeochemical processes regulating well water arsenic (As) evolution in fractured bedrock aquifers, three domestic wells with [As] up to 478 μg/L are investigated in central Maine. Geophysical logging reveals that fractures near the borehole bottom contribute 70-100% of flow. Borehole and fracture water samples from various depths show significant proportions of As (up to 69%) and Fe (93-99%) in particulates (>0.45 μm). These particulates and those settled after a 16-day batch experiment contain 560-13,000 mg/kg of As and 14-35% weight/weight of Fe. As/Fe ratios (2.5-20 mmol/mol) and As partitioning ratios (adsorbed/dissolved [As], 20,000-100,000 L/kg) suggest that As is sorbed onto amorphous hydrous ferric oxides. Newly drilled cores also show enrichment of As (up to 1300 mg/kg) sorbed onto secondary iron minerals on the fracture surfaces. Pumping at high flow rates induces large decreases in particulate As and Fe and a moderate increase in dissolved [As] and the As(III)/As ratio, while little change in major ion chemistry. The δD and δ18O are similar for the borehole and fracture waters, suggesting the same source of recharge from atmospheric precipitation. Results support a conceptual model invoking flow and sorption controls on groundwater [As] in fractured bedrock aquifers whereby oxygen infiltration promotes the oxidation of As-bearing sulfides at shallower depths in the oxic portion of the flow path, releasing As and Fe; Fe oxidation then forms Fe oxyhydroxide particulates, which are transported in fractures and sorb As along the flow path until intercepted by boreholes. In the anoxic portions of the flow path, reductive dissolution of As-sorbed iron particulates could re-mobilize As. For exposure assessment, we recommend sampling of groundwater without filtration to obtain total As concentration in groundwater. PMID:24842411
NASA Astrophysics Data System (ADS)
Taniguchi, Kenji
2018-04-01
To investigate future variations in high-impact weather events, numerous samples are required. For a detailed assessment in a specific region, high spatial resolution is also required. A simple ensemble simulation technique is proposed in this paper. In the proposed technique, new ensemble members were generated from one basic state vector and two perturbation vectors, which were obtained by lagged average forecasting simulations. Sensitivity experiments with different numbers of ensemble members, different simulation lengths, and different perturbation magnitudes were performed. Experimental application to a global warming study was also implemented for a typhoon event. Ensemble-mean results and ensemble spreads of total precipitation and atmospheric conditions showed similar characteristics across the sensitivity experiments. The frequencies of the maximum total and hourly precipitation also showed similar distributions. These results indicate the robustness of the proposed technique. On the other hand, considerable ensemble spread was found in each ensemble experiment. In addition, the results of the application to a global warming study showed possible variations in the future. These results indicate that the proposed technique is useful for investigating various meteorological phenomena and the impacts of global warming. The results of the ensemble simulations also enable the stochastic evaluation of differences in high-impact weather events. In addition, the impacts of a spectral nudging technique were also examined. The tracks of a typhoon were quite different between cases with and without spectral nudging; however, the ranges of the tracks among ensemble members were comparable, indicating that spectral nudging does not necessarily suppress ensemble spread.
Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation
Iida, Shinji; Nakamura, Haruki; Higo, Junichi
2016-01-01
We introduce various, recently developed, generalized ensemble methods, which are useful to sample various molecular configurations emerging in the process of protein–protein or protein–ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks. PMID:27288028
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
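The model families compared above can be reproduced schematically on synthetic data with scikit-learn (a sketch under our own assumptions: spline-transformed logistic regression stands in for restricted cubic splines, and the dataset is artificial):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100),
    "random forest": RandomForestClassifier(n_estimators=100),
    "boosted trees": GradientBoostingClassifier(),
    "logistic + splines": make_pipeline(SplineTransformer(degree=3),
                                        LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, scoring="roc_auc", cv=5).mean()
    print(f"{name:20s} AUC = {auc:.3f}")
```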
NASA Astrophysics Data System (ADS)
Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.
2012-04-01
In the context of a national energy company (EDF: Electricité de France), hydro-meteorological forecasts are necessary to ensure the safety and security of installations, meet environmental standards, and improve water resources management and decision making. Hydrological ensemble forecasts allow a better representation of the uncertainties of meteorological and hydrological forecasts and improve the human expertise of hydrological forecasting, which is essential to synthesize the available information coming from different meteorological and hydrological models and human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and has been used since 2010 on more than 30 watersheds in France. This forecasting chain is characterized by ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where a large amount of human expertise is solicited. The aim of this paper is to compare two hydrological ensemble post-processing methods developed at EDF in order to improve the reliability of ensemble forecasts (similar to Montanari & Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called the empirical approach) is based on statistical modelling of the empirical error of perfect forecasts, using streamflow sub-samples by quantile class and lead time. The second method (called the dynamical approach) is based on streamflow sub-samples by quantile class, streamflow variation, and lead time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure good post-processing of the hydrological ensemble, allowing a substantial improvement of the reliability, skill and sharpness of ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach, which is not able to take into account hydrological dynamics and processes, i.e. sample heterogeneity: the same streamflow range can correspond to different processes, such as rising limbs or recessions, where uncertainties differ. The dynamical approach improves the reliability, skill and sharpness of forecasts and globally reduces confidence interval widths. When compared in detail, the dynamical approach allows a noticeable reduction of confidence intervals during recessions, where uncertainty is relatively lower, and a slight increase of confidence intervals during rising limbs or snowmelt, where uncertainty is greater. The dynamical approach, validated by forecasters' experience that considered the empirical approach not discriminative enough, improved forecasters' confidence and the communication of uncertainties. Montanari, A. and Brath, A. (2004). A stochastic approach for assessing the uncertainty of rainfall-runoff simulations. Water Resources Research, 40, W01106, doi:10.1029/2003WR002540. Schaefli, B., Balin Talamba, D. and Musy, A. (2007). Quantifying hydrological modeling errors through a mixture of normal distributions. Journal of Hydrology, 332, 303-315.
The Influence of Internal Model Variability in GEOS-5 on Interhemispheric CO2 Exchange
NASA Technical Reports Server (NTRS)
Allen, Melissa; Erickson, David; Kendall, Wesley; Fu, Joshua; Ott, Leslie; Pawson, Steven
2012-01-01
An ensemble of eight atmospheric CO2 simulations was completed employing the National Aeronautics and Space Administration (NASA) Goddard Earth Observation System, Version 5 (GEOS-5) for the years 2000-2001, each with initial meteorological conditions corresponding to different days in January 2000 to examine internal model variability. Globally, the model runs show similar concentrations of CO2 for the two years, but in regions of high CO2 concentrations due to fossil fuel emissions, large differences among different model simulations appear. The phasing and amplitude of the CO2 cycle at Northern Hemisphere locations in all of the ensemble members is similar to that of surface observations. In several southern hemisphere locations, however, some of the GEOS-5 model CO2 cycles are out of phase by as much as four months, and large variations occur between the ensemble members. This result indicates that there is large sensitivity to transport in these regions. The differences vary by latitude-the most extreme differences in the Tropics and the least at the South Pole. Examples of these differences among the ensemble members with regard to CO2 uptake and respiration of the terrestrial biosphere and CO2 emissions due to fossil fuel emissions are shown at Cape Grim, Tasmania. Integration-based flow analysis of the atmospheric circulation in the model runs shows widely varying paths of flow into the Tasmania region among the models including sources from North America, South America, South Africa, South Asia and Indonesia. These results suggest that interhemispheric transport can be strongly influenced by internal model variability.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics
NASA Astrophysics Data System (ADS)
Iyer, V.; Shetty, S.; Iyengar, S. S.
2015-07-01
Rare event learning has received little attention until recently, owing to the unavailability of algorithms that deal with big samples. This research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from the perspective of real-time algorithms. The computing framework is independent of the number of input samples, the application domain, and whether streams are labelled or label-less. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using an ensemble of trees with bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove, using Hoeffding bounds, that temporal-window-based sampling from sensor data streams converges after n samples; this can be used for fast prediction of new samples in real time. The Data-Cleaning tree model uses a nonparametric node splitting technique, which can be learned iteratively and scales linearly in memory consumption for any size of input stream. The improved task-based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show, using empirical datasets, that the explicit rule learning computation is linear in time and depends only on the number of leaves present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields a minimum number (m) of leaves, keeping the pre-processing computation to n × t log m, compared to N² for the Gram matrix. We also show that task-based feature induction yields higher Quality of Data (QoD) in the feature space compared to kernel methods using the Gram matrix.
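The Hoeffding-bound argument invoked above gives an explicit sample size; for example, for observations bounded in [0, 1], the number of samples n needed so an empirical mean is within eps of the true mean with probability at least 1 - delta:

```python
import math

def hoeffding_n(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

print(hoeffding_n(eps=0.05, delta=0.01))   # 1060 samples suffice
```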
Path integrals and the WKB approximation in loop quantum cosmology
NASA Astrophysics Data System (ADS)
Ashtekar, Abhay; Campiglia, Miguel; Henderson, Adam
2010-12-01
We follow the Feynman procedure to obtain a path integral formulation of loop quantum cosmology starting from the Hilbert space framework. Quantum geometry effects modify the weight associated with each path so that the effective measure on the space of paths is different from that used in the Wheeler-DeWitt theory. These differences introduce some conceptual subtleties in arriving at the WKB approximation. But the approximation is well defined and provides intuition for the differences between loop quantum cosmology and the Wheeler-DeWitt theory from a path integral perspective.
Unsplittable Flow in Paths and Trees and Column-Restricted Packing Integer Programs
NASA Astrophysics Data System (ADS)
Chekuri, Chandra; Ene, Alina; Korula, Nitish
We consider the unsplittable flow problem (UFP) and the closely related column-restricted packing integer programs (CPIPs). In UFP we are given an edge-capacitated graph G = (V, E) and k request pairs R_1, ..., R_k, where each R_i consists of a source-destination pair (s_i, t_i), a demand d_i and a weight w_i. The goal is to find a maximum weight subset of requests that can be routed unsplittably in G. Most previous work on UFP has focused on the no-bottleneck case in which the maximum demand of the requests is at most the smallest edge capacity. Inspired by the recent work of Bansal et al. [3] on UFP on a path without the above assumption, we consider UFP on paths as well as trees. We give a simple O(log n) approximation for UFP on trees when all weights are identical; this yields an O(log^2 n) approximation for the weighted case. These are the first non-trivial approximations for UFP on trees. We develop an LP relaxation for UFP on paths that has an integrality gap of O(log^2 n); previously there was no relaxation with o(n) gap. We also consider UFP in general graphs and CPIPs without the no-bottleneck assumption and obtain new and useful results.
Multiple-instance ensemble learning for hyperspectral images
NASA Astrophysics Data System (ADS)
Ergul, Ugur; Bilgin, Gokhan
2017-10-01
An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use in hyperspectral images (HSIs), inspired by the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed with a small percentage of training samples, and MI bags are formed by a local windowing process with variable window sizes on selected instances. In addition to bootstrap aggregation, random subspace is another method used to diversify base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed over single test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are demonstrated against state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equivalent non-MIL algorithms.
Characterizing RNA ensembles from NMR data with kinematic models
Fonseca, Rasmus; Pachov, Dimitar V.; Bernauer, Julie; van den Bedem, Henry
2014-01-01
Functional mechanisms of biomolecules often manifest themselves precisely in transient conformational substates. Researchers have long sought to structurally characterize dynamic processes in non-coding RNA, combining experimental data with computer algorithms. However, adequate exploration of conformational space for these highly dynamic molecules, starting from static crystal structures, remains challenging. Here, we report a new conformational sampling procedure, KGSrna, which can efficiently probe the native ensemble of RNA molecules in solution. We found that KGSrna ensembles accurately represent the conformational landscapes of 3D RNA encoded by NMR proton chemical shifts. KGSrna resolves motionally averaged NMR data into structural contributions; when coupled with residual dipolar coupling data, a KGSrna ensemble revealed a previously uncharacterized transient excited state of the HIV-1 trans-activation response element stem–loop. Ensemble-based interpretations of averaged data can aid in formulating and testing dynamic, motion-based hypotheses of functional mechanisms in RNAs with broad implications for RNA engineering and therapeutic intervention. PMID:25114056
NASA Astrophysics Data System (ADS)
Booth, B. B. B.; Bernie, D.; McNeall, D.; Hawkins, E.; Caesar, J.; Boulton, C.; Friedlingstein, P.; Sexton, D.
2012-09-01
We compare future changes in global mean temperature in response to different future scenarios which, for the first time, arise from an emission-driven rather than concentration-driven perturbed parameter ensemble of a Global Climate Model (GCM). These new GCM simulations sample uncertainties in atmospheric feedbacks, the land carbon cycle, ocean physics and aerosol sulphur cycle processes. We find broader ranges of projected temperature responses arising when considering emission-driven rather than concentration-driven simulations (with 10-90 percentile ranges of 1.7 K for the aggressive mitigation scenario up to 3.9 K for the high-end business-as-usual scenario). A small minority of simulations resulting from combinations of strong atmospheric feedbacks and carbon cycle responses show temperature increases in excess of 9 degrees (RCP8.5), and even under aggressive mitigation (RCP2.6) temperatures in excess of 4 K. While the simulations point to much larger temperature ranges for emission-driven experiments, they do not change existing expectations (based on previous concentration-driven experiments) on the timescales over which different sources of uncertainty are important. The new simulations sample a range of future atmospheric concentrations for each emission scenario. Both in the case of SRES A1B and the Representative Concentration Pathways (RCPs), the concentration pathways used to drive GCM ensembles lie towards the lower end of our simulated distribution. This design decision (a legacy of previous assessments) is likely to lead concentration-driven experiments to under-sample strong feedback responses. Our ensemble of emission-driven simulations spans the global temperature response of other multi-model frameworks except at the low end, where combinations of low climate sensitivity and low carbon cycle feedbacks lead to responses outside our ensemble range. The ensemble simulates a number of high-end responses which lie above the CMIP5 carbon cycle range. These high-end simulations can be linked to sampling a number of stronger carbon cycle feedbacks and to sampling climate sensitivities above 4.5 K. This latter aspect highlights the priority of identifying real-world climate sensitivity constraints which, if achieved, would lead to reductions in the upper bound of projected global mean temperature change. The ensemble of simulations presented here provides a framework to explore relationships between present-day observables and future changes, while the large spread of projected changes highlights the ongoing need for such work.
NASA Astrophysics Data System (ADS)
Fatichi, S.; Ivanov, V. Y.; Caporali, E.
2013-04-01
This study extends a stochastic downscaling methodology to generation of an ensemble of hourly time series of meteorological variables that express possible future climate conditions at a point-scale. The stochastic downscaling uses general circulation model (GCM) realizations and an hourly weather generator, the Advanced WEather GENerator (AWE-GEN). Marginal distributions of factors of change are computed for several climate statistics using a Bayesian methodology that can weight GCM realizations based on the model relative performance with respect to a historical climate and a degree of disagreement in projecting future conditions. A Monte Carlo technique is used to sample the factors of change from their respective marginal distributions. As a comparison with traditional approaches, factors of change are also estimated by averaging GCM realizations. With either approach, the derived factors of change are applied to the climate statistics inferred from historical observations to re-evaluate parameters of the weather generator. The re-parameterized generator yields hourly time series of meteorological variables that can be considered to be representative of future climate conditions. In this study, the time series are generated in an ensemble mode to fully reflect the uncertainty of GCM projections, climate stochasticity, as well as uncertainties of the downscaling procedure. Applications of the methodology in reproducing future climate conditions for the periods of 2000-2009, 2046-2065 and 2081-2100, using the period of 1962-1992 as the historical baseline are discussed for the location of Firenze (Italy). The inferences of the methodology for the period of 2000-2009 are tested against observations to assess reliability of the stochastic downscaling procedure in reproducing statistics of meteorological variables at different time scales.
Using Support Vector Machine Ensembles for Target Audience Classification on Twitter
Lo, Siaw Ling; Chiong, Raymond; Cornforth, David
2015-01-01
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space. PMID:25874768
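The bootstrap-resampled SVM ensemble idea can be sketched with scikit-learn on synthetic data; the Twitter LDA feature pipeline is omitted and all data here are synthetic, so this illustrates the resampling scheme rather than the authors' system.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Build each SVM on a bootstrap resample of the training set (sampling
# with replacement) rather than a plain random subsample.
members = []
for _ in range(11):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    members.append(LinearSVC(dual=False).fit(X_tr[idx], y_tr[idx]))

# Majority vote across the ensemble.
votes = np.stack([m.predict(X_te) for m in members])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble accuracy:", (y_hat == y_te).mean())
```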
Phipps, Eric T.; D'Elia, Marta; Edwards, Harold C.; ...
2017-04-18
In this study, quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).
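The core idea, advancing a whole group of samples through the same solver sweep so that grid data are loaded once and reused across samples, can be illustrated in NumPy by carrying an extra ensemble axis; this is only an analogy to the C++ template approach described in the paper, not its implementation.

```python
import numpy as np

# Toy "simulation": explicit time stepping of du/dt = -k*u on a 1D grid,
# where the uncertain input is the decay rate k. Embedded ensemble
# propagation advances a whole group of samples per sweep by carrying an
# extra ensemble axis, so each grid sweep is shared across samples.
n_grid, n_steps, dt = 10_000, 200, 1e-3
k_samples = np.random.default_rng(1).uniform(0.5, 2.0, size=32)

u = np.ones((n_grid, k_samples.size))        # state with ensemble axis
for _ in range(n_steps):
    u -= dt * k_samples[np.newaxis, :] * u   # one sweep updates all samples

print("ensemble mean of final state:", u.mean())
```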
Eum, Hyung-Il; Gachon, Philippe; Laprise, René
2016-01-01
This study examined the impact of model biases on climate change signals for daily precipitation and for minimum and maximum temperatures. Through the use of multiple climate scenarios from 12 regional climate model simulations, the ensemble mean, and three synthetic simulations generated by a weighting procedure, we investigated intermodel seasonal climate change signals between current and future periods, for both median and extreme precipitation/temperature values. A significant dependence of seasonal climate change signals on the model biases over southern Québec in Canada was detected for temperatures, but not for precipitation. This suggests that the regional temperature change signal is affected by local processes. Seasonally, model bias affects future mean and extreme values in winter and summer. In addition, potentially large increases in future extremes of temperature and precipitation values were projected. For the three synthetic scenarios, systematically less bias and a narrower range of mean change for all variables were projected compared to those of the climate model simulations. In addition, synthetic scenarios were found to better capture the spatial variability of extreme cold temperatures than the ensemble mean scenario. Finally, these results indicate that the synthetic scenarios have greater potential to reduce the uncertainty of future climate projections and capture the spatial variability of extreme climate events.
Evolutionary Ensemble for In Silico Prediction of Ames Test Mutagenicity
NASA Astrophysics Data System (ADS)
Chen, Huanhuan; Yao, Xin
Driven by new regulations and animal welfare concerns, the need to develop in silico models has increased recently as alternative approaches to safety assessment of chemicals without animal testing. This paper describes a novel machine learning ensemble approach to building an in silico model for the prediction of Ames test mutagenicity, one of a battery of the most commonly used experimental in vitro and in vivo genotoxicity tests for safety evaluation of chemicals. Evolutionary random neural ensemble with negative correlation learning (ERNE) [1] was developed based on neural networks and evolutionary algorithms. ERNE combines bootstrap sampling of the training data with random subspace feature selection to ensure diversity when creating individuals within the initial ensemble. Furthermore, while evolving individuals within the ensemble, it makes use of negative correlation learning, enabling individual NNs to be trained as accurately as possible while keeping them as diverse as possible. Therefore, the resulting individuals in the final ensemble are capable of cooperating collectively to achieve better generalization in prediction. The empirical experiments suggest that ERNE is an effective ensemble approach for predicting the Ames test mutagenicity of chemicals.
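A reduced sketch of the ensemble-construction step (bootstrap sampling plus random subspace feature selection) using scikit-learn follows; the evolutionary optimization and negative correlation learning of ERNE are omitted, so this shows only the diversity-generation mechanism on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=40, random_state=0)

members = []
for _ in range(10):
    rows = rng.integers(0, len(X), size=len(X))             # bootstrap sample
    cols = rng.choice(X.shape[1], size=20, replace=False)   # random subspace
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                        random_state=0).fit(X[rows][:, cols], y[rows])
    members.append((net, cols))

# Combine by averaging the predicted class probabilities over the ensemble.
proba = np.mean([net.predict_proba(X[:, cols]) for net, cols in members], axis=0)
print("training accuracy:", (proba.argmax(axis=1) == y).mean())
```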
NASA Astrophysics Data System (ADS)
Bakar, Sumarni Abu; Ibrahim, Milbah
2017-08-01
The shortest path problem is a popular problem in graph theory. It is about finding a path with minimum length between a specified pair of vertices. In any network the weight of each edge is usually represented as a crisp real number, and this weight is then used when solving the shortest path problem with deterministic algorithms. In practice, however, uncertainty is often encountered, for example due to failures, so that the weights of the network's edges are uncertain and imprecise. In this paper, a modified algorithm which utilizes a heuristic shortest path method and a fuzzy approach is proposed for solving a network with imprecise arc lengths. Here, interval numbers and triangular fuzzy numbers are considered for representing the arc lengths of the network. The modified algorithm is then applied to a specific example of the Travelling Salesman Problem (TSP). The total shortest distance obtained from this algorithm is compared with the total distance obtained from the traditional nearest neighbour heuristic algorithm. The results show that the modified algorithm yields a sequence of visited cities similar to that of the traditional approach, while also providing a shorter total distance than that calculated using the traditional approach. Hence, this research could contribute to the enrichment of methods used in solving the TSP.
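A toy version of the nearest neighbour heuristic with triangular fuzzy arc lengths, ranked by their centroids, is sketched below; the distance table is hypothetical and the authors' modified algorithm is not reproduced.

```python
# Triangular fuzzy numbers (a, b, c) are ranked by their centroid (a+b+c)/3,
# one common defuzzification choice (an assumption, not the paper's method).
def centroid(tfn):
    a, b, c = tfn
    return (a + b + c) / 3.0

# Hypothetical symmetric fuzzy distance table for 4 cities.
D = {
    (0, 1): (8, 10, 13), (0, 2): (4, 5, 7),  (0, 3): (9, 11, 12),
    (1, 2): (6, 7, 9),   (1, 3): (3, 4, 6),  (2, 3): (7, 8, 10),
}
def dist(i, j):
    return D[(min(i, j), max(i, j))]

def nearest_neighbour_tour(start, n):
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda j: centroid(dist(tour[-1], j)))
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)            # return to the starting city
    return tour

tour = nearest_neighbour_tour(0, 4)
total = [sum(x) for x in zip(*(dist(i, j) for i, j in zip(tour, tour[1:])))]
print("tour:", tour, "total fuzzy distance (a, b, c):", tuple(total))
```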
Electrophoretic sample insertion. [device for uniformly distributing samples in flow path
NASA Technical Reports Server (NTRS)
Mccreight, L. R. (Inventor)
1974-01-01
Two conductive screens located in the flow path of an electrophoresis sample separation apparatus are charged electrically. The sample is introduced between the screens, and the charge is sufficient to disperse and hold the samples across the screens. When the charge is terminated, the samples are uniformly distributed in the flow path. Additionally, a first separation by charged properties has been accomplished.
NASA Astrophysics Data System (ADS)
Flores, A. N.; Entekhabi, D.; Bras, R. L.
2007-12-01
Soil hydraulic and thermal properties (SHTPs) affect both the rate of moisture redistribution in the soil column and the volumetric soil water capacity. Adequately constraining these properties through field and lab analysis to parameterize spatially-distributed hydrology models is often prohibitively expensive. Because SHTPs vary significantly at small spatial scales, individual soil samples are also only reliably indicative of local conditions, and these properties remain a significant source of uncertainty in soil moisture and temperature estimation. In ensemble-based soil moisture data assimilation, uncertainty in the model-produced prior estimate due to associated uncertainty in SHTPs must be taken into account to avoid under-dispersive ensembles. To treat SHTP uncertainty for purposes of supplying inputs to a distributed watershed model we use the restricted pairing (RP) algorithm, an extension of Latin Hypercube (LH) sampling. The RP algorithm generates an arbitrary number of SHTP combinations by sampling the appropriate marginal distributions of the individual soil properties using the LH approach, while imposing a target rank correlation among the properties. A previously-published meta-database of 1309 soils representing 12 textural classes is used to fit appropriate marginal distributions to the properties and compute the target rank correlation structure, conditioned on soil texture. Given categorical soil textures, our implementation of the RP algorithm generates an arbitrarily-sized ensemble of realizations of the SHTPs required as input to the TIN-based Realtime Integrated Basin Simulator with vegetation dynamics (tRIBS+VEGGIE) distributed parameter ecohydrology model. Soil moisture ensembles simulated with RP-generated SHTPs exhibit less variance than ensembles simulated with SHTPs generated by a scheme that neglects correlation among properties. Neglecting correlation among SHTPs can lead to physically unrealistic combinations of parameters that exhibit implausible hydrologic behavior when input to the tRIBS+VEGGIE model.
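The rank-correlation-imposing step can be sketched with an Iman-Conover-style procedure on a Latin hypercube sample; the marginals below are hypothetical stand-ins, and the published RP algorithm may differ in detail.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 1000

# Latin hypercube sample of two hypothetical soil properties from assumed
# marginals (lognormal saturated conductivity, normal porosity).
strata = np.stack([(rng.permutation(n) + rng.random(n)) / n for _ in range(2)], axis=1)
ksat = stats.lognorm.ppf(strata[:, 0], s=1.0, scale=1e-5)
porosity = stats.norm.ppf(strata[:, 1], loc=0.4, scale=0.05)
X = np.column_stack([ksat, porosity])

# Target rank correlation between the two properties.
target = np.array([[1.0, -0.6], [-0.6, 1.0]])

# Iman-Conover step: build score columns with the target correlation,
# then reorder each column of X so its ranks match the score ranks.
scores = stats.norm.ppf(np.arange(1, n + 1) / (n + 1))
M = np.column_stack([rng.permutation(scores) for _ in range(2)])
L_m = np.linalg.cholesky(np.corrcoef(M.T))
L_t = np.linalg.cholesky(target)
M = M @ np.linalg.inv(L_m.T) @ L_t.T
for j in range(2):
    X[:, j] = np.sort(X[:, j])[stats.rankdata(M[:, j]).astype(int) - 1]

print("achieved Spearman correlation:", stats.spearmanr(X[:, 0], X[:, 1])[0])
```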
Hou, Zeyu; Lu, Wenxi; Xue, Haibo; Lin, Jin
2017-08-01
The surrogate-based simulation-optimization technique is an effective approach for optimizing the surfactant enhanced aquifer remediation (SEAR) strategy for clearing DNAPLs. The performance of the surrogate model, which is used to replace the simulation model in order to reduce the computational burden, is key to such studies. However, previous studies have generally been based on a stand-alone surrogate model and have rarely attempted to improve the approximation accuracy of the surrogate model to the simulation model by combining various methods. In this regard, we present set pair analysis (SPA) as a new method to build an ensemble surrogate (ES) model, and conducted a comparative study to select a better ES modeling pattern for SEAR strategy optimization problems. Surrogate models were developed using a radial basis function artificial neural network (RBFANN), support vector regression (SVR), and Kriging. One ES model assembles the RBFANN, SVR, and Kriging models using set pair weights according to their performance; the other assembles several Kriging models (Kriging being the best of the three surrogate modeling methods) built with different training sample datasets. Finally, an optimization model, in which the ES model was embedded, was established to obtain the optimal remediation strategy. The results showed that the residuals of the outputs between the best ES model and the simulation model for 100 testing samples were lower than 1.5%. Using an ES model instead of the simulation model was critical for considerably reducing the computation time of the simulation-optimization process while simultaneously maintaining high computational accuracy.
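A minimal sketch of a performance-weighted ensemble surrogate follows; inverse validation RMSE stands in for the set-pair-analysis weights, and the surrogate predictions are synthetic.

```python
import numpy as np

# Hypothetical predictions of three stand-alone surrogates (e.g. RBFANN,
# SVR, Kriging) on a validation set, plus the simulation-model truth.
rng = np.random.default_rng(3)
truth = rng.normal(size=200)
preds = {
    "rbfann":  truth + rng.normal(0, 0.30, 200),
    "svr":     truth + rng.normal(0, 0.20, 200),
    "kriging": truth + rng.normal(0, 0.10, 200),
}

# Performance-based ensemble weights: inverse RMSE, normalized to sum to 1
# (a simple stand-in for the set-pair-analysis weights used in the paper).
rmse = {k: np.sqrt(np.mean((p - truth) ** 2)) for k, p in preds.items()}
w = {k: 1.0 / v for k, v in rmse.items()}
total = sum(w.values())
w = {k: v / total for k, v in w.items()}

ensemble = sum(w[k] * preds[k] for k in preds)
print("weights:", w)
print("ensemble RMSE:", np.sqrt(np.mean((ensemble - truth) ** 2)))
```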
Muhlbaier, Michael D; Topalis, Apostolos; Polikar, Robi
2009-01-01
We have previously introduced an incremental learning algorithm Learn(++), which learns novel information from consecutive data sets by generating an ensemble of classifiers with each data set, and combining them by weighted majority voting. However, Learn(++) suffers from an inherent "outvoting" problem when asked to learn a new class ω_new introduced by a subsequent data set, as earlier classifiers not trained on this class are guaranteed to misclassify ω_new instances. The collective votes of earlier classifiers, for an inevitably incorrect decision, then outweigh the votes of the new classifiers' correct decision on ω_new instances--until there are enough new classifiers to counteract the unfair outvoting. This forces Learn(++) to generate an unnecessarily large number of classifiers. This paper describes Learn(++).NC, specifically designed for efficient incremental learning of multiple new classes using significantly fewer classifiers. To do so, Learn(++).NC introduces dynamically weighted consult and vote (DW-CAV), a novel voting mechanism for combining classifiers: individual classifiers consult with each other to determine which ones are most qualified to classify a given instance, and decide how much weight, if any, each classifier's decision should carry. Experiments on real-world problems indicate that the new algorithm performs remarkably well with substantially fewer classifiers, not only as compared to its predecessor Learn(++), but also as compared to several other algorithms recently proposed for similar problems.
Constrained proper sampling of conformations of transition state ensemble of protein folding
Lin, Ming; Zhang, Jian; Lu, Hsiao-Mei; Chen, Rong; Liang, Jie
2011-01-01
Characterizing the conformations of a protein in the transition state ensemble (TSE) is important for studying protein folding. A promising approach pioneered by Vendruscolo [Nature (London) 409, 641 (2001)] to study the TSE is to generate conformations that satisfy all constraints imposed by the experimentally measured ϕ values that provide information about the native likeness of the transition states. Faísca [J. Chem. Phys. 129, 095108 (2008)] generated conformations of the TSE based on the criterion that, starting from a TS conformation, the probabilities of folding and unfolding are about equal, through Markov Chain Monte Carlo (MCMC) simulations. In this study, we use the technique of the constrained sequential Monte Carlo method [Lin, J. Chem. Phys. 129, 094101 (2008); Zhang, Proteins 66, 61 (2007)] to generate TSE conformations of acylphosphatase of 98 residues that satisfy the ϕ-value constraints, as well as the criterion that each conformation has a folding probability of 0.5 by Monte Carlo simulations. We adopt a two-stage process and first generate 5000 contact maps satisfying the ϕ-value constraints. Each contact map is then used to generate 1000 properly weighted conformations. After clustering similar conformations, we obtain a set of properly weighted samples of 4185 candidate clusters. A representative conformation of each of these clusters is then selected, and 50 runs of MCMC simulation are carried out using a regrowth move set. We then select a subset of 1501 conformations that have equal probabilities to fold and to unfold as the set of the TSE. These 1501 samples characterize well the distribution of transition state ensemble conformations of acylphosphatase. Compared with previous studies, our approach can access a much wider conformational space and can objectively generate conformations that satisfy the ϕ-value constraints and the criterion of 0.5 folding probability without bias. In contrast to previous studies, our results show that transition state conformations are very diverse and are far from native-like when measured in cartesian root-mean-square deviation (cRMSD): the average cRMSD between TSE conformations and the native structure is 9.4 Å for this short protein, instead of the 6 Å reported in previous studies. In addition, we found that the average fraction of native contacts in the TSE is 0.37, with enrichment in native-like β-sheets and a shortage of long-range contacts, suggesting that such contacts form at a later stage of folding. We further calculate the first passage time of folding of TSE conformations by computing the physical time associated with the regrowth moves in the MCMC simulation, mapping such moves to a Markovian state model whose transition times were obtained by Langevin dynamics simulations. Our results indicate that, despite the large structural diversity of the TSE, the conformations are characterized by similar folding times. Our approach is general and can be used to study the TSE in other macromolecules. PMID:21341875
Multiphysics superensemble forecast applied to Mediterranean heavy precipitation situations
NASA Astrophysics Data System (ADS)
Vich, M.; Romero, R.
2010-11-01
The high-impact precipitation events that regularly affect the western Mediterranean coastal regions are still difficult to predict with the current prediction systems. Bearing this in mind, this paper focuses on the superensemble technique applied to the precipitation field. Encouraged by the skill shown by a previous multiphysics ensemble prediction system applied to western Mediterranean precipitation events, the superensemble is fed with this ensemble. The training phase of the superensemble contributes to the actual forecast with weights obtained by comparing the past performance of the ensemble members and the corresponding observed states. The non-hydrostatic MM5 mesoscale model is used to run the multiphysics ensemble. Simulations are performed with a 22.5 km resolution domain (Domain 1 in http://mm5forecasts.uib.es) nested in the ECMWF forecast fields. The period between September and December 2001 is used to train the superensemble and a collection of 19 MEDEX cyclones is used to test it. The verification procedure involves testing the superensemble performance and comparing it with that of the poor man's and bias-corrected ensemble means and the multiphysics EPS control member. The results emphasize the need for a well-behaved training phase to obtain good results with the superensemble technique. A strategy to obtain this improved training phase is also outlined.
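The training-phase weighting can be sketched as a least squares regression of member forecast anomalies on observed anomalies (the classical superensemble formulation); the data below are synthetic stand-ins for the MM5 multiphysics members.

```python
import numpy as np

rng = np.random.default_rng(5)
n_train, n_members = 120, 8

# Training phase: member precipitation forecasts and the observed values
# at matching times (synthetic; each member carries its own bias).
obs = rng.gamma(2.0, 5.0, n_train)
members = obs[:, None] + rng.normal(0, 3.0, (n_train, n_members)) \
          + rng.normal(0, 1.0, n_members)

# Superensemble weights from least squares on the training anomalies.
A = members - members.mean(axis=0)
b = obs - obs.mean()
w, *_ = np.linalg.lstsq(A, b, rcond=None)

# Forecast phase: the observed training mean plus the weighted sum of
# member anomalies.
new_members = rng.gamma(2.0, 5.0) + rng.normal(0, 3.0, n_members)
forecast = obs.mean() + (new_members - members.mean(axis=0)) @ w
print("superensemble forecast:", forecast)
```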
Smith, Kyle K G; Poulsen, Jens Aage; Nyman, Gunnar; Rossky, Peter J
2015-06-28
We develop two classes of quasi-classical dynamics that are shown to conserve the initial quantum ensemble when used in combination with the Feynman-Kleinert approximation of the density operator. These dynamics are used to improve the Feynman-Kleinert implementation of the classical Wigner approximation for the evaluation of quantum time correlation functions known as Feynman-Kleinert linearized path-integral. As shown, both classes of dynamics are able to recover the exact classical and high temperature limits of the quantum time correlation function, while a subset is able to recover the exact harmonic limit. A comparison of the approximate quantum time correlation functions obtained from both classes of dynamics is made with the exact results for the challenging model problems of the quartic and double-well potentials. It is found that these dynamics provide a great improvement over the classical Wigner approximation, in which purely classical dynamics are used. In a special case, our first method becomes identical to centroid molecular dynamics.
Unimodular lattice triangulations as small-world and scale-free random graphs
NASA Astrophysics Data System (ADS)
Krüger, B.; Schmidt, E. M.; Mecke, K.
2015-02-01
Real-world networks, e.g., the social relations or world-wide-web graphs, exhibit both small-world and scale-free behaviour. We interpret lattice triangulations as planar graphs by identifying triangulation vertices with graph nodes and one-dimensional simplices with edges. Since these triangulations are ergodic with respect to a certain Pachner flip, applying different Monte Carlo simulations enables us to calculate average properties of random triangulations, as well as canonical ensemble averages, using an energy functional that is approximately the variance of the degree distribution. All considered triangulations have clustering coefficients comparable with real-world graphs; for the canonical ensemble there are inverse temperatures with small shortest path length independent of system size. Tuning the inverse temperature to a quasi-critical value leads to an indication of scale-free behaviour for degrees k ≥ 5. Using triangulations as a random graph model can improve the understanding of real-world networks, especially if the actual distance of the embedded nodes becomes important.
NASA Astrophysics Data System (ADS)
Nussbaumer, Raphaël; Gloaguen, Erwan; Mariéthoz, Grégoire; Holliger, Klaus
2016-04-01
Bayesian sequential simulation (BSS) is a powerful geostatistical technique, which notably has shown significant potential for the assimilation of datasets that are diverse with regard to the spatial resolution and their relationship. However, these types of applications of BSS require a large number of realizations to adequately explore the solution space and to assess the corresponding uncertainties. Moreover, such simulations generally need to be performed on very fine grids in order to adequately exploit the technique's potential for characterizing heterogeneous environments. Correspondingly, the computational cost of BSS algorithms in their classical form is very high, which so far has limited an effective application of this method to large models and/or vast datasets. In this context, it is also important to note that the inherent assumption regarding the independence of the considered datasets is generally regarded as being too strong in the context of sequential simulation. To alleviate these problems, we have revisited the classical implementation of BSS and incorporated two key features to increase the computational efficiency. The first feature is a combined quadrant-spiral/superblock search, which targets run-time savings on large grids and adds flexibility with regard to the selection of neighboring points using equal directional sampling and treating hard data and previously simulated points separately. The second feature is a constant path of simulation, which enhances the efficiency for multiple realizations. We have also modified the aggregation operator to be more flexible with regard to the assumption of independence of the considered datasets. This is achieved through log-linear pooling, which essentially allows for attributing weights to the various data components. Finally, a multi-grid simulation path was created to enforce large-scale variance and to allow for adapting parameters, such as, for example, the log-linear weights or the type of simulation path at various scales. The newly implemented search method for kriging reduces the computational cost from an exponential dependence on the grid size in the original algorithm to a linear relationship, as each neighboring search becomes independent of the grid size. For the considered examples, our results show a sevenfold reduction in run time for each additional realization when a constant simulation path is used. The traditional criticism that constant path techniques introduce a bias to the simulations was explored, and our findings do indeed reveal a minor reduction in the diversity of the simulations. This bias can, however, be largely eliminated by changing the path type at different scales through the use of the multi-grid approach. Finally, we show that adapting the aggregation weight at each scale considered in our multi-grid approach allows for reproducing the variogram, the histogram, and the spatial trend of the underlying data.
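The log-linear pooling step can be sketched in a few lines; the densities and weights below are purely illustrative.

```python
import numpy as np

# Log-linear pooling of two conditional probability distributions with
# weights w1, w2 (the aggregation step that relaxes the independence
# assumption of classical BSS). Values are illustrative probabilities
# evaluated on a common grid of the simulated variable.
p_geophysics = np.array([0.1, 0.5, 0.3, 0.1])   # from the co-located data
p_kriging = np.array([0.2, 0.2, 0.4, 0.2])      # from previously simulated points
w1, w2 = 0.7, 0.3                               # aggregation weights

pooled = p_geophysics**w1 * p_kriging**w2
pooled /= pooled.sum()                          # renormalize
print(pooled)
```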
Enhancing Flood Prediction Reliability Using Bayesian Model Averaging
NASA Astrophysics Data System (ADS)
Liu, Z.; Merwade, V.
2017-12-01
Uncertainty analysis is an indispensable part of modeling the hydrology and hydrodynamics of non-idealized environmental systems. Compared to relying on the prediction from a single model simulation, using an ensemble of predictions that considers uncertainty from different sources is more reliable. In this study, Bayesian model averaging (BMA) is applied to the Black River watershed in Arkansas and Missouri by combining multi-model simulations to obtain reliable deterministic water stage and probabilistic inundation extent predictions. The simulation ensemble is generated from 81 LISFLOOD-FP subgrid model configurations that include uncertainty from channel shape, channel width, channel roughness and discharge. Model simulation outputs are trained with observed water stage data during one flood event, and BMA prediction ability is validated for another flood event. Results from this study indicate that BMA does not always outperform all members in the ensemble, but it provides relatively robust deterministic flood stage predictions across the basin. Station-based BMA (BMA_S) water stage prediction performs better than global-based BMA (BMA_G) prediction, which is in turn superior to the ensemble mean prediction. Additionally, the high-frequency flood inundation extent (probability greater than 60%) in the BMA_G probabilistic map is more accurate than the probabilistic flood inundation extent based on equal weights.
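A compact sketch of BMA weight estimation by expectation-maximization with Gaussian kernels follows (a simplified, Raftery-style setup, not necessarily the exact configuration used in the study); forecasts and observations are synthetic.

```python
import numpy as np
from scipy.stats import norm

def bma_em(forecasts, obs, n_iter=200):
    """EM for BMA weights and a common Gaussian kernel variance.
    forecasts: (n_times, n_members); obs: (n_times,)"""
    n, k = forecasts.shape
    w = np.full(k, 1.0 / k)
    var = np.var(obs - forecasts.mean(axis=1))
    for _ in range(n_iter):
        dens = norm.pdf(obs[:, None], loc=forecasts, scale=np.sqrt(var))
        z = w * dens
        z /= z.sum(axis=1, keepdims=True)       # E-step: responsibilities
        w = z.mean(axis=0)                      # M-step: member weights
        var = np.sum(z * (obs[:, None] - forecasts) ** 2) / n
    return w, var

rng = np.random.default_rng(2)
obs = rng.normal(10, 2, 300)
fc = obs[:, None] + rng.normal([0.5, -0.2, 1.0], [0.5, 1.5, 0.8], (300, 3))
w, var = bma_em(fc, obs)
print("BMA weights:", w, "kernel variance:", var)
```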
Gibbs Ensemble Simulations of the Solvent Swelling of Polymer Films
NASA Astrophysics Data System (ADS)
Gartner, Thomas; Epps, Thomas, III; Jayaraman, Arthi
Solvent vapor annealing (SVA) is a useful technique to tune the morphology of block polymer, polymer blend, and polymer nanocomposite films. Despite SVA's utility, standardized SVA protocols have not been established, partly due to a lack of fundamental knowledge regarding the interplay between the polymer(s), solvent, substrate, and free surface during solvent annealing and evaporation. An understanding of how to tune polymer film properties in a controllable manner through SVA processes is needed. Herein, the thermodynamic implications of the presence of solvent in the swollen polymer film are explored through two alternative Gibbs ensemble simulation methods that we have developed and extended: Gibbs ensemble molecular dynamics (GEMD) and hybrid Monte Carlo (MC)/molecular dynamics (MD). In this poster, we describe these simulation methods and demonstrate their application to polystyrene films swollen by toluene and n-hexane. Polymer film swelling experiments, Gibbs ensemble molecular simulations, and polymer reference interaction site model (PRISM) theory are combined to calculate an effective Flory-Huggins χ (χeff) for polymer-solvent mixtures. The effects of solvent chemistry, solvent content, polymer molecular weight, and polymer architecture on χeff are examined, providing a platform to control and understand the thermodynamics of polymer film swelling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adachi, Satoshi; Toda, Mikito; Kubotani, Hiroto
The fixed-trace ensemble of random complex matrices is the fundamental model that excellently describes the entanglement in the quantum states realized in a coupled system by its strongly chaotic dynamical evolution [see H. Kubotani, S. Adachi, M. Toda, Phys. Rev. Lett. 100 (2008) 240501]. The fixed-trace ensemble fully takes into account the conservation of probability for quantum states. The present paper derives for the first time the exact analytical formula of the one-body distribution function of singular values of random complex matrices in the fixed-trace ensemble. The distribution function of singular values (i.e. Schmidt eigenvalues) of a quantum state is so important since it describes characteristics of the entanglement in the state. The derivation of the exact analytical formula utilizes two achievements in mathematics that appeared in the 1990s. The first is the Kaneko theory that extends the famous Selberg integral by inserting a hypergeometric type weight factor into the integrand to obtain an analytical formula for the extended integral. The second is the Petkovsek-Wilf-Zeilberger theory that calculates definite hypergeometric sums in a closed form.
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
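The stacking step, fitting a linear combination of individual inference scores against ground truth from simulated networks, can be sketched with scikit-learn; scores and ground truth below are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_pairs = 2000
truth = rng.random(n_pairs) < 0.1        # ground truth: synapse present?

# Scores from three hypothetical single inference methods (e.g. signed
# mutual information, a frequency-based measure, cross-correlation),
# each noisy to a different degree.
scores = np.column_stack([
    truth + rng.normal(0, s, n_pairs) for s in (0.6, 0.8, 1.0)
])

# Ensemble stacking: a linear combination of the individual measures,
# fitted against ground truth from simulated networks.
stack = LogisticRegression().fit(scores, truth)
print("stacking coefficients:", stack.coef_)
print("held-in accuracy:", stack.score(scores, truth))
```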
NASA Astrophysics Data System (ADS)
Janicka, Lucja; Szczepanik, Dominika; Borek, Karolina; Heese, Birgit; Stachlewska, Iwona S.
2018-04-01
Aerosol layers of different origin, suspended in the atmosphere on 9-11 August 2015, were observed with the PollyXT-UW lidar in Warsaw, Poland. The HYSPLIT ensemble backward trajectories indicate that the observed air masses can be attributed to a few different sources, with possible transport paths from, among others, Ukraine, Slovakia, and Africa. In this paper, we attempt to analyse and discuss the properties of the aerosol particles of different origin that were suspended over Warsaw during this event.
Otero, Cassi L.
2007-01-01
The U.S. Geological Survey, in cooperation with the San Antonio Water System, conducted a 4-year study during 2002-06 to identify major flow paths in the Edwards aquifer in northeastern Bexar and southern Comal Counties (study area). In the study area, faulting directs ground water into three hypothesized flow paths that move water, generally, from the southwest to the northeast. These flow paths are identified as the southern Comal flow path, the central Comal flow path, and the northern Comal flow path. Statistical correlations between water levels for six observation wells and between the water levels and discharges from Comal Springs and Hueco Springs yielded evidence for the hypothesized flow paths. Strong linear correlations were evident between the datasets from wells and springs within the same flow path and the datasets from wells in areas where flow between flow paths was suspected. Geochemical data (major ions, stable isotopes, sulfur hexafluoride, and tritium and helium) were used in graphical analyses to obtain evidence of the flow path from which wells or springs derive water. Major-ion geochemistry in samples from selected wells and springs showed relatively little variation. Samples from the southern Comal flow path were characterized by relatively high sulfate and chloride concentrations, possibly indicating that the water in the flow path was mixing with small amounts of saline water from the freshwater/saline-water transition zone. Samples from the central Comal flow path yielded the most varied major-ion geochemistry of the three hypothesized flow paths. Central Comal flow path samples were characterized, in general, by high calcium concentrations and low magnesium concentrations. Samples from the northern Comal flow path were characterized by relatively low sulfate and chloride concentrations and high magnesium concentrations. The high magnesium concentrations characteristic of northern Comal flow path samples from the recharge zone in Comal County might indicate that water from the Trinity aquifer is entering the Edwards aquifer in the subsurface. A graph of the relation between the stable isotopes deuterium and delta-18 oxygen showed that, except for samples collected following an unusually intense rain storm, there was not much variation in stable isotope values among the flow paths. In the study area deuterium ranged from -36.00 to -20.89 per mil and delta-18 oxygen ranged from -6.03 to -3.70 per mil. Excluding samples collected following the intense rain storm, the deuterium range in the study area was -33.00 to -20.89 per mil and the delta-18 oxygen range was -4.60 to -3.70 per mil. Two ground-water age-dating techniques, sulfur hexafluoride concentrations and tritium/helium-3 isotope ratios, were used to compute apparent ages (time since recharge occurred) of water samples collected in the study area. In general, the apparent ages computed by the two methods do not seem to indicate direction of flow. Apparent ages computed for water samples in northeastern Bexar and southern Comal Counties do not vary greatly except for some very young water in the recharge zone in central Comal County.
A comparison of resampling schemes for estimating model observer performance with small ensembles
NASA Astrophysics Data System (ADS)
Elshahaby, Fatma E. A.; Jha, Abhinav K.; Ghaly, Michael; Frey, Eric C.
2017-09-01
In objective assessment of image quality, an ensemble of images is used to compute the 1st and 2nd order statistics of the data. Often, only a finite number of images is available, leading to the issue of statistical variability in numerical observer performance. Resampling-based strategies can help overcome this issue. In this paper, we compared different combinations of resampling schemes (the leave-one-out (LOO) and the half-train/half-test (HT/HT)) and model observers (the conventional channelized Hotelling observer (CHO), channelized linear discriminant (CLD) and channelized quadratic discriminant). Observer performance was quantified by the area under the ROC curve (AUC). For a binary classification task and for each observer, the AUC value for an ensemble size of 2000 samples per class served as a gold standard for that observer. Results indicated that each observer yielded a different performance depending on the ensemble size and the resampling scheme. For a small ensemble size, the combination [CHO, HT/HT] had more accurate rankings than the combination [CHO, LOO]. Using the LOO scheme, the CLD and CHO had similar performance for large ensembles. However, the CLD outperformed the CHO and gave more accurate rankings for smaller ensembles. As the ensemble size decreased, the performance of the [CHO, LOO] combination seriously deteriorated as opposed to the [CLD, LOO] combination. Thus, it might be desirable to use the CLD with the LOO scheme when smaller ensemble size is available.
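The two resampling schemes can be sketched for a Hotelling-type observer on synthetic Gaussian channel outputs (a toy stand-in for the study's data):

```python
import numpy as np

rng = np.random.default_rng(6)

def auc(t0, t1):
    """Mann-Whitney estimate of the area under the ROC curve."""
    return (t1[:, None] > t0[None, :]).mean()

def cho_template(g0, g1):
    s = 0.5 * (np.cov(g0.T) + np.cov(g1.T))     # pooled channel covariance
    return np.linalg.solve(s, g1.mean(axis=0) - g0.mean(axis=0))

# Synthetic channel outputs: 20 channels, 60 images per class.
n, p = 60, 20
g0 = rng.normal(0.0, 1.0, (n, p))               # signal-absent class
g1 = rng.normal(0.3, 1.0, (n, p))               # signal-present class

# Half-train/half-test: disjoint halves for template estimation and AUC.
w = cho_template(g0[: n // 2], g1[: n // 2])
print("HT/HT AUC:", auc(g0[n // 2:] @ w, g1[n // 2:] @ w))

# Leave-one-out: each image is scored by a template trained on the rest.
t0 = np.array([g0[i] @ cho_template(np.delete(g0, i, axis=0), g1) for i in range(n)])
t1 = np.array([g1[i] @ cho_template(g0, np.delete(g1, i, axis=0)) for i in range(n)])
print("LOO AUC:", auc(t0, t1))
```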
Creating ensembles of oblique decision trees with evolutionary algorithms and sampling
Cantu-Paz, Erick [Oakland, CA]; Kamath, Chandrika [Tracy, CA]
2006-06-13
A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.
Classification of Odours for Mobile Robots Using an Ensemble of Linear Classifiers
NASA Astrophysics Data System (ADS)
Trincavelli, Marco; Coradeschi, Silvia; Loutfi, Amy
2009-05-01
This paper investigates the classification of odours using an electronic nose mounted on a mobile robot. The samples are collected as the robot explores the environment. Under such conditions, the sensor response differs from that of typical three-phase sampling processes. In this paper, we focus particularly on the classification problem and how it is influenced by the movement of the robot. To cope with these influences, an algorithm consisting of an ensemble of classifiers is presented. Experimental results show that this algorithm increases classification performance compared to other traditional classification methods.
The Relationship between English Language Learner Status and Music Ensemble Participation
ERIC Educational Resources Information Center
Lorah, Julie A.; Sanders, Elizabeth A.; Morrison, Steven J.
2014-01-01
Authors of previous research have reported that U.S. English language learner (ELL) students participate in school-sponsored music ensembles (band, orchestra, and choir) at a lower rate than their native-English-speaking peers (non-ELLs). The current study examined this phenomenon using a nationally representative sample of U.S. 10th graders (14-…
NASA Astrophysics Data System (ADS)
Douglas, Jack
2014-03-01
One of the things that puzzled me when I was a PhD student working under Karl Freed was the curious unity between the theoretical descriptions of excluded volume interactions in polymers, the hydrodynamic properties of polymers in solution, and the critical properties of fluid mixtures, gases and diverse other materials (magnets, superfluids, etc.) when these problems were formally expressed in terms of Wiener path integration and the interactions treated through a combination of epsilon expansion and renormalization group (RG) theory. It seemed that only the interaction labels changed from one problem to the other. What do these problems have in common? Essential clues to these interrelations became apparent when Karl Freed, myself and Shi-Qing Wang together began to study polymers interacting with hyper-surfaces of continuously variable dimension, where the Feynman perturbation expansions could be performed through infinite order so that we could really understand what the RG theory was doing. It is evidently simply a particular method for resumming perturbation theory, and former ambiguities no longer existed. An integral equation extension of this type of exact calculation to "surfaces" of arbitrary fixed shape finally revealed the central mathematical object that links these diverse physical models: the capacity of polymer chains, whose value vanishes at the critical dimension of 4 and whose magnitude is linked to the friction coefficient of polymer chains, the virial coefficient of polymers and the 4-point function of the phi-4 field theory. Once this central object was recognized, it then became possible to solve diverse problems in materials science through the calculation of capacity, and related "virial" properties, through Monte Carlo sampling of random walk paths. The essential ideas of this computational method are discussed and some applications given to non-trivial problems: nanotubes treated as either rigid rods or ensembles of worm-like chains having finite cross-section, DNA, nanoparticles with grafted chain layers, and knotted polymers. The path-integration method, which grew out of research in Karl Freed's group, is evidently a powerful tool for computing basic transport properties of complex-shaped objects and should find increasing application in polymer science, nanotechnological applications and biology.
NASA Astrophysics Data System (ADS)
Kuijlaars, A. B. J.
2001-08-01
The asymptotic behavior of polynomials that are orthogonal with respect to a slowly decaying weight is very different from the asymptotic behavior of polynomials that are orthogonal with respect to a Freud-type weight. While the latter has been extensively studied, much less is known about the former. Following an earlier investigation into the zero behavior, we study here the asymptotics of the density of states in a unitary ensemble of random matrices with a slowly decaying weight. This measure is also naturally connected with the orthogonal polynomials. It is shown that, after suitable rescaling, the weak limit is the same as the weak limit of the rescaled zeros.
Large Scale Crop Classification in Ukraine using Multi-temporal Landsat-8 Images with Missing Data
NASA Astrophysics Data System (ADS)
Kussul, N.; Skakun, S.; Shelestov, A.; Lavreniuk, M. S.
2014-12-01
At present, there are no globally available Earth observation (EO) derived products on crop maps. This issue is being addressed within the Sentinel-2 for Agriculture initiative, where a number of test sites (including from JECAM) participate to provide coherent protocols and best practices for various global agriculture systems, and subsequently crop maps from Sentinel-2. One of the problems in dealing with optical images for large territories (more than 10,000 sq. km) is the presence of clouds and shadows that result in missing values in the data sets. In this abstract, a new approach to the classification of multi-temporal optical satellite imagery with missing data due to clouds and shadows is proposed. First, self-organizing Kohonen maps (SOMs) are used to restore missing pixel values in a time series of satellite imagery. SOMs are trained for each spectral band separately using non-missing values. Missing values are restored through a special procedure that substitutes an input sample's missing components with the corresponding neuron's weight coefficients. After missing data restoration, a supervised classification is performed for the multi-temporal satellite images. For this, an ensemble of neural networks, in particular multilayer perceptrons (MLPs), is proposed. Ensembling of neural networks is done by the technique of average committee: the average class probability is calculated over the classifiers, and the class with the highest average posterior probability is selected for the given input sample. The proposed approach is applied to large-scale crop classification using multi-temporal Landsat-8 images for the JECAM test site in Ukraine [1-2]. It is shown that the ensemble of MLPs provides better performance than a single neural network in terms of overall classification accuracy and kappa coefficient. The obtained classification map is also validated through estimated crop and forest areas and comparison to official statistics. 1. A.Yu. Shelestov et al., "Geospatial information system for agricultural monitoring," Cybernetics Syst. Anal., vol. 49, no. 1, pp. 124-132, 2013. 2. J. Gallego et al., "Efficiency Assessment of Different Approaches to Crop Classification Based on Satellite and Ground Observations," J. Autom. Inform. Scie., vol. 44, no. 5, pp. 67-80, 2012.
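The average-committee combination rule is straightforward to sketch with scikit-learn MLPs on synthetic data; here committee members differ only in their random initialization.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=800, n_features=12, n_classes=3,
                           n_informative=6, random_state=1)

# Average committee: train several MLPs, average the posterior class
# probabilities, and pick the class with the highest average probability.
committee = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=600,
                           random_state=seed).fit(X, y) for seed in range(5)]
avg_proba = np.mean([m.predict_proba(X) for m in committee], axis=0)
y_hat = avg_proba.argmax(axis=1)
print("committee training accuracy:", (y_hat == y).mean())
```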
Cheng, Erika R; Park, Hyojun; Wisk, Lauren E; Mandell, Kara C; Wakeel, Fathima; Litzelman, Kristin; Chatterjee, Debanjana; Witt, Whitney P
2016-03-01
The life course perspective suggests a pathway may exist among maternal exposure to stressful life events prior to conception (PSLEs), infant birth weight and subsequent offspring health, whereby PSLEs are part of a 'chains-of-risk' that sets children on a certain health pathway. No prior study has examined the link between PSLEs and offspring health in a nationally representative sample of US mothers and their children. We used longitudinal, nationally representative data to evaluate the relation between maternal exposure to PSLEs and subsequent measures of infant and toddler health, taking both maternal and obstetric characteristics into account. We examined 6900 mother-child dyads participating in 2 waves of the nationally representative Early Childhood Longitudinal Study-Birth Cohort. Infant and toddler health outcomes assessed at 9 and 24 months included overall health status, special healthcare needs and severe health conditions. Adjusted path analyses examined associations between PSLEs, birth weight and child health outcomes. In adjusted analyses, PSLEs increased the risk for very low birth weight (VLBW, <1500 g), which, in turn, predicted poor health at both 9 and 24 months of age. Path analyses demonstrated that PSLEs had small indirect effects on children's subsequent health that operated through VLBW. Our analysis suggests a chains-of-risk model in which women's exposure to PSLEs increases the risk for giving birth to a VLBW infant, which, in turn, adversely affects infant and toddler health. Addressing women's preconception health may have important downstream benefits for their children, although more research is needed to replicate these findings.
Hybrid Data Assimilation without Ensemble Filtering
NASA Technical Reports Server (NTRS)
Todling, Ricardo; Akkraoui, Amal El
2014-01-01
The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional variational system to a hybrid approach in which the ensemble is generated using a square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the Grid-point Statistical Interpolation system. As in most EnKF applications, we found it necessary to employ a combination of multiplicative and additive inflations to compensate for sampling and modeling errors, respectively, and to maintain the small-member ensemble solution close to the variational solution; we also found it necessary to re-center the members of the ensemble about the variational analysis. During tuning of the filter we found re-centering and additive inflation to play a considerably larger role than expected, particularly in a dual-resolution context when the variational analysis is run at higher resolution than the ensemble. This led us to consider a hybrid strategy in which the members of the ensemble are generated by simply converting the variational analysis to the resolution of the ensemble and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called filter-free hybrid procedure with an EnKF-based hybrid procedure and a control non-hybrid, traditional scheme show both hybrid strategies to provide equally significant improvement over the control; more interestingly, the filter-free procedure was found to give qualitatively similar results to the EnKF-based procedure.
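The filter-free ensemble generation step (additive perturbations about the variational analysis, followed by re-centering) can be sketched as follows; the perturbations here are uncorrelated noise rather than draws from a climatological error covariance, so this is only a structural illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_members = 1000, 32

# Variational analysis on the (coarser) ensemble grid, plus a pool of
# additive-inflation perturbations (uncorrelated noise as a stand-in).
x_var = rng.normal(size=n_state)                      # variational analysis
perts = rng.normal(scale=0.3, size=(n_members, n_state))

# Filter-free hybrid: members are the analysis plus additive perturbations,
# then re-centered so the ensemble mean equals the analysis exactly.
members = x_var + perts
members += x_var - members.mean(axis=0)

print("max |ensemble mean - analysis|:",
      np.abs(members.mean(axis=0) - x_var).max())
```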
Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation.
Iida, Shinji; Nakamura, Haruki; Higo, Junichi
2016-06-15
We introduce various, recently developed, generalized ensemble methods, which are useful to sample various molecular configurations emerging in the process of protein-protein or protein-ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks.
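The projection-to-landscape step can be sketched by estimating F = -kT ln P from a histogram of sampled conformations projected onto two coordinates; the samples below are synthetic stand-ins for generalized-ensemble snapshots.

```python
import numpy as np

kT = 0.593  # kcal/mol at ~298 K

# Hypothetical 2D projection (e.g. two reaction coordinates) of an
# ensemble of sampled conformations: two synthetic clusters.
rng = np.random.default_rng(0)
xy = np.concatenate([rng.normal([0.0, 0.0], 0.3, (8000, 2)),
                     rng.normal([1.5, 1.0], 0.4, (2000, 2))])

# Free-energy landscape from the sampled density: F = -kT ln P, shifted
# so the global minimum is zero. Empty bins get infinite free energy.
H, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=40, density=True)
with np.errstate(divide="ignore"):
    F = -kT * np.log(H)
F -= F[np.isfinite(F)].min()
print("landscape shape:", F.shape,
      "max finite F (kcal/mol):", F[np.isfinite(F)].max())
```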
Force-field functor theory: classical force-fields which reproduce equilibrium quantum distributions
Babbush, Ryan; Parkhill, John; Aspuru-Guzik, Alán
2013-01-01
Feynman and Hibbs were the first to variationally determine an effective potential whose associated classical canonical ensemble approximates the exact quantum partition function. We examine the existence of a map between the local potential and an effective classical potential which matches the exact quantum equilibrium density and partition function. The usefulness of such a mapping rests in its ability to readily improve Born-Oppenheimer potentials for use with classical sampling. We show that such a map is unique and must exist. To explore the feasibility of using this result to improve classical molecular mechanics, we numerically produce a map from a library of randomly generated one-dimensional potential/effective potential pairs then evaluate its performance on independent test problems. We also apply the map to simulate liquid para-hydrogen, finding that the resulting radial pair distribution functions agree well with path integral Monte Carlo simulations. The surprising accessibility and transferability of the technique suggest a quantitative route to adapting Born-Oppenheimer potentials, with a motivation similar in spirit to the powerful ideas and approximations of density functional theory. PMID:24790954
Cartographic modeling of snow avalanche path location within Glacier National Park, Montana
NASA Technical Reports Server (NTRS)
Walsh, Stephen J.; Brown, Daniel G.; Bian, Ling; Butler, David R.
1990-01-01
Geographic information system (GIS) techniques were applied to the study of snow-avalanche path location within Glacier National Park, Montana. Aerial photointerpretation and field surveys confirmed the location of 121 avalanche paths within the selected study area. Spatial and nonspatial information on each path were integrated using the ARC/INFO GIS. Lithologic, structural, hydrographic, topographic, and land-cover impacts on path location were analyzed. All path frequencies within variable classes were normalized by the area of class occurrence relative to the total area of the study area and were added to the morphometric information contained within INFO tables. The normalized values for each GIS coverage were used to cartographically model, by means of composite factor weightings, avalanche path locations.
Sampling-based ensemble segmentation against inter-operator variability
NASA Astrophysics Data System (ADS)
Huo, Jing; Okada, Kazunori; Pope, Whitney; Brown, Matthew
2011-03-01
Inconsistency and a lack of reproducibility are commonly associated with semi-automated segmentation methods. In this study, we developed an ensemble approach to improve reproducibility and applied it to glioblastoma multiforme (GBM) brain tumor segmentation on T1-weighted contrast-enhanced MR volumes. The proposed approach combines sampling-based simulations and ensemble segmentation into a single framework; it generates a set of segmentations by perturbing user initialization and user-specified internal parameters, then fuses the set of segmentations into a single consensus result. Three combination algorithms were applied: majority voting, averaging and expectation-maximization (EM). The reproducibility of the proposed framework was evaluated by a controlled experiment on 16 tumor cases from a multicenter drug trial. The ensemble framework had significantly better reproducibility than the individual base Otsu thresholding method (p<.001).
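The majority-voting fusion step can be sketched as follows (averaging and EM-based fusion are not shown); segmentations are simulated here by perturbing a synthetic ground truth.

```python
import numpy as np

rng = np.random.default_rng(8)

# A stack of binary segmentations of the same slice, each produced by
# the base method under a perturbed initialization / parameter setting.
truth = np.zeros((64, 64), dtype=bool)
truth[20:44, 20:44] = True
segmentations = np.stack([
    truth ^ (rng.random((64, 64)) < 0.05)   # each run flips ~5% of pixels
    for _ in range(15)
])

# Majority voting: a pixel is tumor if more than half of the runs agree.
consensus = segmentations.mean(axis=0) > 0.5
dice = 2 * (consensus & truth).sum() / (consensus.sum() + truth.sum())
print("Dice of consensus vs. truth:", dice)
```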
Body weight, shame, guilt and oral health: a path analysis model in undergraduate students.
Dumitrescu, Alexandrina L; Dogaru, Carmen Beatrice; Duţă, Carmen; Manolescu, B
2011-01-01
The purpose of the present study was to answer the question of whether experiences of shame, guilt and body investment can explain the association between BMI, oral health behaviours and oral health status in an undergraduate student population-based sample. The study was performed on a sample of 150 first year medical students (19.62 +/- 2.62 years old). Data were collected through a self-administered questionnaire, the Weight- and Body-Related Shame and Guilt Scale and the Body Investment Scale. 61.3% of students were of normal weight, 21.3% were underweight and 11.3% were overweight. Statistically significant differences were observed between males and females regarding body mass index (P < 0.0001) and WEB-Shame (P < 0.0001). Among females, statistically significantly higher values of WEB-Shame and WEB-Guilt and lower levels of body investment were noted among normal-weight compared with underweight students (P < 0.05). Normal-weight and underweight female participants reported statistically significantly different frequencies of gingival involvement (P < 0.05). Among males, WEB-S was correlated with satisfaction with the appearance of their own teeth, current extracted teeth and self-reported gum bleeding, while WEB-G was correlated with self-reported current extracted teeth, toothbrushing and mouthrinse frequency. Among females, WEB-S was correlated with flossing and dental visit frequency. The structural equation model demonstrated a good fit among female students but not among males. These findings highlight the importance of targeting and understanding the realm of body-related self-conscious emotions and the associated links to regulation and health investment behavior.
Quantifying selective alignment of ensemble nitrogen-vacancy centers in (111) diamond
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tahara, Kosuke; Ozawa, Hayato; Iwasaki, Takayuki
2015-11-09
Selective alignment of nitrogen-vacancy (NV) centers in diamond is an important technique for its applications. Quantification of the alignment ratio is necessary to design optimized diamond samples. However, this is not a straightforward problem for dense ensembles of NV centers. We estimate the alignment ratio of ensemble NV centers along the [111] direction in (111) diamond by optically detected magnetic resonance measurements. Diamond films deposited by N2-doped chemical vapor deposition have NV center densities over 1 × 10^15 cm^-3 and alignment ratios over 75%. Although the spin coherence time (T2) is limited to a few μs by electron spins of nitrogen impurities, the combination of the selective alignment and the high density can be a possible way to optimize NV-containing diamond samples for sensing applications.
Detection of eardrum abnormalities using ensemble deep learning approaches
NASA Astrophysics Data System (ADS)
Senaras, Caglar; Moberly, Aaron C.; Teknos, Theodoros; Essig, Garth; Elmaraghy, Charles; Taj-Schaal, Nazhat; Yua, Lianbo; Gurcan, Metin N.
2018-02-01
In this study, we proposed an approach to report the condition of the eardrum as "normal" or "abnormal" by ensembling two different deep learning architectures. In the first network (Network 1), we applied transfer learning to the Inception V3 network using 409 labeled samples. As a second network (Network 2), we designed a convolutional neural network that takes advantage of auto-encoders, using an additional 673 unlabeled eardrum samples. The individual classification accuracies of Network 1 and Network 2 were 84.4% (+/- 12.1%) and 82.6% (+/- 11.3%), respectively. Only 32% of the errors of the two networks were the same, making it possible to combine the two approaches to achieve better classification accuracy. The proposed ensemble method allows us to achieve robust classification because it has high accuracy (84.4%) with the lowest standard deviation (+/- 10.3%).
A Stochastic Diffusion Process for the Dirichlet Distribution
Bakosi, J.; Ristorcelli, J. R.
2013-03-01
The method of potential solutions of Fokker-Planck equations is used to develop a transport equation for the joint probability of N coupled stochastic variables with the Dirichlet distribution as its asymptotic solution. To ensure a bounded sample space, a coupled nonlinear diffusion process is required: the Wiener processes in the equivalent system of stochastic differential equations are multiplicative with coefficients dependent on all the stochastic variables. Individual samples of a discrete ensemble, obtained from the stochastic process, satisfy a unit-sum constraint at all times. The process may be used to represent realizations of a fluctuating ensemble of N variables subject to a conservation principle. Similar to the multivariate Wright-Fisher process, whose invariant is also Dirichlet, the univariate case yields a process whose invariant is the beta distribution. As a test of the results, Monte Carlo simulations are used to evolve numerical ensembles toward the invariant Dirichlet distribution.
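The univariate case mentioned above (beta-distributed invariant) is easy to check numerically; a minimal Euler-Maruyama sketch of a Wright-Fisher-type diffusion with Beta(a, b) invariant density (the parameters, step size, and boundary clipping are illustrative simplifications, not the paper's scheme):

```python
import numpy as np

# Euler-Maruyama sketch of dX = 0.5*(a*(1-X) - b*X) dt + sqrt(X*(1-X)) dW,
# whose invariant density is Beta(a, b). All numbers are illustrative.
rng = np.random.default_rng(0)
a, b = 2.0, 5.0                       # drift parameters -> Beta(2, 5) invariant
n, dt, steps = 10_000, 1e-2, 5_000
x = rng.uniform(0.2, 0.8, size=n)     # initial ensemble

for _ in range(steps):
    drift = 0.5 * (a * (1.0 - x) - b * x)
    diff = np.sqrt(np.clip(x * (1.0 - x), 0.0, None))
    x += drift * dt + diff * np.sqrt(dt) * rng.standard_normal(n)
    np.clip(x, 0.0, 1.0, out=x)       # keep samples in the bounded space

print("ensemble mean:", x.mean(), "theory:", a / (a + b))
```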
NASA Astrophysics Data System (ADS)
van der Linden, Joost H.; Narsilio, Guillermo A.; Tordesillas, Antoinette
2016-08-01
We present a data-driven framework to study the relationship between fluid flow at the macroscale and the internal pore structure, across the micro- and mesoscales, in porous, granular media. Sphere packings with varying particle size distribution and confining pressure are generated using the discrete element method. For each sample, a finite element analysis of the fluid flow is performed to compute the permeability. We construct a pore network and a particle contact network to quantify the connectivity of the pores and particles across the mesoscopic spatial scales. Machine learning techniques for feature selection are employed to identify sets of microstructural properties and multiscale complex network features that optimally characterize permeability. We find a linear correlation (in log-log scale) between permeability and the average closeness centrality of the weighted pore network. With the pore network links weighted by the local conductance, the average closeness centrality represents a multiscale measure of efficiency of flow through the pore network in terms of the mean geodesic distance (or shortest path) between all pore bodies in the pore network. Specifically, this study objectively quantifies a hypothesized link between high permeability and efficient shortest paths that thread through relatively large pore bodies connected to each other by high conductance pore throats, embodying connectivity and pore structure.
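The network measure highlighted here is straightforward to compute; a minimal sketch with networkx on a toy pore network with hypothetical conductances. Since networkx interprets edge weights as distances, each conductance g is converted to a resistance 1/g first:

```python
import networkx as nx

# Toy pore network: nodes are pore bodies, edges are throats with a
# hydraulic conductance g. Closeness centrality treats weights as
# distances, so use resistance = 1/g as the edge length.
G = nx.Graph()
edges = [(0, 1, 2.0), (1, 2, 0.5), (0, 2, 1.0), (2, 3, 4.0)]  # (u, v, g)
for u, v, g in edges:
    G.add_edge(u, v, resistance=1.0 / g)

cc = nx.closeness_centrality(G, distance="resistance")
avg_cc = sum(cc.values()) / len(cc)   # the network-averaged closeness
print(avg_cc)
```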
Deng, Nan-jie; Dai, Wei
2013-01-01
Understanding how kinetics in the unfolded state affects protein folding is a fundamentally important yet poorly understood issue. Here we employ three different models to analyze the unfolded landscape and folding kinetics of the miniprotein Trp-cage. The first is a 208 μs explicit solvent molecular dynamics (MD) simulation from D. E. Shaw Research containing tens of folding events. The second is a Markov state model (MSM-MD) constructed from the same ultra-long MD simulation; MSM-MD can be used to generate thousands of folding events. The third is a Markov state model built from temperature replica exchange MD simulations in implicit solvent (MSM-REMD). All the models exhibit multiple folding pathways, and there is a good correspondence between the folding pathways from direct MD and those computed from the MSMs. The unfolded populations interconvert rapidly between extended and collapsed conformations on time scales ≤ 40 ns, compared with the folding time of ≈ 5 μs. The folding rates are independent of where the folding is initiated from within the unfolded ensemble. About 90% of the unfolded states are sampled within the first 40 μs of the ultra-long MD trajectory, which on average explores ~27% of the unfolded state ensemble between consecutive folding events. We clustered the folding pathways according to structural similarity into "tubes", and kinetically partitioned the unfolded state into populations that fold along different tubes. From our analysis of the simulations and a simple kinetic model, we find that when the mixing within the unfolded state is comparable to or faster than folding, the folding waiting times for all the folding tubes are similar and the folding kinetics is essentially single exponential despite the presence of heterogeneous folding paths with non-uniform barriers. When the mixing is much slower than folding, different unfolded populations fold independently, leading to non-exponential kinetics. A kinetic partition of the Trp-cage unfolded state is constructed which reveals that different unfolded populations have almost the same probability to fold along any of the multiple folding paths. We are investigating whether these results for the kinetics in the unfolded state of the twenty-residue Trp-cage are representative of larger single-domain proteins. PMID:23705683
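A Markov state model such as MSM-MD is, at its core, a row-normalized transition-count matrix estimated from a discretized trajectory at a chosen lag time; a minimal sketch with a made-up discrete trajectory:

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition matrix from a discrete trajectory at a given lag."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Toy discretized trajectory over 3 states; real applications cluster
# MD frames into hundreds of microstates first.
dtraj = np.array([0, 0, 1, 1, 2, 1, 0, 2, 2, 1, 0, 0, 1, 2, 2, 2, 1, 0])
T = estimate_msm(dtraj, n_states=3, lag=1)
print(T)  # T[i, j] ~ probability of moving from state i to state j per lag time
```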
USDA-ARS's Scientific Manuscript database
In Ensemble Kalman Filter (EnKF)-based data assimilation, the background prediction of a model is updated using observations and relative weights based on the model prediction and observation uncertainties. In practice, both model and observation uncertainties are difficult to quantify and they have...
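In code, the update just described is compact; a minimal stochastic (perturbed-observation) EnKF analysis step with a linear observation operator, all shapes illustrative (a sketch, not any particular operational system):

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step.

    X : (n_state, n_ens) forecast (background) ensemble
    y : (n_obs,) observation vector
    H : (n_obs, n_state) linear observation operator
    R : (n_obs, n_obs) observation-error covariance
    """
    n_ens = X.shape[1]
    Xp = X - X.mean(axis=1, keepdims=True)           # ensemble perturbations
    Pf = Xp @ Xp.T / (n_ens - 1)                     # forecast-error covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain: relative weights
    # Perturb observations so the analysis ensemble keeps the correct spread
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)
```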
Kim, Jung-Hyun; Powell, Jeffery B; Roberge, Raymond J; Shepherd, Angie; Coca, Aitor
2014-01-01
The purpose of this study was to evaluate the predictive capability of fabric Total Heat Loss (THL) values for the thermal stress that Personal Protective Equipment (PPE) ensemble wearers may encounter while performing work. A series of three tests, consisting of the Sweating Hot Plate (SHP) test on two sample fabrics and the Sweating Thermal Manikin (STM) and human performance tests on two single-layer encapsulating ensembles (fabric/ensemble A = low THL and B = high THL), was conducted to compare THL values between SHP and STM methods along with human thermophysiological responses to wearing the ensembles. In human testing, ten male subjects performed a treadmill exercise at 4.8 km/h and 3% incline for 60 min in two environmental conditions (mild = 22°C, 50% relative humidity (RH) and hot/humid = 35°C, 65% RH). The thermal and evaporative resistances were significantly higher at the fabric level as measured in the SHP test than at the ensemble level as measured in the STM test. Consequently, the THL values were also significantly different for both fabric types (SHP vs. STM: 191.3 vs. 81.5 W/m² for fabric/ensemble A, and 909.3 vs. 149.9 W/m² for fabric/ensemble B; p < 0.001). Body temperature and heart rate responses between ensembles A and B were consistently different in both environmental conditions (p < 0.001), which is attributed to significantly higher sweat evaporation in ensemble B than in A (p < 0.05), despite greater sweat production in ensemble A (p < 0.001) in both environmental conditions. Further, the elevation of microclimate temperature (p < 0.001) and humidity (p < 0.01) was significantly greater in ensemble A than in B. It was concluded that: (1) SHP-determined THL values are significantly different from the actual THL potential of the PPE ensemble tested on the STM, (2) physiological benefits from wearing a more breathable PPE ensemble may not be realized with incremental THL values (SHP test) less than approximately 150-200 W/m², and (3) the effects of the thermal environment on the level of heat stress in PPE ensemble wearers are greater than those of ensemble thermal characteristics.
pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins.
Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A Keith; Felli, Isabella C; Forman-Kay, Julie D; Kriwacki, Richard W; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I; Uversky, Vladimir N; Vendruscolo, Michele; Wishart, David; Wright, Peter E; Tompa, Peter
2014-01-01
The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in a Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
Total probabilities of ensemble runoff forecasts
NASA Astrophysics Data System (ADS)
Olav Skøien, Jon; Bogner, Konrad; Salamon, Peter; Smith, Paul; Pappenberger, Florian
2016-04-01
Ensemble forecasting has long been used in meteorological modelling as a method to indicate the uncertainty of the forecasts. However, as the ensembles often exhibit both bias and dispersion errors, it is necessary to calibrate and post-process them. Two of the most common methods for this are Bayesian Model Averaging (Raftery et al., 2005) and Ensemble Model Output Statistics (EMOS) (Gneiting et al., 2005). There are also methods for regionalizing these approaches (Berrocal et al., 2007) and for incorporating the correlation between lead times (Hemri et al., 2013). Engeland and Steinsland (2014) developed a framework which can estimate post-processing parameters that differ in space and time but still give a spatially and temporally consistent output. However, their method is computationally complex for our large number of stations, and cannot directly be regionalized in the way we would like, so we suggest a different path below. The target of our work is to create a mean forecast with uncertainty bounds for a large number of locations in the framework of the European Flood Awareness System (EFAS - http://www.efas.eu). We are therefore more interested in improving the forecast skill for high flows than the forecast skill at lower runoff levels. EFAS uses a combination of ensemble forecasts and deterministic forecasts from different forecasters to force a distributed hydrologic model and to compute runoff ensembles for each river pixel within the model domain. Instead of showing the mean and the variability of each forecast ensemble individually, we post-process all model outputs to find a total probability, the post-processed mean and the uncertainty of all ensembles. The post-processing parameters are first calibrated for each calibration location, while assuring that they have some spatial correlation, by adding a spatial penalty in the calibration process. This can in some cases have a slight negative impact on the calibration error, but it makes it easier to interpolate the post-processing parameters to uncalibrated locations. We also look into different methods for handling the non-normal distributions of runoff data and the effect of different data transformations on forecast skill in general and for floods in particular. Berrocal, V. J., Raftery, A. E. and Gneiting, T.: Combining Spatial Statistical and Ensemble Information in Probabilistic Weather Forecasts, Mon. Weather Rev., 135(4), 1386-1402, doi:10.1175/MWR3341.1, 2007. Engeland, K. and Steinsland, I.: Probabilistic postprocessing models for flow forecasts for a system of catchments and several lead times, Water Resour. Res., 50(1), 182-197, doi:10.1002/2012WR012757, 2014. Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T.: Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133(5), 1098-1118, doi:10.1175/MWR2904.1, 2005. Hemri, S., Fundel, F. and Zappa, M.: Simultaneous calibration of ensemble river flow predictions over an entire range of lead times, Water Resour. Res., 49(10), 6744-6755, doi:10.1002/wrcr.20542, 2013. Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133(5), 1155-1174, doi:10.1175/MWR2906.1, 2005.
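As a concrete example of the post-processing methods cited above, Gaussian EMOS fits a predictive distribution N(a + b·(ensemble mean), c + d·(ensemble variance)) by minimizing the closed-form CRPS; a minimal sketch following Gneiting et al. (2005), with simplified fitting details (no positivity constraints, generic optimizer):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a normal forecast N(mu, sigma^2) against observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def emos_fit(ens_mean, ens_var, obs):
    """Fit N(a + b*ens_mean, c + d*ens_var) by minimum mean CRPS over training cases."""
    def loss(p):
        a, b, c, d = p
        sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-8))  # guard against sigma <= 0
        return gaussian_crps(a + b * ens_mean, sigma, obs).mean()
    return minimize(loss, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead").x
```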
NASA Astrophysics Data System (ADS)
Iachimciuc, Igor
The dissertation is in two parts, a theoretical study and a musical composition. In Part I the music of Gyorgy Kurtag is analyzed from the point of view of sound color. A brief description of what is understood by the term sound color, and various ways of achieving specific coloristic effects, are presented in the Introduction. An examination of Kurtag's approaches to the domain of sound color occupies the chapters that follow. The musical examples that are analyzed are selected from Kurtag's different compositional periods, showing a certain consistency in sound color techniques, the most important of which are already present in the String Quartet, Op. 1. The compositions selected for analysis are written for different ensembles, but regardless of the instrumentation, certain principles of the formation and organization of sound color remain the same. Rather than relying on extended instrumental techniques, Kurtag creates a large variety of sound colors using traditional means such as pitch material, register, density, rhythm, timbral combinations, dynamics, texture, spatial displacement of the instruments, and the overall musical context. Each sound color unit in Kurtag's music is a separate entity, conceived as a complete microcosm. Sound color units can either be juxtaposed as contrasting elements, forming sound color variations, or superimposed, often resulting in a Klangfarbenmelodie effect. Some of the same gestural figures (objets trouves) appear in different compositions, but with significant coloristic modifications. Thus, the principle of sound color variations is not only a strong organizational tool, but also a characteristic stylistic feature of the music of Gyorgy Kurtag. Part II, Leopard's Path (2010), for flute, clarinet, violin, cello, cimbalom, and piano, is an original composition inspired by the painting of Jesse Allen, a San Francisco based artist. The composition is conceived as a cycle of thirteen short movements. Ten of these movements are the musical interpretation of the objects presented in the painting, and are stylistically similar. These movements are scored for the entire ensemble. The other three movements, entitled Interludes, provide a stylistic contrast, and are not directly connected with the painting.
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.
Notaro, Marco; Schubach, Max; Robinson, Peter N; Valentini, Giorgio
2017-10-12
The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context, the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of predicting gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent HPO term predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
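The true path rule mentioned here implies that a gene's score for an HPO term may not exceed its score for any ancestor term; a minimal top-down correction over a toy ontology DAG (illustrative only, not the paper's exact ensemble algorithms):

```python
# Toy true-path-rule correction: cap each term's flat score by the
# corrected scores of all its ancestors. Toy DAG and scores.
parents = {"HP:B": ["HP:A"], "HP:C": ["HP:A"], "HP:D": ["HP:B", "HP:C"]}
flat = {"HP:A": 0.6, "HP:B": 0.9, "HP:C": 0.4, "HP:D": 0.8}

def htd(term, flat, parents, memo):
    """Hierarchical top-down pass: child score <= every parent's corrected score."""
    if term in memo:
        return memo[term]
    score = flat[term]
    for p in parents.get(term, []):
        score = min(score, htd(p, flat, parents, memo))
    memo[term] = score
    return score

memo = {}
corrected = {t: htd(t, flat, parents, memo) for t in flat}
print(corrected)  # HP:B capped at 0.6, HP:D capped at 0.4
```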
NASA Astrophysics Data System (ADS)
Yettella, Vineel; Kay, Jennifer E.
2017-09-01
The extratropical precipitation response to global warming is investigated within a 30-member initial condition climate model ensemble. As in observations, modeled cyclonic precipitation contributes a large fraction of extratropical precipitation, especially over the ocean and in the winter hemisphere. When compared to present day, the ensemble projects increased cyclone-associated precipitation under twenty-first century business-as-usual greenhouse gas forcing. While the cyclone-associated precipitation response is weaker in the near-future (2016-2035) than in the far-future (2081-2100), both future periods have similar patterns of response. Though cyclone frequency changes are important regionally, most of the increased cyclone-associated precipitation results from increased within-cyclone precipitation. Consistent with this result, cyclone-centric composites show statistically significant precipitation increases in all cyclone sectors. Decomposition into thermodynamic (mean cyclone water vapor path) and dynamic (mean cyclone wind speed) contributions shows that thermodynamics explains 92 and 95% of the near-future and far-future within-cyclone precipitation increases respectively. Surprisingly, the influence of dynamics on future cyclonic precipitation changes is negligible. In addition, the forced response exceeds internal variability in both future time periods. Overall, this work suggests that future cyclonic precipitation changes will result primarily from increased moisture availability in a warmer world, with secondary contributions from changes in cyclone frequency and cyclone dynamics.
Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.
Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S
2017-01-05
The magnitude and complexity of the structural and functional data available on nanomaterials require data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data and significantly reduce the workload to simulate experimentally relevant virtual samples.
NASA Astrophysics Data System (ADS)
Wu, Zikai; Hou, Baoyu; Zhang, Hongjuan; Jin, Feng
2014-04-01
Deterministic network models have been attractive media for discussing how dynamical processes depend on network structural features. On the other hand, the heterogeneity of weights affects dynamical processes taking place on networks. In this paper, we present a family of weighted expanded Koch networks based on Koch networks. They originate from an r-polygon, and at each subsequent evolutionary step every node of the current generation produces m r-polygons that include the node, with weighted edges scaled by a factor w. We derive closed-form expressions for the average weighted shortest path length (AWSP). In large networks, the AWSP stays bounded as the network order grows (0 < w < 1). Then, we focus on a special random walk and trapping issue on the networks. In more detail, we calculate exactly the average receiving time (ART). The ART exhibits a sub-linear dependence on network order (0 < w < 1), which implies that nontrivial weighted expanded Koch networks are more efficient than un-weighted expanded Koch networks in receiving information. Besides, the efficiency of receiving information at hub nodes also depends on the parameters m and r. These findings may pave the way for controlling information transportation on general weighted networks.
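The AWSP is simply the mean weighted shortest-path length over all node pairs; a minimal sketch on a toy weighted graph (illustrative, not an actual expanded Koch network, which would follow the paper's recursive construction):

```python
import networkx as nx

# Average weighted shortest path length (AWSP) on a small toy graph.
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.5), (1, 2, 0.25), (0, 2, 1.0), (2, 3, 0.5)])
awsp = nx.average_shortest_path_length(G, weight="weight")
print(awsp)  # mean over all node pairs of the weighted geodesic distance
```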
Fox, Claire L; Farrow, Claire V
2009-10-01
Research has found evidence of a link between being overweight or obese and bullying/peer victimisation, and also between obesity and adjustment problems such as low self-esteem and body dissatisfaction. Studies have also found that adjustment problems can put children at an increased risk of being bullied over time. However, to date the factors that place overweight or obese children at risk of being bullied have been poorly elucidated. Self-report data were collected from a sample of 11-14 year olds (N=376) about their weight status, about their experiences of three different types of bullying (Verbal, Physical and Social), their global self-worth, self-esteem for physical appearance, and body dissatisfaction. Overweight or obese children reported experiencing significantly more verbal and physical (but not social) bullying than their non-overweight peers. Global self-worth, self-esteem for physical appearance and body dissatisfaction each fully mediated the paths between weight status and being a victim of bullying.
Information flow in an atmospheric model and data assimilation
NASA Astrophysics Data System (ADS)
Yoon, Young-noh
2011-12-01
Weather forecasting consists of two processes, model integration and analysis (data assimilation). During the model integration, the state estimate produced by the analysis evolves to the next cycle time according to the atmospheric model to become the background estimate. The analysis then produces a new state estimate by combining the background state estimate with new observations, and the cycle repeats. In an ensemble Kalman filter, the probability distribution of the state estimate is represented by an ensemble of sample states, and the covariance matrix is calculated using the ensemble of sample states. We perform numerical experiments on toy atmospheric models introduced by Lorenz in 2005 to study the information flow in an atmospheric model in conjunction with ensemble Kalman filtering for data assimilation. This dissertation consists of two parts. The first part of this dissertation is about the propagation of information and the use of localization in ensemble Kalman filtering. If we can perform data assimilation locally by considering the observations and the state variables only near each grid point, then we can reduce the number of ensemble members necessary to cover the probability distribution of the state estimate, reducing the computational cost for the data assimilation and the model integration. Several localized versions of the ensemble Kalman filter have been proposed. Although tests applying such schemes have proven them to be extremely promising, a full basic understanding of the rationale and limitations of localization is currently lacking. We address these issues and elucidate the role played by chaotic wave dynamics in the propagation of information and the resulting impact on forecasts. The second part of this dissertation is about ensemble regional data assimilation using joint states. Assuming that we have a global model and a regional model of higher accuracy defined in a subregion inside the global region, we propose a data assimilation scheme that produces the analyses for the global and the regional model simultaneously, considering forecast information from both models. We show that our new data assimilation scheme produces better results both in the subregion and the global region than the data assimilation scheme that produces the analyses for the global and the regional model separately.
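Localization, as discussed in the first part, is commonly implemented by tapering the ensemble covariance with a distance-dependent weight (an element-wise Schur product); a minimal sketch with a Gaussian taper on a 1-D grid (the Gaspari-Cohn function is the usual operational choice; all sizes here are illustrative):

```python
import numpy as np

def gaussian_taper(dist, L):
    """Distance-based localization weights; L is the localization length scale."""
    return np.exp(-0.5 * (dist / L) ** 2)

# A covariance estimated from a small ensemble is noisy at long range;
# element-wise (Schur) multiplication by the taper suppresses spurious
# long-distance correlations.
rng = np.random.default_rng(1)
n_grid, n_ens, L = 50, 10, 5.0
X = rng.standard_normal((n_grid, n_ens))            # toy ensemble of states
Xp = X - X.mean(axis=1, keepdims=True)
P = Xp @ Xp.T / (n_ens - 1)                         # raw ensemble covariance

dist = np.abs(np.subtract.outer(np.arange(n_grid), np.arange(n_grid)))
P_loc = P * gaussian_taper(dist, L)                 # localized covariance
```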
Cost-sensitive AdaBoost algorithm for ordinal regression based on extreme learning machine.
Riccardi, Annalisa; Fernández-Navarro, Francisco; Carloni, Sante
2014-10-01
In this paper, the well-known stagewise additive modeling using a multiclass exponential (SAMME) boosting algorithm is extended to address problems where there exists a natural order in the targets, using a cost-sensitive approach. The proposed ensemble model uses an extreme learning machine (ELM) model as a base classifier (with the Gaussian kernel and the additional regularization parameter). The closed form of the derived weighted least squares problem is provided, and it is employed to estimate analytically the parameters connecting the hidden layer to the output layer at each iteration of the boosting algorithm. Compared to state-of-the-art boosting algorithms, in particular those using ELM as the base classifier, the suggested technique does not require the generation of a new training dataset at each iteration. The adoption of the weighted least squares formulation of the problem is presented as an unbiased alternative to the existing ELM boosting techniques. Moreover, the addition of a cost model for weighting the patterns according to the order of the targets enables the classifier to tackle ordinal regression problems. The proposed method has been validated by an experimental study comparing it with existing ensemble methods and ELM techniques for ordinal regression, showing competitive results.
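The SAMME update being extended here has a compact form; a minimal multiclass sketch with decision stumps standing in for the paper's ELM base classifier (the learner, number of rounds, and data shapes are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_fit(X, y, n_rounds=50):
    """Minimal SAMME multiclass AdaBoost; decision stumps as weak learners."""
    n, K = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)                      # pattern weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = h.predict(X) != y
        err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
        # SAMME learner weight; positive only while err < (K-1)/K
        alpha = np.log((1 - err) / err) + np.log(K - 1)
        w *= np.exp(alpha * miss)                # up-weight misclassified patterns
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas
```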
Fitting a function to time-dependent ensemble averaged data.
Fogelmark, Karl; Lomholt, Michael A; Irbäck, Anders; Ambjörnsson, Tobias
2018-05-03
Time-dependent ensemble averages, i.e., trajectory-based averages of some observable, are of importance in many fields of science. A crucial objective when interpreting such data is to fit these averages (for instance, squared displacements) with a function and extract parameters (such as diffusion constants). A commonly overlooked challenge in such function fitting procedures is that fluctuations around mean values, by construction, exhibit temporal correlations. We show that the only available general-purpose function fitting methods, the correlated chi-square method and the weighted least squares method (which neglects correlation), fail at either robust parameter estimation or accurate error estimation. We remedy this by deriving a new closed-form error estimation formula for weighted least squares fitting. The new formula uses the full covariance matrix, i.e., rigorously includes temporal correlations, but is free of the robustness issues inherent to the correlated chi-square method. We demonstrate its accuracy in four examples of importance in many fields: Brownian motion, damped harmonic oscillation, fractional Brownian motion and continuous time random walks. We also successfully apply our method, weighted least squares including correlation in error estimation (WLS-ICE), to particle tracking data. The WLS-ICE method is applicable to arbitrary fit functions, and we provide a publicly available WLS-ICE software.
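For a linear (or linearized) model, the idea behind WLS-ICE, fitting with ordinary weights but propagating the full covariance into the parameter errors, reduces to a sandwich estimator; a minimal sketch of that idea, not the authors' published code:

```python
import numpy as np

def wls_with_full_cov_errors(J, y, w_diag, C):
    """WLS fit with diagonal weights; error bars from the full covariance C.

    J      : (n, p) design matrix (Jacobian of a linearized fit function)
    y      : (n,) ensemble-averaged data
    w_diag : (n,) weights used in the fit (e.g. 1/variance)
    C      : (n, n) full covariance of y, including temporal correlations
    """
    W = np.diag(w_diag)
    A = np.linalg.solve(J.T @ W @ J, J.T @ W)   # linear map from data to parameters
    theta = A @ y                               # WLS parameter estimate
    cov_theta = A @ C @ A.T                     # 'sandwich' parameter covariance
    return theta, np.sqrt(np.diag(cov_theta))
```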
NASA Astrophysics Data System (ADS)
Booth, B. B. B.; Bernie, D.; McNeall, D.; Hawkins, E.; Caesar, J.; Boulton, C.; Friedlingstein, P.; Sexton, D. M. H.
2013-04-01
We compare future changes in global mean temperature in response to different future scenarios which, for the first time, arise from emission-driven rather than concentration-driven perturbed-parameter ensembles of a global climate model (GCM). These new GCM simulations sample uncertainties in atmospheric feedbacks, the land carbon cycle, ocean physics and aerosol sulphur cycle processes. We find broader ranges of projected temperature responses when considering emission-driven rather than concentration-driven simulations (with 10th-90th percentile ranges of 1.7 K for the aggressive mitigation scenario, up to 3.9 K for the high-end, business-as-usual scenario). A small minority of simulations, resulting from combinations of strong atmospheric feedbacks and carbon cycle responses, show temperature increases in excess of 9 K (RCP8.5) and, even under aggressive mitigation (RCP2.6), temperatures in excess of 4 K. While the simulations point to much larger temperature ranges for emission-driven experiments, they do not change existing expectations (based on previous concentration-driven experiments) about the timescales over which different sources of uncertainty are important. The new simulations sample a range of future atmospheric concentrations for each emission scenario. For both SRES A1B and the Representative Concentration Pathways (RCPs), the concentration scenario used to drive GCM ensembles lies towards the lower end of our simulated distribution. This design decision (a legacy of previous assessments) is likely to lead concentration-driven experiments to under-sample strong feedback responses in future projections. Our ensemble of emission-driven simulations spans the global temperature response of the CMIP5 emission-driven simulations, except at the low end. Combinations of low climate sensitivity and low carbon cycle feedbacks lead a number of CMIP5 responses to lie below our ensemble range. The ensemble simulates a number of high-end responses which lie above the CMIP5 carbon cycle range. These high-end simulations can be linked to the sampling of stronger carbon cycle feedbacks and of climate sensitivities above 4.5 K. This latter aspect highlights the priority of identifying real-world climate-sensitivity constraints which, if achieved, would reduce the upper bound of projected global mean temperature change. The ensembles of simulations presented here provide a framework to explore relationships between present-day observables and future changes, while the large spread of projected future changes highlights the ongoing need for such work.
Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann
2012-09-24
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is no single modeling approach that can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
Chen, Chunyi; Yang, Huamin
2017-11-01
The root-mean-square (RMS) bandwidth of temporal light-flux fluctuations is formulated for both plane and spherical waves propagating in the turbulent atmosphere with location-dependent transverse wind. Two path weighting functions characterizing the joint contributions of turbulent eddies and transverse winds at various locations toward the RMS bandwidth are derived. Based on the developed formulations, the roles of variations in both the direction and magnitude of transverse wind velocity with locations over a path on the RMS bandwidth are elucidated. For propagation paths between ground and space, comparisons of the RMS bandwidth computed based on the Bufton wind profile with that calculated by assuming a nominal constant transverse wind velocity are made to exemplify the effect that location dependence of transverse wind velocity has on the RMS bandwidth. Moreover, an expression for the weighted RMS transverse wind velocity has been derived, which can be used as a nominal constant transverse wind velocity over a path for accurately determining the RMS bandwidth.
Warburton, William K.; Momayezi, Michael
2006-06-20
A method and apparatus for processing step-like output signals (primary signals) generated by non-ideal, for example, nominally single-pole ("N-1P") devices. An exemplary method includes creating a set of secondary signals by directing the primary signal along a plurality of signal paths to a signal summation point, summing the secondary signals reaching the signal summation point after propagating along the signal paths to provide a summed signal, performing a filtering or delaying operation in at least one of said signal paths so that the secondary signals reaching said summing point have a defined time correlation with respect to one another, applying a set of weighting coefficients to the secondary signals propagating along said signal paths, and performing a capturing operation after any filtering or delaying operations so as to provide a weighted signal sum value as a measure of the integrated area QgT of the input signal.
NASA Astrophysics Data System (ADS)
Pan, Yujie; Xue, Ming; Zhu, Kefeng; Wang, Mingjun
2018-05-01
A dual-resolution (DR) version of a regional ensemble Kalman filter (EnKF)-3D ensemble variational (3DEnVar) coupled hybrid data assimilation system is implemented as a prototype for the operational Rapid Refresh forecasting system. The DR 3DEnVar system combines a high-resolution (HR) deterministic background forecast with lower-resolution (LR) EnKF ensemble perturbations, used for flow-dependent background error covariance, to produce a HR analysis. The computational cost is substantially reduced by running the ensemble forecasts and EnKF analyses at LR. The DR 3DEnVar system is tested with 3-h cycles over a 9-day period using a 40/~13 km grid spacing combination. The HR forecasts from the DR hybrid analyses are compared with forecasts launched from HR Gridpoint Statistical Interpolation (GSI) 3D variational (3DVar) analyses, and from single LR hybrid analyses interpolated to the HR grid. With the DR 3DEnVar system, a 90% weight for the ensemble covariance yields the lowest forecast errors, and the DR hybrid system clearly outperforms the HR GSI 3DVar. Humidity and wind forecasts are also better than those launched from interpolated LR hybrid analyses, but the temperature forecasts are slightly worse. The humidity forecasts are improved the most. For precipitation forecasts, the DR 3DEnVar always outperforms the HR GSI 3DVar. It also outperforms the LR 3DEnVar, except for the initial forecast period and at lower thresholds.
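In model space, the hybrid background-error covariance underlying 3DEnVar can be written as a weighted blend B = (1 − β)·B_static + β·B_ens; a minimal sketch of that blend (operational systems realize it implicitly through extended control variables rather than forming B explicitly):

```python
import numpy as np

def hybrid_covariance(B_static, X_ens, beta=0.9):
    """Blend static and ensemble background-error covariances.

    beta is the weight on the ensemble covariance; 0.9 mirrors the
    best-performing 90% ensemble weight reported above.
    """
    n_ens = X_ens.shape[1]
    Xp = X_ens - X_ens.mean(axis=1, keepdims=True)   # ensemble perturbations
    B_ens = Xp @ Xp.T / (n_ens - 1)                  # flow-dependent covariance
    return (1.0 - beta) * B_static + beta * B_ens
```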
Oscillatory ductile compaction dynamics in a cylinder
NASA Astrophysics Data System (ADS)
Uri, Lina; Dysthe, Dag Kristian; Feder, Jens
2006-09-01
Ductile compaction is common in many natural systems, but the temporal evolution of such systems is rarely studied. We observe surprising oscillations in the weight measured at the bottom of a self-compacting ensemble of ductile grains. The oscillations develop during the first ten hours of the experiment, and usually persist through the length of an experiment (one week). The weight oscillations are connected to the grain-wall contacts, and are directly correlated with the observed strain evolution and the dynamics of grain-wall contacts during the compaction. Here, we present the experimental results and characteristic time constants of the system, and discuss possible reasons for the measured weight oscillations.
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but a limited number of samples, and various projective clustering techniques and ensemble techniques have therefore been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
Walsh, Tom P; Butterworth, Paul A; Urquhart, Donna M; Cicuttini, Flavia M; Landorf, Karl B; Wluka, Anita E; Michael Shanahan, E; Menz, Hylton B
2017-01-01
There is a well-recognised relationship between body weight, plantar pressures and foot pain, but the temporal association between these factors is unknown. The aim of this study was to investigate the relationships between increasing weight, plantar pressures and foot pain over a two-year period. Fifty-one participants (33 women and 18 men) completed the two-year longitudinal cohort study. The sample had a mean (standard deviation (SD)) age of 52.6 (8.5) years. At baseline and follow-up, participants completed the Manchester Foot Pain and Disability Index questionnaire, and underwent anthropometric measures, including body weight, body mass index, and dynamic plantar pressures. Within-group analyses examined differences in body weight, foot pain and plantar pressures between baseline and follow-up, and multivariate regression analysis examined associations between change in body weight, foot pain and plantar pressure. Path analysis assessed the total impact of both the direct and indirect effects of change in body weight on plantar pressure and pain variables. Mean (SD) body weight increased from 80.3 (19.3) to 82.3 (20.6) kg, p = 0.016, from baseline to follow-up. The change in body weight ranged from -16.1 to 12.7 kg. The heel was the only site to exhibit increased peak plantar pressures between baseline and follow-up. After adjustment for age, gender and change in contact time (where appropriate), there were significant associations between: (i) change in body weight and changes in midfoot plantar pressure (B = 4.648, p = 0.038) and functional limitation (B = 0.409, p = 0.010); (ii) plantar pressure change in the heel and both functional limitation (B = 4.054, p = 0.013) and pain intensity (B = 1.831, p = 0.006); (iii) plantar pressure change in the midfoot and both functional limitation (B = 4.505, p = 0.018) and pain intensity (B = 1.913, p = 0.015). Path analysis indicated that the effect of increasing body weight on foot-related functional limitation and foot pain intensity may be mediated by increased plantar pressure in the midfoot. These findings suggest that as body weight and plantar pressure increase, foot pain increases, and that the midfoot may be the most vulnerable site for pressure-related pain.
Evaluation of annual, global seismicity forecasts, including ensemble models
NASA Astrophysics Data System (ADS)
Taroni, Matteo; Zechar, Jeremy; Marzocchi, Warner
2013-04-01
In 2009, the Collaboratory for the Study of Earthquake Predictability (CSEP) initiated a prototype global earthquake forecast experiment. Three models participated in this experiment for 2009, 2010 and 2011; each model forecast the number of earthquakes above magnitude 6 in 1×1 degree cells that span the globe. Here we use likelihood-based metrics to evaluate the consistency of the forecasts with the observed seismicity. We compare model performance with statistical tests and a new method based on the peer-to-peer gambling score. The results of the comparisons are used to build ensemble models that are a weighted combination of the individual models. Notably, in these experiments the ensemble model always performs significantly better than the single best-performing model. Our results indicate the following: i) time-varying forecasts, if not updated after each major shock, may not provide significant advantages with respect to time-invariant models in 1-year forecast experiments; ii) the spatial distribution seems to be the most important feature distinguishing the forecasting performances of the models; iii) the interpretation of consistency tests may be misleading, because some good models may be rejected while trivial models pass; iv) proper ensemble modeling seems to be a valuable procedure for obtaining the best-performing model for practical purposes.
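One generic way to build such a weighted combination is to weight each model's gridded rate forecast by its relative likelihood on past observations; a hedged sketch (a log-likelihood softmax over Poisson cell forecasts, not necessarily the authors' exact weighting scheme):

```python
import numpy as np
from scipy.stats import poisson

def ensemble_forecast(rate_maps, past_counts):
    """Combine gridded Poisson rate forecasts with likelihood-based weights.

    rate_maps   : (n_models, n_cells) expected event counts per cell
    past_counts : (n_cells,) observed counts in a previous evaluation period
    """
    loglik = np.array([poisson.logpmf(past_counts, r).sum() for r in rate_maps])
    w = np.exp(loglik - loglik.max())     # softmax over model log-likelihoods
    w /= w.sum()
    return w @ rate_maps                  # weighted-average rate forecast
```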
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.
Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott
2009-03-01
Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, the paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, an appropriate "feature vector" is first formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repeating this process yields water quality ensembles, and consequently the distribution and quantification of the variability. The main strengths of the approach are its flexibility, its simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds, and of alkalinity and bromide at two other U.S. utilities.
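A minimal sketch of the resampling step described above: locate the K records nearest the current feature vector and bootstrap them with a decreasing weight kernel (the common w(j) ∝ 1/j choice over ranked neighbors); K, the distance metric, and all data are illustrative assumptions:

```python
import numpy as np

def knn_resample(feature, library_X, library_y, K=10, n_draws=1000, rng=None):
    """K-nearest-neighbor bootstrap: resample historical values y whose
    feature vectors are closest to the current feature vector."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(library_X - feature, axis=1)  # distances to feature vector
    nn = np.argsort(d)[:K]                           # indices of K nearest records
    w = 1.0 / np.arange(1, K + 1)                    # decreasing weight kernel
    w /= w.sum()
    picks = rng.choice(nn, size=n_draws, p=w)
    return library_y[picks]                          # ensemble of simulated values
```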
Role of Aquaporins in a Composite Model of Water Transport in the Leaf.
Yaaran, Adi; Moshelion, Menachem
2016-06-30
Water-transport pathways through the leaf are complex and include several checkpoints. Some of these checkpoints exhibit dynamic behavior that may be regulated by aquaporins (AQPs). To date, neither the relative weight of the different water pathways nor their molecular mechanisms are well understood. Here, we have collected evidence to support a putative composite model of water pathways in the leaf and the distribution of water across those pathways. We describe how water moves along a single transcellular path through the parenchyma and continues toward the mesophyll and stomata along transcellular, symplastic and apoplastic paths. We present evidence that points to a role for AQPs in regulating the relative weight of each path in the overall leaf water-transport system, and the movement of water between these paths, as a result of the integration of multiple signals, including transpiration demand, water potential and turgor. We also present a new theory, the hydraulic fuse theory, to explain the effects of the leaf turgor-loss point on the alternation of water paths and the subsequent reduction in leaf hydraulic conductivity. An improved understanding of leaf water-balance management may lead to the development of crops that use water more efficiently and respond better to environmental changes.
NASA Astrophysics Data System (ADS)
Krishnamoorthy, C.; Balaji, C.
2016-05-01
In the present study, the effect of horizontal and vertical localization scales on the assimilation of direct SAPHIR radiances is studied. An Artificial Neural Network (ANN) has been used as a surrogate for the forward radiative calculations. The training input dataset for the ANN consists of vertical layers of atmospheric pressure, temperature, relative humidity and hydrometeor profiles, with 6-channel Brightness Temperatures (BTs) as output. The best neural network architecture was arrived at by a neuron independence study. Since vertical localization of radiance data requires weighting functions, an ANN has also been trained for this purpose. The radiances were ingested into the NWP model using the Ensemble Kalman Filter (EnKF) technique. Horizontal localization is taken care of by using a Gaussian localization function centered on the observed coordinates. Similarly, vertical localization is accomplished by assuming a function that depends on the weighting function of the channel to be assimilated. The effect of both horizontal and vertical localization has been studied in terms of the ensemble spread in precipitation. Additionally, improvements in the 24 hr forecast from assimilation are also reported.
Adaptive sampling strategies with high-throughput molecular dynamics
NASA Astrophysics Data System (ADS)
Clementi, Cecilia
Despite recent significant hardware and software developments, the complete thermodynamic and kinetic characterization of large macromolecular complexes by molecular simulations still presents significant challenges. The high dimensionality of these systems and the complexity of the associated potential energy surfaces (creating multiple metastable regions connected by high free energy barriers) do not usually allow adequate sampling of the relevant regions of configurational space by means of a single, long Molecular Dynamics (MD) trajectory. Several different approaches have been proposed to tackle this sampling problem. We focus on the development of ensemble simulation strategies, where data from a large number of weakly coupled simulations are integrated to explore the configurational landscape of a complex system more efficiently. Ensemble methods are of increasing interest as the hardware roadmap is now mostly based on increasing core counts rather than clock speeds. The main challenge in the development of an ensemble approach for efficient sampling is in the design of strategies to adaptively distribute the trajectories over the relevant regions of the system's configurational space, without using any a priori information on the system's global properties. We will discuss the definition of smart adaptive sampling approaches that can redirect computational resources towards unexplored yet relevant regions. Our approaches are based on new developments in dimensionality reduction for high-dimensional dynamical systems, and on optimal redistribution of resources. NSF CHE-1152344, NSF CHE-1265929, Welch Foundation C-1570.
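One widely used concrete form of such adaptive redistribution is count-based seeding: discretize what has been sampled so far and preferentially restart short trajectories from under-visited states. A minimal sketch under that assumption (a generic scheme, not this group's specific algorithm):

```python
import numpy as np

def pick_restart_states(dtrajs, n_states, n_new, rng=None):
    """Choose seed states for the next round of short simulations,
    favoring states visited least so far (inverse-count weighting)."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(np.concatenate(dtrajs), minlength=n_states)
    w = 1.0 / (counts + 1.0)      # +1 keeps unvisited states finite and favored
    w /= w.sum()
    return rng.choice(n_states, size=n_new, p=w)

# Example: two short discretized trajectories over 5 states
seeds = pick_restart_states([np.array([0, 0, 1]), np.array([1, 2])], 5, n_new=4)
```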
Nullspace Sampling with Holonomic Constraints Reveals Molecular Mechanisms of Protein Gαs
Pachov, Dimitar V.; van den Bedem, Henry
2015-01-01
Proteins perform their function or interact with partners by exchanging between conformational substates on a wide range of spatiotemporal scales. Structurally characterizing these exchanges is challenging, both experimentally and computationally. Large, diffusional motions are often on timescales that are difficult to access with molecular dynamics simulations, especially for large proteins and their complexes. The low frequency modes of normal mode analysis (NMA) report on molecular fluctuations associated with biological activity. However, NMA is limited to a second order expansion about a minimum of the potential energy function, which limits opportunities to observe diffusional motions. By contrast, kino-geometric conformational sampling (KGS) permits large perturbations while maintaining the exact geometry of explicit conformational constraints, such as hydrogen bonds. Here, we extend KGS and show that a conformational ensemble of the α subunit Gαs of heterotrimeric stimulatory protein Gs exhibits structural features implicated in its activation pathway. Activation of protein Gs by G protein-coupled receptors (GPCRs) is associated with GDP release and large conformational changes of its α-helical domain. Our method reveals a coupled α-helical domain opening motion while, simultaneously, Gαs helix α5 samples an activated conformation. These motions are moderated in the activated state. The motion centers on a dynamic hub near the nucleotide-binding site of Gαs, and radiates to helix α4. We find that comparative NMA-based ensembles underestimate the amplitudes of the motion. Additionally, the ensembles fall short in predicting the accepted direction of the full activation pathway. Taken together, our findings suggest that nullspace sampling with explicit, holonomic constraints yields ensembles that illuminate molecular mechanisms involved in GDP release and protein Gs activation, and further establish conformational coupling between key structural elements of Gαs. PMID:26218073
Diagnostic and Remedial Learning Strategy Based on Conceptual Graphs
ERIC Educational Resources Information Center
Jong, BinShyan; Lin, TsongWuu; Wu, YuLung; Chan, Teyi
2004-01-01
Numerous scholars have applied conceptual graphs for explanatory purposes. This study, which focuses on conceptual graphs, devised the Remedial-Instruction Decisive path (RID path) algorithm for diagnosing individual students' learning situations. According to the concepts learned by students and the weight values of relations among these…
Ensemble-Biased Metadynamics: A Molecular Simulation Method to Sample Experimental Distributions
Marinelli, Fabrizio; Faraldo-Gómez, José D.
2015-01-01
We introduce an enhanced-sampling method for molecular dynamics (MD) simulations referred to as ensemble-biased metadynamics (EBMetaD). The method biases a conventional MD simulation to sample a molecular ensemble that is consistent with one or more probability distributions known a priori, e.g., experimental intramolecular distance distributions obtained by double electron-electron resonance or other spectroscopic techniques. To this end, EBMetaD adds an adaptive biasing potential throughout the simulation that discourages sampling of configurations inconsistent with the target probability distributions. The bias introduced is the minimum necessary to fulfill the target distributions, i.e., EBMetaD satisfies the maximum-entropy principle. Unlike other methods, EBMetaD does not require multiple simulation replicas or the introduction of Lagrange multipliers, and is therefore computationally efficient and straightforward in practice. We demonstrate the performance and accuracy of the method for a model system as well as for spin-labeled T4 lysozyme in explicit water, and show how EBMetaD reproduces three double electron-electron resonance distance distributions concurrently within a few tens of nanoseconds of simulation time. EBMetaD is integrated in the open-source PLUMED plug-in (www.plumed-code.org), and can therefore be readily used with multiple MD engines. PMID:26083917
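The central idea of EBMetaD lends itself to a compact illustration: history-dependent Gaussian hills are deposited along a collective variable, with heights scaled inversely by the target density, so that long-time sampling is steered toward a distribution known a priori. The one-dimensional Monte Carlo sketch below is schematic; the toy double-well potential, hill schedule, and height scaling are assumptions for illustration and do not reproduce the published algorithm or its PLUMED implementation.

```python
# Schematic ensemble-biased metadynamics on a 1D toy system: hills are
# larger where the target density rho(s) is small, pushing the sampled
# histogram toward rho(s) rather than the Boltzmann distribution.
import numpy as np

rng = np.random.default_rng(1)

def rho_target(s):  # target distribution (e.g. a DEER-derived density)
    return np.exp(-0.5 * ((s - 1.0) / 0.4) ** 2) / (0.4 * np.sqrt(2 * np.pi))

def potential(s):   # unbiased toy double-well potential
    return (s ** 2 - 1.0) ** 2

centers, heights = [], []
w0, sigma, beta = 0.05, 0.1, 1.0

def bias(s):
    """Sum of all Gaussian hills deposited so far."""
    if not centers:
        return 0.0
    c, h = np.asarray(centers), np.asarray(heights)
    return float(np.sum(h * np.exp(-0.5 * ((s - c) / sigma) ** 2)))

s, traj = 0.0, []
for step in range(50_000):
    s_new = s + rng.normal(scale=0.1)  # Metropolis Monte Carlo move
    dU = (potential(s_new) + bias(s_new)) - (potential(s) + bias(s))
    if dU <= 0 or rng.random() < np.exp(-beta * dU):
        s = s_new
    if step % 100 == 0:                # deposit a hill, scaled by 1/rho
        centers.append(s)
        heights.append(w0 / max(rho_target(s), 1e-6))
    traj.append(s)

# After the bias equilibrates, the histogram of `traj` should approach
# rho_target (in the spirit of EBMetaD), not the Boltzmann distribution
# of the bare potential.
```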
Salmon, Loïc; Giambaşu, George M; Nikolova, Evgenia N; Petzold, Katja; Bhattacharya, Akash; Case, David A; Al-Hashimi, Hashim M
2015-10-14
Approaches that combine experimental data and computational molecular dynamics (MD) to determine atomic-resolution ensembles of biomolecules require the measurement of abundant experimental data. NMR residual dipolar couplings (RDCs) carry rich dynamics information; however, difficulties in modulating the overall alignment of nucleic acids have limited the ability to fully extract this information. We present a strategy for modulating RNA alignment that is based on introducing variable dynamic kinks in terminal helices. With this strategy, we measured seven sets of RDCs in a cUUCGg apical loop and used this rich data set to test the accuracy of a 0.8 μs MD simulation computed using the Amber ff10 force field as well as to determine an atomic-resolution ensemble. The MD-generated ensemble quantitatively reproduces the measured RDCs, but selection of a sub-ensemble was required to satisfy the RDCs within error. The largest discrepancies between the RDC-selected and MD-generated ensembles are observed for the most flexible loop residues and backbone angles connecting the loop to the helix, with the RDC-selected ensemble resulting in more uniform dynamics. Comparison of the RDC-selected ensemble with NMR spin relaxation data suggests that the dynamics occurs on the ps–ns time scales, as verified by measurements of R(1ρ) relaxation-dispersion data. The RDC-satisfying ensemble samples many conformations adopted by the hairpin in crystal structures, indicating that intrinsic plasticity may play important roles in conformational adaptation. The approach presented here can be applied to test nucleic acid force fields and to characterize dynamics in diverse RNA motifs at atomic resolution.
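The sub-ensemble selection step described above, choosing a small set of MD conformers whose averaged back-calculated RDCs satisfy the measurements within error, can be sketched as a simple "sample and select" search. All quantities below (the per-conformer RDC prediction matrix D, the measured couplings, their uncertainties, and the pool and ensemble sizes) are synthetic placeholders; a real workflow would back-calculate RDCs from alignment tensors fitted to the measured data sets.

```python
# Minimal "sample and select" sketch: pick m conformers from an MD pool
# so that their ensemble-averaged predicted RDCs match measured values
# within error, scored by a reduced chi-square.
import numpy as np

rng = np.random.default_rng(0)
n_conf, n_rdc, m = 500, 35, 20

D = rng.normal(size=(n_conf, n_rdc))   # synthetic predicted RDCs per conformer
truth = D[rng.choice(n_conf, m, replace=False)].mean(axis=0)
rdc_exp = truth + rng.normal(scale=0.1, size=n_rdc)  # "measured" couplings
sigma = np.full(n_rdc, 0.1)                          # measurement uncertainties

def chi2(idx):
    """Reduced chi-square of the ensemble-averaged RDCs for members idx."""
    resid = (D[idx].mean(axis=0) - rdc_exp) / sigma
    return float(resid @ resid) / n_rdc

idx = rng.choice(n_conf, m, replace=False)
best = chi2(idx)
for _ in range(20_000):                # greedy swap search over the pool
    trial = idx.copy()
    trial[rng.integers(m)] = rng.integers(n_conf)  # duplicates allowed
    c = chi2(trial)                                # for simplicity
    if c < best:
        idx, best = trial, c

print(f"reduced chi^2 of selected sub-ensemble: {best:.2f}")
```

A reduced chi-square near or below 1 indicates the sub-ensemble reproduces the measurements within their stated uncertainties, which is the selection criterion this sketch is meant to illustrate.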