Generalized Ensemble Sampling of Enzyme Reaction Free Energy Pathways
Wu, Dongsheng; Fajer, Mikolai I.; Cao, Liaoran; Cheng, Xiaolin; Yang, Wei
2016-01-01
Free energy path sampling plays an essential role in the computational understanding of chemical reactions, particularly those occurring in enzymatic environments. Among the variety of molecular dynamics simulation approaches, the generalized ensemble sampling strategy is uniquely attractive because it not only enhances the sampling of rare chemical events but also naturally ensures consistent exploration of environmental degrees of freedom. In this review, we provide a tutorial-like tour of an emerging topic: generalized ensemble sampling of enzyme reaction free energy paths. The discussion is largely focused on our own studies, particularly those based on the metadynamics free energy sampling method and the on-the-path random walk path sampling method. We hope that this mini presentation will provide interested practitioners with meaningful guidance for future algorithm formulation and application studies. PMID:27498634
Bhatt, Divesh; Zuckerman, Daniel M.
2010-01-01
We performed “weighted ensemble” path-sampling simulations of adenylate kinase, using several semi-atomistic protein models. The models have an all-atom backbone with various levels of residue interactions. The primary result is that fully statistically rigorous path sampling required only a few weeks of single-processor computing time with these models, indicating that the addition of further chemical detail should be readily feasible. Our semi-atomistic path ensembles are consistent with previous biophysical findings: the presence of two distinct pathways, identification of intermediates, and symmetry of forward and reverse pathways. PMID:21660120
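The split/merge bookkeeping behind weighted ensemble sampling can be made concrete with a minimal sketch. The bin-level resampler below conserves total probability weight while holding the walker count at a target; the function name, target count, and selection rules are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def resample_bin(weights, target=4, rng=np.random.default_rng(0)):
    """Split/merge walkers in one bin to a target count, conserving weight.

    weights: current walker weights in the bin. Returns indices into the
    original walkers and the new weights (illustrative scheme only).
    """
    idx, w = list(range(len(weights))), list(weights)
    # Split: replicate the heaviest walker until the bin holds `target` walkers.
    while len(w) < target:
        j = int(np.argmax(w))
        idx.append(idx[j])
        w[j] /= 2.0
        w.append(w[j])
    # Merge: combine the two lightest walkers, keeping one with probability
    # proportional to its weight so the ensemble stays statistically unbiased.
    while len(w) > target:
        a, b = np.argsort(w)[:2]
        keep = a if rng.random() < w[a] / (w[a] + w[b]) else b
        w[keep] = w[a] + w[b]
        drop = b if keep == a else a
        idx.pop(drop)
        w.pop(drop)
    return idx, w
```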
Path planning in uncertain flow fields using ensemble method
NASA Astrophysics Data System (ADS)
Wang, Tong; Le Maître, Olivier P.; Hoteit, Ibrahim; Knio, Omar M.
2016-10-01
An ensemble-based approach is developed to conduct optimal path planning in unsteady ocean currents under uncertainty. We focus our attention on two-dimensional steady and unsteady uncertain flows, and adopt a sampling methodology that is well suited to operational forecasts, where an ensemble of deterministic predictions is used to model and quantify uncertainty. In an operational setting, much about dynamics, topography, and forcing of the ocean environment is uncertain. To address this uncertainty, the flow field is parametrized using a finite number of independent canonical random variables with known densities, and the ensemble is generated by sampling these variables. For each of the resulting realizations of the uncertain current field, we predict the path that minimizes the travel time by solving a boundary value problem (BVP), based on the Pontryagin maximum principle. A family of backward-in-time trajectories starting at the end position is used to generate suitable initial values for the BVP solver. This allows us to examine and analyze the performance of the sampling strategy and to develop insight into extensions dealing with general circulation ocean models. In particular, the ensemble method enables us to perform a statistical analysis of travel times and consequently develop a path planning approach that accounts for these statistics. The proposed methodology is tested for a number of scenarios. We first validate our algorithms by reproducing simple canonical solutions, and then demonstrate our approach in more complex flow fields, including idealized, steady and unsteady double-gyre flows.
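The ensemble step described above can be sketched with a toy surrogate in place of the Pontryagin BVP solve: sample the uncertain current strength, compute one travel time per realization, and collect statistics. The dynamics and all parameter values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def travel_time(gyre_strength, dt=0.01, speed=1.0):
    """Toy stand-in for the per-realization BVP solve: time to cross
    x in [0, 1] heading due east while an along-track current of
    uncertain strength helps or hinders the vehicle."""
    x, t = 0.0, 0.0
    while x < 1.0 and t < 50.0:            # 50.0 acts as a "never arrived" cutoff
        u = gyre_strength * np.sin(np.pi * min(x, 1.0))  # current varies along track
        x += (speed + u) * dt
        t += dt
    return t

# Ensemble over the uncertain parameter -> travel-time statistics.
times = np.array([travel_time(g) for g in rng.normal(0.0, 0.3, size=200)])
print(f"mean {times.mean():.3f}  std {times.std():.3f}  "
      f"95% quantile {np.quantile(times, 0.95):.3f}")
```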
Girsanov reweighting for path ensembles and Markov state models
NASA Astrophysics Data System (ADS)
Donati, L.; Hartmann, C.; Keller, B. G.
2017-06-01
The sensitivity of molecular dynamics on changes in the potential energy function plays an important role in understanding the dynamics and function of complex molecules. We present a method to obtain path ensemble averages of a perturbed dynamics from a set of paths generated by a reference dynamics. It is based on the concept of path probability measure and the Girsanov theorem, a result from stochastic analysis to estimate a change of measure of a path ensemble. Since Markov state models (MSMs) of the molecular dynamics can be formulated as a combined phase-space and path ensemble average, the method can be extended to reweight MSMs by combining it with a reweighting of the Boltzmann distribution. We demonstrate how to efficiently implement the Girsanov reweighting in a molecular dynamics simulation program by calculating parts of the reweighting factor "on the fly" during the simulation, and we benchmark the method on test systems ranging from a two-dimensional diffusion process and an artificial many-body system to alanine dipeptide and valine dipeptide in implicit and explicit water. The method can be used to study the sensitivity of molecular dynamics on external perturbations as well as to reweight trajectories generated by enhanced sampling schemes to the original dynamics.
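For a one-dimensional overdamped Langevin process, the Girsanov log-weight can indeed be accumulated "on the fly" from the same noise increments that propagate the reference dynamics. A minimal sketch (the potentials, step size, and single-trajectory weight below are illustrative; in practice one averages the weighted observable over many short paths):

```python
import numpy as np

rng = np.random.default_rng(2)

f0 = lambda x: -4.0 * x * (x**2 - 1.0)    # reference force, V0(x) = (x^2 - 1)^2
df = lambda x: -0.5 * np.ones_like(x)      # extra force from perturbation U(x) = x/2

dt, sigma, n = 1e-3, 1.0, 100_000
x = np.empty(n)
x[0] = -1.0
dW = rng.normal(0.0, np.sqrt(dt), n - 1)
for k in range(n - 1):                     # Euler-Maruyama under the reference V0
    x[k + 1] = x[k] + f0(x[k]) * dt + sigma * dW[k]

# Girsanov log-weight, accumulated from the noise that drove the path:
# log M = sum_k (df/sigma) dW_k - (dt/2) sum_k (df/sigma)^2
g = df(x[:-1]) / sigma
logM = np.sum(g * dW) - 0.5 * dt * np.sum(g**2)
print("log path weight of this trajectory under V0 + U:", logM)
```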
Li, Wenjin
2018-02-28
The transition path ensemble consists of reactive trajectories and possesses all the information necessary for understanding the mechanism and dynamics of important condensed-phase processes. However, a quantitative description of the properties of the transition path ensemble is far from established. Here, with numerical calculations on a model system, the equipartition terms defined in thermal equilibrium were estimated in the transition path ensemble for the first time. It was not surprising to observe that the energy was not equally distributed among all the coordinates. However, the energies distributed on a pair of conjugate coordinates remained equal. Higher energies were observed on several coordinates that are highly coupled to the reaction coordinate, while the rest were almost equally distributed. In addition, the ensemble-averaged energy on each coordinate as a function of time was also quantified. These quantitative analyses of energy distributions provide new insights into the transition path ensemble.
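The quantity being estimated is easy to tabulate once an ensemble of reactive trajectories is in hand: the ensemble-averaged kinetic energy per coordinate. The array layout and fabricated data below are assumptions for illustration.

```python
import numpy as np

def per_coordinate_energy(velocities, masses):
    """Ensemble-averaged kinetic energy on each coordinate over a
    transition path ensemble. velocities: (n_paths, n_frames, n_coords);
    masses: (n_coords,). In equilibrium each entry would equal kT/2;
    deviations flag coordinates coupled to the reaction coordinate."""
    ke = 0.5 * masses * velocities**2      # broadcasts over paths and frames
    return ke.mean(axis=(0, 1))            # average over the whole ensemble

# Toy check with fabricated data: coordinate 0 runs "hot", others thermal (kT=1).
rng = np.random.default_rng(3)
v = rng.normal(size=(500, 100, 4))
v[:, :, 0] *= 1.5                          # extra energy on one coupled mode
print(per_coordinate_energy(v, masses=np.ones(4)))
```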
A benchmark for reaction coordinates in the transition path ensemble
2016-01-01
The molecular mechanism of a reaction is embedded in its transition path ensemble, the complete collection of reactive trajectories. Utilizing the information in the transition path ensemble alone, we developed a novel metric, which we termed the emergent potential energy, for distinguishing reaction coordinates from bath modes. The emergent potential energy can be understood as the average energy cost for making a displacement of a coordinate in the transition path ensemble. Whereas displacing a bath mode incurs essentially no cost, moving the reaction coordinate costs significantly. Based on some general assumptions about the behaviors of reaction and bath coordinates in the transition path ensemble, we proved theoretically with statistical mechanics that the emergent potential energy can serve as a benchmark of reaction coordinates, and we demonstrated its effectiveness by applying it to a prototypical system of biomolecular dynamics. Using the emergent potential energy as guidance, we developed a committor-free and intuition-independent method for identifying reaction coordinates in complex systems. We expect this method to be applicable to a wide range of reaction processes in complex biomolecular systems. PMID:27059559
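A schematic reading of the metric, assuming it can be estimated as the mean potential-energy change for a small displacement of each coordinate over transition-path-ensemble configurations (the paper's actual estimator may differ in detail):

```python
import numpy as np

def emergent_potential_energy(configs, potential, delta=0.05):
    """Average energy cost of displacing each coordinate by delta over
    configurations drawn from the transition path ensemble (schematic)."""
    cost = []
    for i in range(configs.shape[1]):
        shifted = configs.copy()
        shifted[:, i] += delta
        cost.append(np.mean(potential(shifted) - potential(configs)))
    return np.array(cost)

# Toy 2D potential: coordinate 0 is a steep (reaction-like) direction,
# coordinate 1 is a nearly flat bath mode.
V = lambda X: 4.0 * X[:, 0] ** 2 + 0.01 * X[:, 1] ** 2
rng = np.random.default_rng(11)
tpe = rng.normal(size=(1000, 2))       # stand-in for TPE configurations
print(emergent_potential_energy(tpe, V))   # cost on coord 0 >> coord 1
```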
Sampling the kinetic pathways of a micelle fusion and fission transition.
Pool, René; Bolhuis, Peter G
2007-06-28
The mechanism and kinetics of micellar breakup and fusion in a dilute solution of a model surfactant are investigated by path sampling techniques. Analysis of the path ensemble gives insight into the mechanism of the transition. For larger, less stable micelles the fission/fusion occurs via a clear neck formation, while for smaller micelles the mechanism is more direct. In addition, path analysis yields an appropriate order parameter to evaluate the fusion and fission rate constants using stochastic transition interface sampling. For the small, stable micelle (50 surfactants) the computed fission rate constant is a factor of 10 lower than the fusion rate constant. The procedure opens the way for accurate calculation of free energy and kinetics for, e.g., membrane fusion and wormlike micelle endcap formation.
Reactive trajectories of the Ru2+/3+ self-exchange reaction and the connection to Marcus' theory.
Tiwari, Ambuj; Ensing, Bernd
2016-12-22
Outer sphere electron transfer between two ions in aqueous solution is a rare event on the time scale of first principles molecular dynamics simulations. We have used transition path sampling to generate an ensemble of reactive trajectories of the self-exchange reaction between a pair of Ru2+ and Ru3+ ions in water. To distinguish between the reactant and product states, we use as an order parameter the position of the maximally localised Wannier center associated with the transferring electron. This allows us to align the trajectories with respect to the moment of barrier crossing and compute statistical averages over the path ensemble. We compare our order parameter with two typical reaction coordinates used in applications of Marcus theory of electron transfer: the vertical gap energy and the solvent electrostatic potential at the ions.
Fluctuating observation time ensembles in the thermodynamics of trajectories
NASA Astrophysics Data System (ADS)
Budini, Adrián A.; Turner, Robert M.; Garrahan, Juan P.
2014-03-01
The dynamics of stochastic systems, both classical and quantum, can be studied by analysing the statistical properties of dynamical trajectories. The properties of ensembles of such trajectories for long, but fixed, times are described by large-deviation (LD) rate functions. These LD functions play the role of dynamical free energies: they are cumulant generating functions for time-integrated observables, and their analytic structure encodes dynamical phase behaviour. This ‘thermodynamics of trajectories’ approach is to trajectories and dynamics what the equilibrium ensemble method of statistical mechanics is to configurations and statics. Here we show that, just like in the static case, there are a variety of alternative ensembles of trajectories, each defined by their global constraints, with that of trajectories of fixed total time being just one of these. We show how the LD functions that describe an ensemble of trajectories where some time-extensive quantity is constant (and large) but where total observation time fluctuates can be mapped to those of the fixed-time ensemble. We discuss how the correspondence between generalized ensembles can be exploited in path sampling schemes for generating rare dynamical trajectories.
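The "dynamical free energy" in question can be estimated directly for a simple jump process, where the scaled cumulant generating function of the jump count is known exactly. The sketch below also shows why such direct estimates degrade at large conjugate fields; that rare-trajectory regime is where the generalized trajectory ensembles discussed above become useful.

```python
import numpy as np

rng = np.random.default_rng(4)
gamma_rate, t_obs, n_traj = 2.0, 50.0, 100_000

# K = number of events in [0, t_obs] for each trajectory of a Poisson process.
K = rng.poisson(gamma_rate * t_obs, size=n_traj)

# Dynamical free energy (scaled cumulant generating function) at field s;
# exact result for the Poisson process: theta(s) = gamma * (exp(-s) - 1).
for s in (0.1, 0.5, 1.0):
    theta_est = np.log(np.mean(np.exp(-s * K))) / t_obs
    theta_exact = gamma_rate * (np.exp(-s) - 1.0)
    # For large s the average is dominated by rare, atypically quiet
    # trajectories; the naive estimator then degrades badly.
    print(f"s={s}: estimate {theta_est:+.4f}, exact {theta_exact:+.4f}")
```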
Statistical Analysis of the First Passage Path Ensemble of Jump Processes
NASA Astrophysics Data System (ADS)
von Kleist, Max; Schütte, Christof; Zhang, Wei
2018-02-01
The transition mechanism of jump processes between two different subsets in state space reveals important dynamical information of the processes and therefore has attracted considerable attention in the past years. In this paper, we study the first passage path ensemble of both discrete-time and continuous-time jump processes on a finite state space. The main approach is to divide each first passage path into nonreactive and reactive segments and to study them separately. The analysis can be applied to jump processes which are non-ergodic, as well as continuous-time jump processes where the waiting time distributions are non-exponential. In the particular case that the jump processes are both Markovian and ergodic, our analysis elucidates the relations between the study of the first passage paths and the study of the transition paths in transition path theory. We provide algorithms to numerically compute statistics of the first passage path ensemble. The computational complexity of these algorithms scales with the complexity of solving a linear system, for which efficient methods are available. Several examples demonstrate the wide applicability of the derived results across research areas.
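For the Markovian, ergodic case, the reduction to a linear solve can be made concrete: the mean first-passage times to a target set satisfy m_i = 1 + sum_j P_ij m_j outside the target and m_i = 0 on it. A minimal sketch:

```python
import numpy as np

def mean_first_passage_times(P, target):
    """Mean first-passage times to a target set for a discrete-time
    Markov chain with transition matrix P, via a single linear solve."""
    n = P.shape[0]
    rest = [i for i in range(n) if i not in set(target)]
    A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
    m = np.linalg.solve(A, np.ones(len(rest)))
    out = np.zeros(n)
    out[rest] = m
    return out

# Three-state chain; expected hitting times of the absorbing state 2.
P = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])
print(mean_first_passage_times(P, target=[2]))
```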
Free energy landscape from path-sampling: application to the structural transition in LJ38
NASA Astrophysics Data System (ADS)
Adjanor, G.; Athènes, M.; Calvo, F.
2006-09-01
We introduce a path-sampling scheme that allows equilibrium state-ensemble averages to be computed by means of a biased distribution of non-equilibrium paths. This non-equilibrium method is applied to the case of the 38-atom Lennard-Jones atomic cluster, which has a double-funnel energy landscape. We calculate the free energy profile along the Q4 bond orientational order parameter. At high or moderate temperature the results obtained using the non-equilibrium approach are consistent with those obtained using conventional equilibrium methods, including parallel tempering and Wang-Landau Monte Carlo simulations. At lower temperatures, the non-equilibrium approach becomes more efficient in exploring the relevant inherent structures. In particular, the free energy agrees with the predictions of the harmonic superposition approximation.
Graph transformation method for calculating waiting times in Markov chains.
Trygubenko, Semen A; Wales, David J
2006-06-21
We describe an exact approach for calculating transition probabilities and waiting times in finite-state discrete-time Markov processes. All the states and the rules for transitions between them must be known in advance. We can then calculate averages over a given ensemble of paths for both additive and multiplicative properties in a nonstochastic and noniterative fashion. In particular, we can calculate the mean first-passage time between arbitrary groups of stationary points for discrete path sampling databases, and hence extract phenomenological rate constants. We present a number of examples to demonstrate the efficiency and robustness of this approach.
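The elimination step at the heart of graph transformation is compact: removing a state x renormalizes the branching probabilities and waiting times of the survivors as P'_ij = P_ij + P_ix P_xj / (1 - P_xx) and tau'_i = tau_i + P_ix tau_x / (1 - P_xx). A minimal sketch on a three-state chain:

```python
import numpy as np

def eliminate_state(P, tau, x):
    """One graph-transformation step: remove state x, renormalizing the
    branching probabilities P and mean waiting times tau of the surviving
    states so that first-passage statistics are preserved exactly."""
    keep = [i for i in range(P.shape[0]) if i != x]
    denom = 1.0 - P[x, x]
    P_new = P[np.ix_(keep, keep)] + np.outer(P[keep, x], P[x, keep]) / denom
    tau_new = tau[keep] + P[keep, x] * tau[x] / denom
    return P_new, tau_new

# Toy chain 0 <-> 1 -> 2: eliminate intermediate state 1 and read off the
# exact mean waiting times without any stochastic simulation.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])
tau = np.array([1.0, 2.0, 0.0])
P2, tau2 = eliminate_state(P, tau, 1)
print(P2)    # renormalized branching matrix over states {0, 2}
print(tau2)  # updated waiting times; MFPT 0 -> 2 is tau2[0] / P2[0, 1] = 6
```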
Cendagorta, Joseph R; Bačić, Zlatko; Tuckerman, Mark E
2018-03-14
We introduce a scheme for approximating quantum time correlation functions numerically within the Feynman path integral formulation. Starting with the symmetrized version of the correlation function expressed as a discretized path integral, we introduce a change of integration variables often used in the derivation of trajectory-based semiclassical methods. In particular, we transform to sum and difference variables between forward and backward complex-time propagation paths. Once the transformation is performed, the potential energy is expanded in powers of the difference variables, which allows us to perform the integrals over these variables analytically. The manner in which this procedure is carried out results in an open-chain path integral (in the remaining sum variables) with a modified potential that is evaluated using imaginary-time path-integral sampling rather than requiring the generation of a large ensemble of trajectories. Consequently, any number of path integral sampling schemes can be employed to compute the remaining path integral, including Monte Carlo, path-integral molecular dynamics, or enhanced path-integral molecular dynamics. We believe that this approach constitutes a different perspective in semiclassical-type approximations to quantum time correlation functions. Importantly, we argue that our approximation can be systematically improved within a cumulant expansion formalism. We test this approximation on a set of one-dimensional problems that are commonly used to benchmark approximate quantum dynamical schemes. We show that the method is at least as accurate as the popular ring-polymer molecular dynamics technique and linearized semiclassical initial value representation for correlation functions of linear operators in most of these examples and improves the accuracy of correlation functions of nonlinear operators.
NASA Astrophysics Data System (ADS)
Orellana, Laura; Yoluk, Ozge; Carrillo, Oliver; Orozco, Modesto; Lindahl, Erik
2016-08-01
Protein conformational changes are at the heart of cell functions, from signalling to ion transport. However, the transient nature of the intermediates along transition pathways hampers their experimental detection, making the underlying mechanisms elusive. Here we retrieve dynamic information on the actual transition routes from principal component analysis (PCA) of structurally-rich ensembles and, in combination with coarse-grained simulations, explore the conformational landscapes of five well-studied proteins. Modelling them as elastic networks in a hybrid elastic-network Brownian dynamics simulation (eBDIMS), we generate trajectories connecting stable end-states that spontaneously sample the crystallographic motions, predicting the structures of known intermediates along the paths. We also show that the explored non-linear routes can delimit the lowest energy passages between end-states sampled by atomistic molecular dynamics. The integrative methodology presented here provides a powerful framework to extract and expand dynamic pathway information from the Protein Data Bank, as well as to validate sampling methods in general.
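The first step of this pipeline, PCA of a structurally rich ensemble, reduces to an eigendecomposition of the coordinate covariance matrix. A minimal sketch with fabricated data (not the eBDIMS code):

```python
import numpy as np

def pca_modes(ensemble, n_modes=2):
    """PCA of a structural ensemble: rows are structures (already
    superposed on a reference), columns are Cartesian coordinates.
    Returns the leading eigenvalues/eigenvectors of the covariance,
    i.e. the collective directions a simulation could steer along."""
    X = ensemble - ensemble.mean(axis=0)
    C = X.T @ X / (len(X) - 1)
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1][:n_modes]
    return vals[order], vecs[:, order]

# Fabricated ensemble: 50 "structures", 30 coordinates, one dominant motion.
rng = np.random.default_rng(5)
mode = rng.normal(size=30)
mode /= np.linalg.norm(mode)
X = rng.normal(scale=0.1, size=(50, 30)) + rng.normal(size=(50, 1)) * mode
vals, vecs = pca_modes(X)
print(vals)                       # first eigenvalue dominates
print(abs(vecs[:, 0] @ mode))     # ~1: the planted mode is recovered
```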
NASA Technical Reports Server (NTRS)
Mittman, David S.
2011-01-01
Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.
Edwards, James P; Gerber, Urs; Schubert, Christian; Trejo, Maria Anabel; Weber, Axel
2018-04-01
We introduce two integral transforms of the quantum mechanical transition kernel that represent physical information about the path integral. These transforms can be interpreted as probability distributions on particle trajectories measuring respectively the relative contribution to the path integral from paths crossing a given spatial point (the hit function) and the likelihood of values of the line integral of the potential along a path in the ensemble (the path-averaged potential).
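The hit function can be estimated for the simplest case, an ensemble of free-particle paths with pinned endpoints, by counting which spatial bins each discretized path visits. The Brownian-bridge ensemble below is a stand-in chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(12)

def brownian_bridge(n_steps, T=1.0):
    """Discretized free-particle worldline from x=0 back to x=0, a toy
    stand-in for the path ensemble behind the path integral."""
    dW = rng.normal(0.0, np.sqrt(T / n_steps), n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    t = np.linspace(0.0, T, n_steps + 1)
    return W - t / T * W[-1]               # pin the endpoint at 0

# Hit-function estimate: fraction of paths that cross each spatial bin.
bins = np.linspace(-2.0, 2.0, 41)
hits = np.zeros(len(bins) - 1)
n_paths = 5000
for _ in range(n_paths):
    x = brownian_bridge(200)
    visited = np.unique(np.digitize(x, bins))
    for b in visited[(visited > 0) & (visited < len(bins))]:
        hits[b - 1] += 1
# Equals 1 near x = 0, where every bridge starts and ends, and decays outward.
print((hits / n_paths).round(2))
```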
Molloy, Kevin; Shehu, Amarda
2013-01-01
Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories at great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have recently been proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate to balance coverage of conformational space against progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis of the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown to be effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.
Improved transition path sampling methods for simulation of rare events
NASA Astrophysics Data System (ADS)
Chopra, Manan; Malshe, Rohit; Reddy, Allam S.; de Pablo, J. J.
2008-04-01
The free energy surfaces of a wide variety of systems encountered in physics, chemistry, and biology are characterized by the existence of deep minima separated by numerous barriers. One of the central aims of recent research in computational chemistry and physics has been to determine how transitions occur between deep local minima on rugged free energy landscapes, and transition path sampling (TPS) Monte Carlo methods have emerged as an effective means for numerical investigation of such transitions. Many of the shortcomings of TPS-like approaches generally stem from their high computational demands. Two new algorithms are presented in this work that improve the efficiency of TPS simulations. The first algorithm uses biased shooting moves to render the sampling of reactive trajectories more efficient. The second algorithm is shown to substantially improve the accuracy of the transition state ensemble by introducing a subset of local transition path simulations in the transition state. The system considered in this work consists of a two-dimensional rough energy surface that is representative of numerous systems encountered in applications. When taken together, these algorithms provide gains in efficiency of over two orders of magnitude when compared to traditional TPS simulations.
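The baseline move that both improvements build on is the TPS shooting move. A minimal sketch for a one-dimensional double well with stochastic dynamics (the paper's biased variant would pick the shooting index non-uniformly near the barrier; the simple accept-if-reactive rule below relies on the stochastic noise making every regrown segment a valid realization):

```python
import numpy as np

rng = np.random.default_rng(10)
force = lambda x: -4.0 * x * (x**2 - 1.0)   # double well with minima at +/-1
in_A = lambda x: x < -0.8                   # reactant basin
in_B = lambda x: x > 0.8                    # product basin
dt, noise = 2e-3, 1.0

def propagate(x0, n):
    """Overdamped Langevin segment of n steps started from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + force(xs[-1]) * dt
                  + noise * np.sqrt(dt) * rng.normal())
    return xs

def shooting_move(path):
    """One unbiased shooting move: perturb a random frame, regrow both
    path segments, and accept only if the trial still connects A to B."""
    i = int(rng.integers(1, len(path) - 1))
    x_shoot = path[i] + rng.normal(0.0, 0.05)
    trial = (propagate(x_shoot, i)[::-1]
             + propagate(x_shoot, len(path) - 1 - i)[1:])
    if in_A(trial[0]) and in_B(trial[-1]):
        return trial, True
    return path, False

path = list(np.linspace(-1.0, 1.0, 400))    # handcrafted initial reactive path
n_acc = 0
for _ in range(100):
    path, ok = shooting_move(path)
    n_acc += ok
print(f"accepted {n_acc}/100 shooting moves")
```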
Rate Constant and Reaction Coordinate of Trp-Cage Folding in Explicit Water
Juraszek, Jarek; Bolhuis, Peter G.
2008-01-01
We report rate constant calculations and a reaction coordinate analysis of the rate-limiting folding and unfolding process of the Trp-cage mini-protein in explicit solvent using transition interface sampling. Previous transition path sampling simulations revealed that in this (un)folding process the protein maintains its compact configuration, while an increase (decrease) of secondary structure is observed. The calculated folding rate agrees reasonably with experiment, while the unfolding rate is 10 times higher. We discuss possible origins for this mismatch. We recomputed the rates with the forward flux sampling method, and found a discrepancy of four orders of magnitude, probably caused by the method's higher sensitivity to the choice of order parameter with respect to transition interface sampling. Finally, we used the previously computed transition path sampling ensemble to screen combinations of many order parameters for the best model of the reaction coordinate by employing likelihood maximization. We found that a combination of the root mean-square deviation of the helix and of the entire protein was, of the set of tried order parameters, the one that best describes the reaction coordinate. PMID:18676648
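The likelihood-maximization screening used in the last step can be sketched as fitting a logistic committor model to shooting-point outcomes and ranking candidate order parameters by the maximized log-likelihood. The data and the model form below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def log_likelihood(params, q, hit_B):
    """Log-likelihood of shooting outcomes under p_B(q) = 1/(1+e^-(a+bq))."""
    a, b = params
    p = 1.0 / (1.0 + np.exp(-(a + b * q)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(np.where(hit_B, np.log(p), np.log(1 - p)))

def screen(candidates, hit_B):
    """Rank candidate order parameters by maximized log-likelihood."""
    scores = {}
    for name, q in candidates.items():
        res = minimize(lambda th: -log_likelihood(th, q, hit_B), x0=[0.0, 1.0])
        scores[name] = -res.fun
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Fabricated shooting data: the outcome is actually controlled by q1.
rng = np.random.default_rng(6)
q1 = rng.normal(size=400)
q2 = rng.normal(size=400)                   # irrelevant bath mode
hit = rng.random(400) < 1.0 / (1.0 + np.exp(-3.0 * q1))
print(screen({"q1": q1, "q2": q2}, hit))    # q1 should score highest
```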
Zwier, Matthew C.; Adelman, Joshua L.; Kaus, Joseph W.; Pratt, Adam J.; Wong, Kim F.; Rego, Nicholas B.; Suárez, Ernesto; Lettieri, Steven; Wang, David W.; Grabe, Michael; Zuckerman, Daniel M.; Chong, Lillian T.
2015-01-01
The weighted ensemble (WE) path sampling approach orchestrates an ensemble of parallel calculations with intermittent communication to enhance the sampling of rare events, such as molecular associations or conformational changes in proteins or peptides. Trajectories are replicated and pruned in a way that focuses computational effort on under-explored regions of configuration space while maintaining rigorous kinetics. To enable the simulation of rare events at any scale (e.g. atomistic, cellular), we have developed an open-source, interoperable, and highly scalable software package for the execution and analysis of WE simulations: WESTPA (The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis). WESTPA scales to thousands of CPU cores and includes a suite of analysis tools that have been implemented in a massively parallel fashion. The software has been designed to interface conveniently with any dynamics engine and has already been used with a variety of molecular dynamics (e.g. GROMACS, NAMD, OpenMM, AMBER) and cell-modeling packages (e.g. BioNetGen, MCell). WESTPA has been in production use for over a year, and its utility has been demonstrated for a broad set of problems, ranging from atomically detailed host-guest associations to non-spatial chemical kinetics of cellular signaling networks. The following describes the design and features of WESTPA, including the facilities it provides for running WE simulations, storing and analyzing WE simulation data, as well as examples of input and output. PMID:26392815
Long-time Dynamics of Stochastic Wave Breaking
NASA Astrophysics Data System (ADS)
Restrepo, J. M.; Ramirez, J. M.; Deike, L.; Melville, K.
2017-12-01
A stochastic parametrization is proposed for the dynamics of wave breaking of progressive water waves. The model is shown to agree with transport estimates derived from the Lagrangian paths of fluid parcels. These trajectories are obtained numerically and are shown to agree well with theory in the non-breaking regime. Of special interest is the impact of wave breaking on transport, momentum exchanges and energy dissipation, as well as dispersion of trajectories. The proposed model, ensemble averaged to larger time scales, is compared to ensemble averages of the numerically generated parcel dynamics, and is then used to capture energy dissipation and path dispersion.
Quantum Gibbs ensemble Monte Carlo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fantoni, Riccardo, E-mail: rfantoni@ts.infn.it; Moroni, Saverio, E-mail: moroni@democritos.it
We present a path integral Monte Carlo method which is the full quantum analogue of the Gibbs ensemble Monte Carlo method of Panagiotopoulos to study the gas-liquid coexistence line of a classical fluid. Unlike previous extensions of Gibbs ensemble Monte Carlo to include quantum effects, our scheme is viable even for systems with strong quantum delocalization in the degenerate regime of temperature. This is demonstrated by an illustrative application to the gas-superfluid transition of ⁴He in two dimensions.
Measuring excess free energies of self-assembled membrane structures.
Norizoe, Yuki; Daoulas, Kostas Ch; Müller, Marcus
2010-01-01
Using computer simulation of a solvent-free, coarse-grained model for amphiphilic membranes, we study the excess free energy of hourglass-shaped connections (i.e., stalks) between two apposed bilayer membranes. In order to calculate the free energy by simulation in the canonical ensemble, we reversibly transfer two apposed bilayers into a configuration with a stalk in three steps. First, we gradually replace the intermolecular interactions by an external, ordering field. The latter is chosen such that the structure of the non-interacting system in this field closely resembles the structure of the original, interacting system in the absence of the external field. The absence of structural changes along this path suggests that it is reversible; a fact which is confirmed by expanded-ensemble simulations. Second, the external, ordering field is changed so as to transform the non-interacting system from the apposed bilayer structure to two bilayers connected by a stalk. The final external field is chosen such that the structure of the non-interacting system resembles the structure of the stalk in the interacting system without a field. On the third branch of the transformation path, we reversibly replace the external, ordering field by non-bonded interactions. Using expanded-ensemble techniques, the free energy change along this reversible path can be obtained with an accuracy of 10^(-3) k_BT per molecule in the nVT ensemble. Calculating the chemical potential, we obtain the free energy of a stalk in the grand-canonical ensemble, and employing semi-grand-canonical techniques, we calculate the change of the excess free energy upon altering the molecular architecture. This computational strategy can be applied to compute the free energy of self-assembled phases in lipid and copolymer systems, and the excess free energy of defects or interfaces.
Parameter Uncertainty on AGCM-simulated Tropical Cyclones
NASA Astrophysics Data System (ADS)
He, F.
2015-12-01
This work studies parameter uncertainty in tropical cyclone (TC) simulations in Atmospheric General Circulation Models (AGCMs) using the Reed-Jablonowski TC test case, implemented in the Community Atmosphere Model (CAM). It examines the impact of 24 parameters across the physical parameterization schemes that represent the convection, turbulence, precipitation and cloud processes in AGCMs. The one-at-a-time (OAT) sensitivity analysis method first quantifies their relative importance for TC simulations and identifies the key parameters for six different TC characteristics: intensity, precipitation, longwave cloud radiative forcing (LWCF), shortwave cloud radiative forcing (SWCF), cloud liquid water path (LWP) and ice water path (IWP). Then, 8 physical parameters are chosen and perturbed using the Latin hypercube sampling (LHS) method. The comparison between the OAT ensemble run and the LHS ensemble run shows that the simulated TC intensity is mainly affected by the parcel fractional mass entrainment rate in the Zhang-McFarlane (ZM) deep convection scheme. The nonlinear interactive effect among different physical parameters is negligible for simulated TC intensity. In contrast, this nonlinear interactive effect plays a significant role in the other simulated tropical cyclone characteristics (precipitation, LWCF, SWCF, LWP and IWP) and greatly enlarges their simulated uncertainties. The statistical emulator Extended Multivariate Adaptive Regression Splines (EMARS) is applied to characterize the response functions for the nonlinear effect. Last, we find that the intensity uncertainty caused by physical parameters is comparable in degree to the uncertainty caused by model structure (e.g., grid) and initial conditions (e.g., sea surface temperature, atmospheric moisture). These findings suggest the importance of using the perturbed physics ensemble (PPE) method to revisit tropical cyclone prediction under climate change scenarios.
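The Latin hypercube step admits a compact sketch: each parameter range is cut into as many strata as there are ensemble members, and every stratum is sampled exactly once. The parameter names and ranges below are invented for illustration.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=np.random.default_rng(7)):
    """Latin hypercube sample: each of the d parameter ranges is divided
    into n_samples equal strata and every stratum is hit exactly once,
    giving better coverage than plain random sampling for perturbed-
    physics ensembles. bounds: (d, 2) array of (low, high) per parameter."""
    bounds = np.asarray(bounds, dtype=float)
    d = len(bounds)
    # One independent permutation of stratum indices per dimension,
    # jittered uniformly within each stratum.
    u = (rng.permuted(np.tile(np.arange(n_samples), (d, 1)), axis=1).T
         + rng.random((n_samples, d))) / n_samples
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

# e.g. 8 perturbed-physics members over two hypothetical convection
# parameters (names and ranges purely illustrative).
samples = latin_hypercube(8, [(1e-4, 1e-3),   # entrainment rate
                              (0.5, 2.0)])    # evaporation efficiency
print(samples)
```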
Nonequilibrium umbrella sampling in spaces of many order parameters
NASA Astrophysics Data System (ADS)
Dickson, Alex; Warmflash, Aryeh; Dinner, Aaron R.
2009-02-01
We recently introduced an umbrella sampling method for obtaining nonequilibrium steady-state probability distributions projected onto an arbitrary number of coordinates that characterize a system (order parameters) [A. Warmflash, P. Bhimalapuram, and A. R. Dinner, J. Chem. Phys. 127, 154112 (2007)]. Here, we show how our algorithm can be combined with the image update procedure from the finite-temperature string method for reversible processes [E. Vanden-Eijnden and M. Venturoli, "Revisiting the finite temperature string method for calculation of reaction tubes and free energies," J. Chem. Phys. (in press)] to enable restricted sampling of a nonequilibrium steady state in the vicinity of a path in a many-dimensional space of order parameters. For the study of transitions between stable states, the adapted algorithm results in improved scaling with the number of order parameters and the ability to progressively refine the regions of enforced sampling. We demonstrate the algorithm by applying it to a two-dimensional model of driven Brownian motion and a coarse-grained (Ising) model for nucleation under shear. It is found that the choice of order parameters can significantly affect the convergence of the simulation; local magnetization variables other than those used previously for sampling transition paths in Ising systems are needed to ensure that the reactive flux is primarily contained within a tube in the space of order parameters. The relation of this method to other algorithms that sample the statistics of path ensembles is discussed.
Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways
Seyler, Sean L.; Kumar, Avishek; Thorpe, M. F.; Beckstein, Oliver
2015-01-01
Diverse classes of proteins function through large-scale conformational changes, and various sophisticated computational algorithms have been proposed to enhance sampling of these macromolecular transition paths. Because such paths are curves in a high-dimensional space, it has been difficult to quantitatively compare multiple paths, a necessary prerequisite for, for instance, assessing the quality of different algorithms. We introduce a method named Path Similarity Analysis (PSA) that enables us to quantify the similarity between two arbitrary paths and extract the atomic-scale determinants responsible for their differences. PSA utilizes the full information available in 3N-dimensional configuration space trajectories by employing the Hausdorff or Fréchet metrics (adopted from computational geometry) to quantify the degree of similarity between piecewise-linear curves. It thus completely avoids relying on projections into low-dimensional spaces, as used in traditional approaches. To elucidate the principles of PSA, we quantified the effect of path roughness induced by thermal fluctuations using a toy model system. Using, as an example, the closed-to-open transitions of the enzyme adenylate kinase (AdK) in its substrate-free form, we compared a range of protein transition path-generating algorithms. The molecular dynamics-based dynamic importance sampling (DIMS MD) and targeted MD (TMD) methods and the purely geometric FRODA (Framework Rigidity Optimized Dynamics Algorithm) were tested along with seven other methods publicly available on servers, including several based on the popular elastic network model (ENM). PSA with clustering revealed that paths produced by a given method are more similar to each other than to those from another method and, for instance, that the ENM-based methods produced relatively similar paths. PSA applied to ensembles of DIMS MD and FRODA trajectories of the conformational transition of diphtheria toxin, a particularly challenging example, showed that the geometry-based FRODA occasionally sampled the pathway space of force field-based DIMS MD. For the AdK transition, the new concept of a Hausdorff-pair map enabled us to extract the molecular structural determinants responsible for differences in pathways, namely a set of conserved salt bridges whose charge-charge interactions are fully modelled in DIMS MD but not in FRODA. PSA has the potential to enhance our understanding of transition path sampling methods, validate them, and provide a new approach to analyzing conformational transitions. PMID:26488417
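The core PSA measurement, a symmetric Hausdorff distance between two paths treated as point sets in configuration space, is available directly in SciPy. A minimal sketch on two toy 2D paths:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(path_a, path_b):
    """Symmetric Hausdorff distance between two paths, each an
    (n_frames, n_dims) array of configurations, as used by PSA to
    compare pathways without low-dimensional projections."""
    return max(directed_hausdorff(path_a, path_b)[0],
               directed_hausdorff(path_b, path_a)[0])

# Two toy 2D paths between the same end states, one bowed upward.
t = np.linspace(0, 1, 100)[:, None]
straight = np.hstack([t, np.zeros_like(t)])
bowed = np.hstack([t, 0.3 * np.sin(np.pi * t)])
print(hausdorff(straight, bowed))   # ~0.3, the maximal deviation
```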
Effects of a mutation on the folding mechanism of a beta-hairpin.
Juraszek, Jarek; Bolhuis, Peter G
2009-12-17
The folding mechanism of a protein is determined by its primary sequence. Yet, how the mechanism is changed by a mutation is still poorly understood, even for basic secondary structures such as beta-hairpins. We perform an extensive simulation study of the effects of mutating the GB1 beta-hairpin into Trpzip4 (Y5W, F12W, V14W) on the folding mechanism. While Trpzip4 has a much more stable native state due to very strong hydrophobic interactions of the side chains, its folding rate does not differ significantly from the wild type beta-hairpin. We sample the free-energy landscapes of both hairpins with Replica Exchange Molecular Dynamics (REMD) and identify the four (meta)stable states (U, H, F, and N). Using Transition Path Sampling (TPS), we then harvest ensembles of unbiased pathways between the H and F states and between the F and N states to investigate the unbiased folding mechanisms. In both hairpins, the hydrophobic collapse (U-H) is followed by the middle hydrogen bond formation (H-F), and finally a closing of the strands in a zipper-like fashion (F-N). For the Trpzip4, the path ensembles indicate that the final F-N step is much more difficult than for GB1 and involves partial unfolding, rezipping of hydrogen bonds, and rearrangement of the Trp-14 side chain. For the rate-limiting (H-F) step, the path ensembles show that in GB1 desolvation and strand closure go hand in hand, while in Trpzip4 desolvation is decoupled from strand closure. Nevertheless, likelihood maximization shows that the reaction coordinate for both hairpins remains the interstrand distance. We conclude that the folding mechanism of both hairpins is a combination of hydrophobic collapse and zipping of hydrogen bonds but that the zipper mechanism is more visible in Trpzip4. A major difference between the two hairpins is that in the transition state of the rate-limiting step for Trpzip4 one tryptophan is exposed to the solvent due to steric hindrance, making the folding mechanism more complex and leading to an increased F-N barrier. Thus, our results show in atomistic detail how a mutation leads to a different folding mechanism and results in a more frustrated folding free-energy landscape.
NASA Technical Reports Server (NTRS)
Hizanidis, Kyriakos; Vlahos, L.; Polymilis, C.
1989-01-01
The relativistic motion of an ensemble of electrons in an intense monochromatic electromagnetic wave propagating obliquely in a uniform external magnetic field is studied. The problem is formulated from the viewpoint of Hamiltonian theory and the Fokker-Planck-Kolmogorov approach analyzed by Hizanidis (1989), leading to a one-dimensional diffusive acceleration along paths of constant zeroth-order generalized Hamiltonian. For values of the wave amplitude and the propagation angle inside the analytically predicted stochastic region, the numerical results suggest that the diffusion process proceeds in stages. In the first stage, the electrons are accelerated to relatively high energies by sampling the first few overlapping resonances one by one. During that stage, the ensemble-averaged square deviations of the variables involved scale quadratically with time. During the second stage, they scale linearly with time. For much longer times, deviation from linear scaling slowly sets in.
Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices
NASA Astrophysics Data System (ADS)
Passemier, Damien; McKay, Matthew R.; Chen, Yang
2015-07-01
Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.
Synoptic Factors Affecting Structure Predictability of Hurricane Alex (2016)
NASA Astrophysics Data System (ADS)
Gonzalez-Aleman, J. J.; Evans, J. L.; Kowaleski, A. M.
2016-12-01
On January 7, 2016, a disturbance formed over the western North Atlantic basin. After undergoing tropical transition, the system became the first hurricane of 2016 - and the first North Atlantic hurricane to form in January since 1938. Already an extremely rare hurricane event, Alex then underwent extratropical transition [ET] just north of the Azores Islands. We examine the factors affecting Alex's structural evolution through a new technique called path clustering. In this approach, 51 ensemble members from the European Centre for Medium-Range Weather Forecasts Ensemble Prediction System (ECMWF-EPS) are grouped based on similarities in the storm's path through the Cyclone Phase Space (CPS). The resulting clusters represent the various possible scenarios of structural development in the ensemble forecasts. As a result, it is possible to shed light on the role of the synoptic scale in changing the structure of this hurricane in the midlatitudes through intercomparison of the most "realistic" forecast of the evolution of Alex and the other physically plausible modes of its development.
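Path clustering of this kind can be sketched by flattening each member's phase-space trajectory into a feature vector and clustering. The fabricated data, feature choice, and cluster count below are illustrative assumptions, not the study's procedure:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(13)

# Stand-in for 51 members' paths through a 2D phase space (e.g. two CPS
# parameters) at 20 forecast times: random walks, with one group of
# members given a common trend to mimic a distinct synoptic scenario.
members = rng.normal(size=(51, 20, 2)).cumsum(axis=1)
members[:25] += np.linspace(0.0, 3.0, 20)[None, :, None]

features = members.reshape(51, -1)          # flatten each path to a vector
centroids, labels = kmeans2(features, 3, minit="++")
print(np.bincount(labels))                  # members per scenario cluster
```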
Entanglement between two spatially separated atomic modes
NASA Astrophysics Data System (ADS)
Lange, Karsten; Peise, Jan; Lücke, Bernd; Kruse, Ilka; Vitagliano, Giuseppe; Apellaniz, Iagoba; Kleinmann, Matthias; Tóth, Géza; Klempt, Carsten
2018-04-01
Modern quantum technologies in the fields of quantum computing, quantum simulation, and quantum metrology require the creation and control of large ensembles of entangled particles. In ultracold ensembles of neutral atoms, nonclassical states have been generated with mutual entanglement among thousands of particles. The entanglement generation relies on the fundamental particle-exchange symmetry in ensembles of identical particles, which lacks the standard notion of entanglement between clearly definable subsystems. Here, we present the generation of entanglement between two spatially separated clouds by splitting an ensemble of ultracold identical particles prepared in a twin Fock state. Because the clouds can be addressed individually, our experiments open a path to exploit the available entangled states of indistinguishable particles for quantum information applications.
SSAGES: Software Suite for Advanced General Ensemble Simulations.
Sidky, Hythem; Colón, Yamil J; Helfferich, Julian; Sikora, Benjamin J; Bezik, Cody; Chu, Weiwei; Giberti, Federico; Guo, Ashley Z; Jiang, Xikai; Lequieu, Joshua; Li, Jiyuan; Moller, Joshua; Quevillon, Michael J; Rahimi, Mohammad; Ramezani-Dakhel, Hadi; Rathee, Vikramjit S; Reid, Daniel R; Sevgen, Emre; Thapar, Vikram; Webb, Michael A; Whitmer, Jonathan K; de Pablo, Juan J
2018-01-28
Molecular simulation has emerged as an essential tool for modern-day research, but obtaining proper results and making reliable conclusions from simulations requires adequate sampling of the system under consideration. To this end, a variety of methods exist in the literature that can enhance sampling considerably, and increasingly sophisticated, effective algorithms continue to be developed at a rapid pace. Implementation of these techniques, however, can be challenging for experts and non-experts alike. There is a clear need for software that provides rapid, reliable, and easy access to a wide range of advanced sampling methods and that facilitates implementation of new techniques as they emerge. Here we present SSAGES, a publicly available Software Suite for Advanced General Ensemble Simulations designed to interface with multiple widely used molecular dynamics simulations packages. SSAGES allows facile application of a variety of enhanced sampling techniques-including adaptive biasing force, string methods, and forward flux sampling-that extract meaningful free energy and transition path data from all-atom and coarse-grained simulations. A noteworthy feature of SSAGES is a user-friendly framework that facilitates further development and implementation of new methods and collective variables. In this work, the use of SSAGES is illustrated in the context of simple representative applications involving distinct methods and different collective variables that are available in the current release of the suite. The code may be found at: https://github.com/MICCoM/SSAGES-public.
Annealed importance sampling with constant cooling rate
NASA Astrophysics Data System (ADS)
Giovannelli, Edoardo; Cardini, Gianni; Gellini, Cristina; Pietraperzia, Giangaetano; Chelli, Riccardo
2015-02-01
Annealed importance sampling is a simulation method devised by Neal [Stat. Comput. 11, 125 (2001)] to assign weights to configurations generated by simulated annealing trajectories. In particular, the equilibrium average of a generic physical quantity can be computed as a weighted average exploiting the weights and estimates of this quantity associated with the final configurations of the annealed trajectories. Here, we review annealed importance sampling from the perspective of nonequilibrium path-ensemble averages [G. E. Crooks, Phys. Rev. E 61, 2361 (2000)]. The equivalence of Neal's and Crooks' treatments highlights the generality of the method, which goes beyond mere thermal-based protocols. Furthermore, we show that a temperature schedule based on a constant cooling rate outperforms stepwise cooling schedules and that, for a given elapsed computer time, the performance of annealed importance sampling is, in general, improved by increasing the number of intermediate temperatures.
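A minimal sketch of annealed importance sampling with a uniform annealing schedule (a linear-in-beta stand-in for the paper's constant-cooling-rate temperature protocol): each walker accumulates a log-weight from incremental density ratios, with one Metropolis move per stage. The distributions and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

log_p0 = lambda x: -0.5 * x**2 / 9.0            # broad Gaussian prior, sigma = 3
log_pT = lambda x: -0.5 * ((x - 2.0) / 0.3)**2  # narrow target (unnormalized)

def ais(n_walkers=2000, n_stages=200, step=0.5):
    """AIS along p_k proportional to p0^(1-b_k) pT^(b_k), b_k = k/n.
    Returns final samples and their log importance weights."""
    betas = np.linspace(0.0, 1.0, n_stages + 1)
    x = rng.normal(0.0, 3.0, n_walkers)          # exact draw from p0
    logw = np.zeros(n_walkers)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight increment from the density ratio at the current points.
        logw += (b - b_prev) * (log_pT(x) - log_p0(x))
        # One Metropolis move targeting the current intermediate density.
        prop = x + rng.normal(0.0, step, n_walkers)
        loga = ((1 - b) * (log_p0(prop) - log_p0(x))
                + b * (log_pT(prop) - log_pT(x)))
        accept = np.log(rng.random(n_walkers)) < loga
        x[accept] = prop[accept]
    return x, logw

x, logw = ais()
w = np.exp(logw - logw.max())
print("weighted mean:", np.sum(w * x) / np.sum(w))  # ~2.0, the target mean
```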
PhytoPath: an integrative resource for plant pathogen genomics.
Pedro, Helder; Maheswari, Uma; Urban, Martin; Irvine, Alistair George; Cuzick, Alayne; McDowall, Mark D; Staines, Daniel M; Kulesha, Eugene; Hammond-Kosack, Kim Elizabeth; Kersey, Paul Julian
2016-01-04
PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungal, protist (oomycete) and bacterial plant pathogens whose genomes have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
PathVisio-Faceted Search: an exploration tool for multi-dimensional navigation of large pathways
Fried, Jake Y.; Luna, Augustin
2013-01-01
Purpose: The PathVisio-Faceted Search plugin helps users explore and understand complex pathways by overlaying experimental data and data from webservices, such as Ensembl BioMart, onto diagrams drawn using formalized notations in PathVisio. The plugin then provides a filtering mechanism, known as a faceted search, to find and highlight diagram nodes (e.g. genes and proteins) of interest based on imported data. The tool additionally provides a flexible scripting mechanism to handle complex queries. Availability: The PathVisio-Faceted Search plugin is compatible with PathVisio 3.0 and above. PathVisio is compatible with Windows, Mac OS X and Linux. The plugin, documentation, example diagrams and Groovy scripts are available at http://PathVisio.org/wiki/PathVisioFacetedSearchHelp. The plugin is free, open-source and licensed by the Apache 2.0 License. Contact: augustin@mail.nih.gov or jakeyfried@gmail.com PMID:23547033
Ensemble Sampling vs. Time Sampling in Molecular Dynamics Simulations of Thermal Conductivity
Gordiz, Kiarash; Singh, David J.; Henry, Asegun
2015-01-29
In this report we compare time sampling and ensemble averaging as two different methods available for phase space sampling. For the comparison, we calculate thermal conductivities of solid argon and silicon structures, using equilibrium molecular dynamics. We introduce two different schemes for the ensemble averaging approach, and show that both can reduce the total simulation time as compared to time averaging. It is also found that velocity rescaling is an efficient mechanism for phase space exploration. Although our methodology is tested using classical molecular dynamics, the ensemble generation approaches may find their greatest utility in computationally expensive simulations such as first principles molecular dynamics. For such simulations, where each time step is costly, time sampling can require long simulation times because each time step must be evaluated sequentially and therefore phase space averaging is achieved through sequential operations. On the other hand, with ensemble averaging, phase space sampling can be achieved through parallel operations, since each ensemble is independent. For this reason, particularly when using massively parallel architectures, ensemble sampling can result in much shorter simulation times and exhibits similar overall computational effort.
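As a rough sketch of the bookkeeping behind this comparison, the snippet below averages an autocorrelation function, the Green-Kubo ingredient for thermal conductivity, either over one long series (time sampling) or over many short independent series (ensemble sampling). Synthetic white noise stands in for the MD heat flux, so this illustrates only the averaging structure, not the physics.

```python
import numpy as np

# Green-Kubo-style sketch: average an autocorrelation function (ACF) over
# one long time series (time sampling) or over many independent short
# series (ensemble sampling, which can run in parallel).
rng = np.random.default_rng(1)

def autocorr(x, nlag):
    x = x - x.mean()
    return np.array([np.mean(x[:len(x)-k] * x[k:]) for k in range(nlag)])

nlag = 100
# Time sampling: one long trajectory, evaluated sequentially.
acf_time = autocorr(rng.standard_normal(200_000), nlag)

# Ensemble sampling: 100 independent short runs (e.g. generated by
# velocity rescaling), each of which could be produced in parallel.
acf_ens = np.mean([autocorr(rng.standard_normal(2_000), nlag)
                   for _ in range(100)], axis=0)

# The conductivity is proportional to the integral of the ACF (sketch).
print(acf_time.sum(), acf_ens.sum())
```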
NASA Astrophysics Data System (ADS)
Hoteit, I.; Hollt, T.; Hadwiger, M.; Knio, O. M.; Gopalakrishnan, G.; Zhan, P.
2016-02-01
Ocean reanalyses and forecasts are nowadays generated by combining ensemble simulations with data assimilation techniques. Most of these techniques resample the ensemble members after each assimilation cycle. Tracking behavior over time, such as all possible paths of a particle in an ensemble vector field, becomes very difficult, as the number of combinations rises exponentially with the number of assimilation cycles. In general, a single possible path is not of interest, but only the probability that any point in space might be reached by a particle at some point in time. We present an approach using probability-weighted piecewise particle trajectories to allow for interactive probability mapping. This is achieved by binning the domain and splitting up the tracing process into the individual assimilation cycles, so that particles that fall into the same bin after a cycle can be treated as a single particle with a larger probability as input for the next cycle. As a result we lose the ability to track individual particles, but can create probability maps for any desired seed at interactive rates. The technique is integrated in an interactive visualization system that enables the visual analysis of the particle traces side by side with other forecast variables, such as the sea surface height, and their corresponding behavior over time. By harnessing the power of modern graphics processing units (GPUs) for visualization as well as computation, our system allows the user to browse through the simulation ensembles in real-time, view specific parameter settings or simulation models, and move between different spatial or temporal regions without delay. In addition, our system provides advanced visualizations to highlight the uncertainty, or show the complete distribution of the simulations at user-defined positions over the complete time series of the domain.
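A minimal sketch of the binning idea, assuming a one-dimensional periodic domain and a hypothetical `advect` routine that stands in for tracing one assimilation cycle through one ensemble member:

```python
import numpy as np

# Probability-weighted piecewise tracing: after each assimilation cycle,
# particles that land in the same bin are merged into one weighted
# particle, so the cost per cycle stays bounded instead of exponential.
rng = np.random.default_rng(2)
nbins, n_members, n_cycles = 50, 20, 5
edges = np.linspace(0.0, 1.0, nbins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def advect(x, member):               # hypothetical flow of one member
    return (x + 0.01 * (member + 1) + 0.02 * rng.standard_normal(len(x))) % 1.0

prob = np.zeros(nbins)
prob[0] = 1.0                        # seed: all probability in bin 0
for _ in range(n_cycles):
    new_prob = np.zeros(nbins)
    active = np.nonzero(prob)[0]
    for member in range(n_members):  # each member treated as equally likely
        ends = advect(centers[active], member)
        idx = np.clip(np.digitize(ends, edges) - 1, 0, nbins - 1)
        np.add.at(new_prob, idx, prob[active] / n_members)
    prob = new_prob                  # merged weighted particles
print(prob.round(3))                 # probability map after n_cycles
```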
Resonance fluorescence trajectories in superconducting qubit
NASA Astrophysics Data System (ADS)
Naghiloo, Mahdi; Tan, Dian; Harrington, Patrick; Lewalle, Philippe; Jordan, Andrew; Murch, Kater
We employ phase-sensitive amplification to perform homodyne detection of the resonance fluorescence from a driven superconducting artificial atom. Entanglement between the emitter and its fluorescence allows us to track the individual quantum state trajectories of the emitter. We analyze the ensemble properties of these trajectories by considering paths that connect specific initial and final states. By applying a stochastic path integral formalism, we calculate equations of motion for the most likely path between two quantum states and compare these predicted paths to experimental data. Drawing on the mathematical similarity between the action formalism of the most likely quantum paths and ray optics, we study the emergence of caustics in quantum trajectories: situations where multiple extrema in the stochastic action occur. We observe such multiple most likely paths in experimental data and find these paths to be in reasonable quantitative agreement with theoretical calculations. Supported by the John Templeton Foundation.
Water evaporation: a transition path sampling study.
Varilly, Patrick; Chandler, David
2013-02-07
We use transition path sampling to study evaporation in the SPC/E model of liquid water. On the basis of thousands of evaporation trajectories, we characterize the members of the transition state ensemble (TSE), which exhibit a liquid-vapor interface with predominantly negative mean curvature at the site of evaporation. We also find that after evaporation is complete, the distributions of translational and angular momenta of the evaporated water are Maxwellian with a temperature equal to that of the liquid. To characterize the evaporation trajectories in their entirety, we find that it suffices to project them onto just two coordinates: the distance of the evaporating molecule to the instantaneous liquid-vapor interface and the velocity of the water along the average interface normal. In this projected space, we find that the TSE is well-captured by a simple model of ballistic escape from a deep potential well, with no additional barrier to evaporation beyond the cohesive strength of the liquid. Equivalently, they are consistent with a near-unity probability for a water molecule impinging upon a liquid droplet to condense. These results agree with previous simulations and with some, but not all, recent experiments.
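A generic two-way shooting move of the kind used in transition path sampling can be sketched as follows; the one-dimensional double well, the state definitions, and the overdamped Langevin dynamics are toy stand-ins, not the authors' SPC/E water setup.

```python
import numpy as np

# Two-way shooting move for transition path sampling on a 1-D double well;
# A (x < -1) and B (x > 1) stand in for the two stable states. For
# equilibrium diffusive dynamics, time-reversing a forward segment is a
# valid way to regrow the backward piece of the path.
rng = np.random.default_rng(3)
beta, dt = 2.5, 1e-2

def propagate(x0, nsteps):
    xs = [x0]
    for _ in range(nsteps):
        f = -4.0 * xs[-1] * (xs[-1] ** 2 - 1.0)   # force of V = (x^2 - 1)^2
        xs.append(xs[-1] + f * dt
                  + np.sqrt(2 * dt / beta) * rng.standard_normal())
    return np.array(xs)

in_A = lambda x: x < -1.0
in_B = lambda x: x > 1.0

def shoot(path):
    i = int(rng.integers(1, len(path) - 1))       # choose a shooting point
    x = path[i] + 0.1 * rng.standard_normal()     # perturb it slightly
    backward = propagate(x, i)[::-1]              # regrow both segments
    forward = propagate(x, len(path) - 1 - i)
    trial = np.concatenate([backward, forward[1:]])
    if in_A(trial[0]) and in_B(trial[-1]):        # keep only reactive paths
        return trial, True
    return path, False

path = np.linspace(-1.2, 1.2, 401)  # crude initial guess, relaxed by shooting
n_acc = 0
for _ in range(200):
    path, ok = shoot(path)
    n_acc += ok
print("acceptance fraction:", n_acc / 200)
```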
Importance sampling large deviations in nonequilibrium steady states. I.
Ray, Ushnish; Chan, Garnet Kin-Lic; Limmer, David T
2018-03-28
Large deviation functions contain information on the stability and response of systems driven into nonequilibrium steady states and in such a way are similar to free energies for systems at equilibrium. As with equilibrium free energies, evaluating large deviation functions numerically for all but the simplest systems is difficult because by construction they depend on exponentially rare events. In this first paper of a series, we evaluate different trajectory-based sampling methods capable of computing large deviation functions of time integrated observables within nonequilibrium steady states. We illustrate some convergence criteria and best practices using a number of different models, including a biased Brownian walker, a driven lattice gas, and a model of self-assembly. We show how two popular methods for sampling trajectory ensembles, transition path sampling and diffusion Monte Carlo, suffer from exponentially diverging correlations in trajectory space as a function of the bias parameter when estimating large deviation functions. Improving the efficiencies of these algorithms requires introducing guiding functions for the trajectories.
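The sampling difficulty can be made concrete with a brute-force estimator for a toy biased Brownian walker: the exponential average defining the scaled cumulant generating function is dominated by ever rarer trajectories as the bias parameter grows, which the collapsing effective sample size makes visible. This toy is illustrative only, not the authors' transition path sampling or diffusion Monte Carlo implementation.

```python
import numpy as np

# Brute-force estimate of the scaled cumulant generating function
# psi(s) ~ (1/T) log < exp(-s * A_T) > for a biased Brownian walker,
# with A_T the time-integrated position. The exponential average is
# dominated by rare trajectories, so the variance explodes with |s|.
rng = np.random.default_rng(4)
dt, T, ntraj, drift = 1e-2, 10.0, 5_000, 0.5
nsteps = int(T / dt)

x = np.zeros(ntraj)
A = np.zeros(ntraj)
for _ in range(nsteps):
    A += x * dt                                 # time-integrated observable
    x += drift * dt + np.sqrt(2 * dt) * rng.standard_normal(ntraj)

for s in (0.05, 0.2, 0.8):
    w = np.exp(-s * A)
    psi = np.log(w.mean()) / T
    ess = w.sum() ** 2 / (w ** 2).sum()         # effective sample size
    print(f"s={s:4.2f}  psi~{psi:8.3f}  ESS={ess:8.1f} of {ntraj}")
```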
Quantum caustics in resonance-fluorescence trajectories
NASA Astrophysics Data System (ADS)
Naghiloo, M.; Tan, D.; Harrington, P. M.; Lewalle, P.; Jordan, A. N.; Murch, K. W.
2017-11-01
We employ phase-sensitive amplification to perform homodyne detection of the resonance fluorescence from a driven superconducting artificial atom. Entanglement between the emitter and its fluorescence allows us to track the individual quantum state trajectories of the emitter conditioned on the outcomes of the field measurements. We analyze the ensemble properties of these trajectories by considering trajectories that connect specific initial and final states. By applying the stochastic path-integral formalism, we calculate equations of motion for the most-likely path between two quantum states and compare these predicted paths to experimental data. Drawing on the mathematical similarity between the action formalism of the most-likely quantum paths and ray optics, we study the emergence of caustics in quantum trajectories: places where multiple extrema in the stochastic action occur. We observe such multiple most-likely paths in experimental data and find these paths to be in reasonable quantitative agreement with theoretical calculations.
Ensemble: an Architecture for Mission-Operations Software
NASA Technical Reports Server (NTRS)
Norris, Jeffrey; Powell, Mark; Fox, Jason; Rabe, Kenneth; Shu, IHsiang; McCurdy, Michael; Vera, Alonso
2008-01-01
Ensemble is the name of an open architecture for, and a methodology for the development of, spacecraft mission operations software. Ensemble is also potentially applicable to the development of non-spacecraft mission-operations-type software. Ensemble capitalizes on the strengths of the open-source Eclipse software and its architecture to address several issues that have arisen repeatedly in the development of mission-operations software: Heretofore, mission-operations application programs have been developed in disparate programming environments and integrated during the final stages of development of missions. The programs have been poorly integrated, and it has been costly to develop, test, and deploy them. Users of each program have been forced to interact with several different graphical user interfaces (GUIs). Also, the strategy typically used in integrating the programs has yielded serial chains of operational software tools of such a nature that during use of a given tool, it has not been possible to gain access to the capabilities afforded by other tools. In contrast, the Ensemble approach offers a low-risk path towards tighter integration of mission-operations software tools.
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
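A hedged sketch of the same pipeline, with scikit-learn components standing in for the modules described in the patent: each tree is grown on a random sample of the data, candidate splits are evaluated on random feature subsets, and the trees are combined by voting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Trees grown on random samples, split evaluation on random feature
# subsets, combined into a voting ensemble (scikit-learn >= 1.2 API).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_features="sqrt"),  # sampled splits
    n_estimators=50,
    max_samples=0.5,     # each tree sees a random half of the data
    random_state=0,
).fit(X_tr, y_tr)
print("ensemble accuracy:", ensemble.score(X_te, y_te))
```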
NASA Astrophysics Data System (ADS)
Chen, L. A.; Doddridge, B. G.; Dickerson, R. R.
2001-12-01
As the primary field experiment for the Maryland Aerosol Research and CHaracterization (MARCH-Atlantic) study, chemically speciated PM2.5 has been sampled at Fort Meade (FME, 39.10° N 76.74° W) since July 1999. FME is suburban, located in the middle of the bustling Baltimore-Washington corridor, which is generally downwind of the highly industrialized Midwest. Due to this unique sampling location, the PM2.5 observed at FME is expected to originate from both local and regional sources, with relative contributions varying temporally. This variation, believed to be largely controlled by the meteorology, influences day-to-day and seasonal profiles of PM2.5 mass concentration and chemical composition. Air parcel back trajectories, which describe the path of air parcels traveling backward in time from the site (receptor), reflect changes in the synoptic meteorological conditions. In this paper, an ensemble back trajectory method is employed to study the meteorology associated with each high/low PM2.5 episode in different seasons. For every sampling day, the residence time of air parcels within the eastern US at a 1° x 1° x 500 m geographic resolution can be estimated in order to resolve areas likely dominating the production of various PM2.5 components. Local sources are found to be more dominant in winter than in summer. "Factor analysis", based on a mass balance approach, provides useful insights into air pollution data. Here, a newly developed factor analysis model (UNMIX) is used to extract source profiles and contributions from the speciated PM2.5 data. Combining the model results with the ensemble back trajectory method improves the understanding of the source regions and helps partition the contributions from local and more distant areas. http://www.meto.umd.edu/~bruce/MARCH-Atl.html
NASA Astrophysics Data System (ADS)
Maleki, Yusef; Zheltikov, Aleksei M.
2018-01-01
An ensemble of nitrogen-vacancy (NV) centers coupled to a circuit QED device is shown to enable an efficient, high-fidelity generation of high-N00N states. Instead of first creating entanglement and then increasing the number of entangled particles N, our source of high-N00N states first prepares a high-N Fock state in one of the NV ensembles and then entangles it to the rest of the system. With such a strategy, high-N N00N states can be generated in just a few operational steps with extraordinary fidelity. Once prepared, such a state can be stored over a longer period of time due to the remarkably long coherence time of NV centers.
Ovchinnikov, Victor; Karplus, Martin
2012-07-26
The popular targeted molecular dynamics (TMD) method for generating transition paths in complex biomolecular systems is revisited. In a typical TMD transition path, the large-scale changes occur early and the small-scale changes tend to occur later. As a result, the order of events in the computed paths depends on the direction in which the simulations are performed. To identify the origin of this bias, and to propose a method in which the bias is absent, variants of TMD in the restraint formulation are introduced and applied to the complex open ↔ closed transition in the protein calmodulin. Due to the global best-fit rotation that is typically part of the TMD method, the simulated system is guided implicitly along the lowest-frequency normal modes, until the large spatial scales associated with these modes are near the target conformation. The remaining portion of the transition is described progressively by higher-frequency modes, which correspond to smaller-scale rearrangements. A straightforward modification of TMD that avoids the global best-fit rotation is the locally restrained TMD (LRTMD) method, in which the biasing potential is constructed from a number of TMD potentials, each acting on a small connected portion of the protein sequence. With a uniform distribution of these elements, transition paths that lack the length-scale bias are obtained. Trajectories generated by steered MD in dihedral angle space (DSMD), a method that avoids best-fit rotations altogether, also lack the length-scale bias. To examine the importance of the paths generated by TMD, LRTMD, and DSMD in the actual transition, we use the finite-temperature string method to compute the free energy profile associated with a transition tube around a path generated by each algorithm. The free energy barriers associated with the paths are comparable, suggesting that transitions can occur along each route with similar probabilities. This result indicates that a broad ensemble of paths needs to be calculated to obtain a full description of conformational changes in biomolecules. The breadth of the contributing ensemble suggests that energetic barriers for conformational transitions in proteins are offset by entropic contributions that arise from a large number of possible paths.
Kingsley, Laura J.; Lill, Markus A.
2014-01-01
Computational prediction of ligand entry and egress paths in proteins has become an emerging topic in computational biology and has proven useful in fields such as protein engineering and drug design. Geometric tunnel prediction programs, such as Caver3.0 and MolAxis, are computationally efficient methods to identify potential ligand entry and egress routes in proteins. Although many geometric tunnel programs are designed to accommodate a single input structure, the increasingly recognized importance of protein flexibility in tunnel formation and behavior has led to the more widespread use of protein ensembles in tunnel prediction. However, there has not yet been an attempt to directly investigate the influence of ensemble size and composition on geometric tunnel prediction. In this study, we compared tunnels found in a single crystal structure to ensembles of various sizes generated using different methods on both the apo and holo forms of the cytochrome P450 enzymes CYP119, CYP2C9, and CYP3A4. Several protein structure clustering methods were tested in an attempt to generate smaller ensembles that were capable of reproducing the data from larger ensembles. Ultimately, we found that by including members from both the apo and holo data sets, we could produce ensembles containing fewer than 15 members that were comparable to apo or holo ensembles containing over 100 members. Furthermore, we found that, in the absence of either apo or holo crystal structure data, pseudo-apo or pseudo-holo ensembles (e.g. adding a ligand to the apo protein throughout MD simulations) could be used to approximate the corresponding apo or holo structural ensembles, respectively. Our findings not only further highlight the importance of including protein flexibility in geometric tunnel prediction, but also suggest that smaller ensembles can be as capable as larger ensembles at capturing many of the protein motions important for tunnel prediction at a lower computational cost. PMID:24956479
Lessons from Climate Modeling on the Design and Use of Ensembles for Crop Modeling
NASA Technical Reports Server (NTRS)
Wallach, Daniel; Mearns, Linda O.; Ruane, Alexander C.; Roetter, Reimund P.; Asseng, Senthold
2016-01-01
Working with ensembles of crop models is a recent but important development in crop modeling which promises to lead to better uncertainty estimates for model projections and predictions, better predictions using the ensemble mean or median, and closer collaboration within the modeling community. There are numerous open questions about the best way to create and analyze such ensembles. Much can be learned from the field of climate modeling, given its much longer experience with ensembles. We draw on that experience to identify questions and make propositions that should help make ensemble modeling with crop models more rigorous and informative. The propositions include: defining criteria for accepting models into a crop multi-model ensemble (MME); exploring criteria for evaluating the degree of relatedness of models in an MME; studying the effect of the number of models in the ensemble; developing a statistical model of model sampling; creating a repository for MME results; studying possible differential weighting of models in an ensemble; creating single-model ensembles, based on sampling from the uncertainty distribution of parameter values or inputs, specifically oriented toward uncertainty estimation; creating super-ensembles that sample more than one source of uncertainty; analyzing super-ensemble results to obtain information on total uncertainty and the separate contributions of different sources of uncertainty; and, finally, further investigating the use of the multi-model mean or median as a predictor.
Analyses and forecasts of a tornadic supercell outbreak using a 3DVAR system ensemble
NASA Astrophysics Data System (ADS)
Zhuang, Zhaorong; Yussouf, Nusrat; Gao, Jidong
2016-05-01
As part of NOAA's "Warn-On-Forecast" initiative, a convective-scale data assimilation and prediction system was developed using the WRF-ARW model and ARPS 3DVAR data assimilation technique. The system was then evaluated using retrospective short-range ensemble analyses and probabilistic forecasts of the tornadic supercell outbreak event that occurred on 24 May 2011 in Oklahoma, USA. A 36-member multi-physics ensemble system provided the initial and boundary conditions for a 3-km convective-scale ensemble system. Radial velocity and reflectivity observations from four WSR-88Ds were assimilated into the ensemble using the ARPS 3DVAR technique. Five data assimilation and forecast experiments were conducted to evaluate the sensitivity of the system to data assimilation frequencies, in-cloud temperature adjustment schemes, and fixed- and mixed-microphysics ensembles. The results indicated that the experiment with 5-min assimilation frequency quickly built up the storm and produced a more accurate analysis compared with the 10-min assimilation frequency experiment. The predicted vertical vorticity from the moist-adiabatic in-cloud temperature adjustment scheme was larger in magnitude than that from the latent heat scheme. Cycled data assimilation yielded good forecasts, where the ensemble probability of high vertical vorticity matched reasonably well with the observed tornado damage path. Overall, the results of the study suggest that the 3DVAR analysis and forecast system can provide reasonable forecasts of tornadic supercell storms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Weiwei; Domcke, Wolfgang; Farantos, Stavros C.
A trajectory method of calculating tunneling probabilities from phase integrals along straight line tunneling paths, originally suggested by Makri and Miller [J. Chem. Phys. 91, 4026 (1989)] and recently implemented by Truhlar and co-workers [Chem. Sci. 5, 2091 (2014)], is tested for one- and two-dimensional ab initio based potentials describing hydrogen dissociation in the ¹B₁ excited electronic state of pyrrole. The primary observables are the tunneling rates in a progression of bending vibrational states lying below the dissociation barrier and their isotope dependences. Several initial ensembles of classical trajectories have been considered, corresponding to the quasiclassical and the quantum mechanical samplings of the initial conditions. It is found that the sampling based on the fixed energy Wigner density gives the best agreement with the quantum mechanical dissociation rates.
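The underlying phase-integral prescription can be sketched for a model one-dimensional barrier; the Eckart barrier shape, mass, and energy below are illustrative assumptions in atomic units, not the pyrrole ab initio potential.

```python
import numpy as np

# Phase-integral (WKB-like) tunneling probability along a straight-line
# path: P ~ exp(-2 * theta), with theta the integral of sqrt(2 m (V - E))
# over the classically forbidden segment. Model Eckart barrier, atomic units.
m, E = 1836.0, 0.005                  # hydrogen-like mass and energy (a.u.)
V0, a = 0.010, 1.0                    # barrier height and width parameter

def V(s):                             # Eckart barrier along the path s
    return V0 / np.cosh(s / a) ** 2

s = np.linspace(-5 * a, 5 * a, 20_001)
integrand = np.sqrt(np.maximum(2.0 * m * (V(s) - E), 0.0))
theta = np.sum(integrand) * (s[1] - s[0])  # action over the forbidden region
print("tunneling probability ~", np.exp(-2.0 * theta))
```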
NASA Astrophysics Data System (ADS)
Isoguchi, O.; Matsui, K.; Kamachi, M.; Usui, N.; Miyazawa, Y.; Ishikawa, Y.; Hirose, N.
2017-12-01
Several operational ocean assimilation models are currently available for the Northwestern Pacific and surrounding marginal seas. One of the main targets is predicting the Kuroshio/Kuroshio Extension, which has an impact not only on social activities, such as fishery and ship routing, but also on local weather. There is a demand to assess their quality comprehensively and make the best use of the available products. In the present study, several ocean data assimilation products and their multi-model ensemble product were assessed by comparison with satellite-derived sea surface temperature (SST), sea surface height (SSH), and in-situ hydrographic sections. The Kuroshio axes were also computed from the surface currents of these products and were compared with the Kuroshio Axis data produced by analyzing satellite SST, SSH, and in-situ observations at the Marine Information Research Center (MIRC). The multi-model ensemble products generally showed the best accuracy in the comparisons with the satellite-derived SST and SSH. On the other hand, the ensemble products did not give the best results in the comparisons with the hydrographic sections. It is thus suggested that the multi-model ensemble works efficiently for horizontally 2D parameters, for which each assimilation product tends to have random errors, while it does not work well for the vertical 2D comparisons, for which the products tend to have bias errors with respect to in-situ data. In the assessment with the Kuroshio Axis data, some products showed more energetic behavior than the Kuroshio Axis data, resulting in large path errors, defined as the ratio between the area enclosed by the reference and model-derived axes and the path length. It remains undetermined which behavior is real, because in-situ observations are still too sparse to resolve energetic Kuroshio behavior, even though the Kuroshio is one of the strongest currents.
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging
NASA Astrophysics Data System (ADS)
Chen, Lei; Kamel, Mohamed S.
2016-01-01
In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
On the Likely Utility of Hybrid Weights Optimized for Variances in Hybrid Error Covariance Models
NASA Astrophysics Data System (ADS)
Satterfield, E.; Hodyss, D.; Kuhl, D.; Bishop, C. H.
2017-12-01
Because of imperfections in ensemble data assimilation schemes, one cannot assume that the ensemble covariance is equal to the true error covariance of a forecast. Previous work demonstrated how information about the distribution of true error variances given an ensemble sample variance can be revealed from an archive of (observation-minus-forecast, ensemble-variance) data pairs. Here, we derive a simple and intuitively compelling formula to obtain the mean of this distribution of true error variances given an ensemble sample variance from (observation-minus-forecast, ensemble-variance) data pairs produced by a single run of a data assimilation system. This formula takes the form of a Hybrid weighted average of the climatological forecast error variance and the ensemble sample variance. Here, we test the extent to which these readily obtainable weights can be used to rapidly optimize the covariance weights used in Hybrid data assimilation systems that employ weighted averages of static covariance models and flow-dependent ensemble based covariance models. Univariate data assimilation and multi-variate cycling ensemble data assimilation are considered. In both cases, it is found that our computationally efficient formula gives Hybrid weights that closely approximate the optimal weights found through the simple but computationally expensive process of testing every plausible combination of weights.
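A toy version of the idea, with synthetic (observation-minus-forecast, ensemble-variance) pairs; here the hybrid weight is recovered by a least-squares fit, standing in for the paper's analytic formula.

```python
import numpy as np

# Illustrative estimate of a hybrid weight w such that
#   E[true error variance | s2] ~ (1 - w) * var_clim + w * s2,
# fitted from (observation-minus-forecast, ensemble-variance) pairs.
# Synthetic data; the paper derives the weights analytically.
rng = np.random.default_rng(5)
n, var_clim = 50_000, 1.0
true_var = rng.gamma(4.0, var_clim / 4.0, n)   # flow-dependent true variance
s2 = true_var * rng.chisquare(9, n) / 9        # noisy ensemble sample variance
d2 = true_var * rng.standard_normal(n) ** 2    # squared innovations

# Least-squares fit of d2 ~ var_clim + w * (s2 - var_clim) for w.
w = np.sum((s2 - var_clim) * (d2 - var_clim)) / np.sum((s2 - var_clim) ** 2)
print("fitted hybrid weight w =", round(w, 3))
```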
Santander, Julian E; Tsapatsis, Michael; Auerbach, Scott M
2013-04-16
We have constructed and applied an algorithm to simulate the behavior of zeolite frameworks during liquid adsorption. We applied this approach to compute the adsorption isotherms of furfural-water and hydroxymethyl furfural (HMF)-water mixtures adsorbing in silicalite zeolite at 300 K for comparison with experimental data. We modeled these adsorption processes under two different statistical mechanical ensembles: the grand canonical (V-Nz-μg-T or GC) ensemble keeping volume fixed, and the P-Nz-μg-T (osmotic) ensemble allowing volume to fluctuate. To optimize accuracy and efficiency, we compared pure Monte Carlo (MC) sampling to hybrid MC-molecular dynamics (MD) simulations. For the external furfural-water and HMF-water phases, we assumed the ideal solution approximation and employed a combination of tabulated data and extended ensemble simulations for computing solvation free energies. We found that MC sampling in the V-Nz-μg-T ensemble (i.e., standard GCMC) does a poor job of reproducing both the Henry's law regime and the saturation loadings of these systems. Hybrid MC-MD sampling of the V-Nz-μg-T ensemble, which includes framework vibrations at fixed total volume, provides better results in the Henry's law region, but this approach still does not reproduce experimental saturation loadings. Pure MC sampling of the osmotic ensemble was found to approach experimental saturation loadings more closely, whereas hybrid MC-MD sampling of the osmotic ensemble quantitatively reproduces such loadings because the MC-MD approach naturally allows for locally anisotropic volume changes wherein some pores expand whereas others contract.
The Origins of the "Fanga" Dance
ERIC Educational Resources Information Center
Damm, Robert J.
2015-01-01
The "fanga" is a dance taught throughout the United States to children in elementary music classes, students in African dance classes, teachers in multicultural workshops, and professional dancers in touring ensembles. Although the history of the fanga is a path overgrown with myth, this article offers information about the dance's…
Muždalo, Anja; Saalfrank, Peter; Vreede, Jocelyne; Santer, Mark
2018-04-10
Azobenzene-based molecular photoswitches are becoming increasingly important for the development of photoresponsive, functional soft-matter material systems. Upon illumination with light, fast interconversion between a more stable trans and a metastable cis configuration can be established resulting in pronounced changes in conformation, dipole moment or hydrophobicity. A rational design of functional photosensitive molecules with embedded azo moieties requires a thorough understanding of isomerization mechanisms and rates, especially the thermally activated relaxation. For small azo derivatives considered in the gas phase or simple solvents, Eyring's classical transition state theory (TST) approach yields useful predictions for trends in activation energies or corresponding half-life times of the cis isomer. However, TST or improved theories cannot easily be applied when the azo moiety is part of a larger molecular complex or embedded into a heterogeneous environment, where a multitude of possible reaction pathways may exist. In these cases, only the sampling of an ensemble of dynamic reactive trajectories (transition path sampling, TPS) with explicit models of the environment may reveal the nature of the processes involved. In the present work we show how a TPS approach can conveniently be implemented for the phenomenon of relaxation-isomerization of azobenzenes starting with the simple examples of pure azobenzene and a push-pull derivative immersed in a polar (DMSO) and apolar (toluene) solvent. The latter are represented explicitly at a molecular mechanical (MM) and the azo moiety at a quantum mechanical (QM) level. We demonstrate for the push-pull azobenzene that path sampling in combination with the chosen QM/MM scheme produces the expected change in isomerization pathway from inversion to rotation in going from a low to a high permittivity (explicit) solvent model. We discuss the potential of the simulation procedure presented for comparative calculation of reaction rates and an improved understanding of activated states.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Elia, M.; Edwards, H. C.; Hu, J.
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.
Simulation studies of the fidelity of biomolecular structure ensemble recreation
NASA Astrophysics Data System (ADS)
Lätzer, Joachim; Eastwood, Michael P.; Wolynes, Peter G.
2006-12-01
We examine the ability of Bayesian methods to recreate structural ensembles for partially folded molecules from averaged data. Specifically, we test the ability of various algorithms to recreate different transition state ensembles for folding proteins using a multiple replica simulation algorithm, with input from "gold standard" reference ensembles that were first generated with a Gō-like Hamiltonian having nonpairwise additive terms. A set of low resolution data, which functions as the "experimental" ϕ values, was first constructed from this reference ensemble. The resulting ϕ values were then treated as one would treat laboratory experimental data and were used as input in the replica reconstruction algorithm. The resulting ensembles of structures obtained by the replica algorithm were compared to the gold standard reference ensemble, from which those "data" were, in fact, obtained. It is found that for a unimodal transition state ensemble with a low barrier, the multiple replica algorithm does recreate the reference ensemble fairly successfully when no experimental error is assumed. The Kolmogorov-Smirnov test as well as principal component analysis show that the overlap of the recovered and reference ensembles is significantly enhanced when multiple replicas are used. Reduction of the multiple replica ensembles by clustering successfully yields subensembles with close similarity to the reference ensembles. On the other hand, for a high barrier transition state with two distinct transition state ensembles, the single replica algorithm only samples a few structures of one of the reference ensemble basins. This is due to the fact that the ϕ values are intrinsically ensemble averaged quantities. The replica algorithm with multiple copies does sample both reference ensemble basins. In contrast to the single replica case, the multiple replicas are constrained to reproduce the average ϕ values, but allow fluctuations in ϕ for each individual copy. These fluctuations facilitate a more faithful sampling of the reference ensemble basins. Finally, we test how robustly the reconstruction algorithm can function by introducing errors in ϕ comparable in magnitude to those suggested by some authors. In this circumstance we observe that the chances of ensemble recovery with the replica algorithm are poor using a single replica, but are improved when multiple copies are used. A multimodal transition state ensemble, however, turns out to be more sensitive to large errors in ϕ (if appropriately gauged) and attempts at successful recreation of the reference ensemble with simple replica algorithms can fall short.
Jarzynski equality in the context of maximum path entropy
NASA Astrophysics Data System (ADS)
González, Diego; Davis, Sergio
2017-06-01
In the global framework of finding an axiomatic derivation of nonequilibrium Statistical Mechanics from fundamental principles, such as the maximum path entropy (also known as the Maximum Caliber principle), this work proposes an alternative derivation of the well-known Jarzynski equality, a nonequilibrium identity of great importance today due to its applications to irreversible processes: biological systems (protein folding), mechanical systems, among others. This equality relates the free energy difference between two equilibrium thermodynamic states to the work performed when going between those states, through an average over a path ensemble. In this work the analysis of Jarzynski's equality is performed using the formalism of inference over path space. This derivation highlights the wide generality of Jarzynski's original result, which could even be used in non-thermodynamical settings such as social, financial, and ecological systems.
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.
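A minimal numerical sketch of the procedure, with white noise standing in for a measured turbulence signal:

```python
import numpy as np

# Equivalent-ensemble sketch: segment one long record into statistically
# independent sample records, then check that the ensemble averages do
# not drift with time (weak stationarity) and compare them with the time
# average of a single record (heuristic ergodicity estimate).
rng = np.random.default_rng(6)
record = rng.standard_normal(100_000)      # stand-in for a velocity signal
n_rec, rec_len = 100, 1000
ensemble = record[: n_rec * rec_len].reshape(n_rec, rec_len)

ens_mean = ensemble.mean(axis=0)           # equivalent-ensemble average
print("drift of ensemble mean:", np.ptp(ens_mean))
print("time avg of one record vs ensemble avg:",
      ensemble[0].mean(), ens_mean.mean())
```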
A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies
Antoneli, Fernando; Passos, Fernando M.; Lopes, Luciano R.
2018-01-01
Divergence date estimates are central to understanding evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and the well-known non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS) goodness-of-fit tests. In the strict clock case, the method consists of using the one-sample KS test to directly test whether the phylogeny is clock-like, in other words, whether it follows a Poisson law. The ECD is computed from the discretized branch lengths and the parameter λ of the expected Poisson distribution is calculated as the average branch length over the ensemble of trees. To compensate for the auto-correlation in the ensemble of trees and pseudo-replication, we take advantage of thinning and effective sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is observed that tree topologies with very long or very short branches lead to Poisson mixtures, and in this case we propose the use of the two-sample KS test with samples from two continuous branch length distributions, one obtained from an ensemble of clock-constrained trees and the other from an ensemble of unconstrained trees. Moreover, in this second form the test can also be applied to relaxed clock models. The use of a statistically equivalent ensemble of phylogenies to obtain the branch length ECD, instead of one consensus tree, yields a considerable reduction of the effects of small sample size and provides a gain of power. PMID:29300759
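In the two-sample form, the test reduces to a standard SciPy call on branch lengths pooled from the two ensembles; the synthetic exponential branch lengths below are illustrative stand-ins for samples drawn from clock-constrained and unconstrained Bayesian tree ensembles.

```python
import numpy as np
from scipy import stats

# Two-sample KS sketch: compare branch-length distributions pooled from a
# clock-constrained and an unconstrained tree ensemble (synthetic stand-ins).
# A small p-value rejects the (relaxed) clock model.
rng = np.random.default_rng(7)
clock_lengths = rng.exponential(0.05, 4000)
unconstr_lengths = rng.exponential(0.05, 4000) * rng.lognormal(0.0, 0.3, 4000)

stat, pvalue = stats.ks_2samp(clock_lengths, unconstr_lengths)
print(f"KS statistic = {stat:.3f}, p = {pvalue:.3g}")
```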
DOE Office of Scientific and Technical Information (OSTI.GOV)
Man, Jun; Zhang, Jiangjiang; Li, Weixuan
2016-10-01
The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, a larger ensemble size improves the parameter estimation and the convergence of the optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied to any other hydrological problems.
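As a sketch of one of these metrics, the snippet below scores hypothetical single-observation designs by the Shannon entropy difference computed from ensemble covariances under a Gaussian approximation; the synthetic ensemble and the identity-row observation operators are illustrative assumptions.

```python
import numpy as np

# Rank candidate measurement locations by Shannon entropy difference
#   SD = 0.5 * log(det(P_prior) / det(P_post))
# under a Gaussian approximation, with P estimated from the ensemble.
rng = np.random.default_rng(8)
n_par, n_ens, obs_var = 5, 200, 0.1
ens = rng.multivariate_normal(np.zeros(n_par),
                              np.eye(n_par) + 0.5, size=n_ens).T

P = np.cov(ens)
scores = []
for j in range(n_par):                      # candidate: observe parameter j
    H = np.zeros((1, n_par)); H[0, j] = 1.0
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + obs_var)
    P_post = (np.eye(n_par) - K @ H) @ P    # Kalman update of covariance
    scores.append(0.5 * np.log(np.linalg.det(P) / np.linalg.det(P_post)))
print("entropy-difference score per candidate:", np.round(scores, 3))
```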
Zheng, Lianqing; Chen, Mengen; Yang, Wei
2009-06-21
To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures, etc. As usually observed, in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction, and these residual free energy barriers can greatly reduce the sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimensional subset of the target system; then the "Hamiltonian lagging" problem, which reveals the fact that necessary structural relaxation falls behind the move of the collective variable, is likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.
An ensemble-based approach for breast mass classification in mammography images
NASA Astrophysics Data System (ADS)
Ribeiro, Patricia B.; Papa, João. P.; Romero, Roseli A. F.
2017-03-01
Mammography analysis is an important tool that helps detect breast cancer at the very early stages of the disease, thus increasing the quality of life of hundreds of thousands of patients worldwide. In Computer-Aided Detection systems, the identification of mammograms with and without masses (without clinical findings) is highly needed to reduce the false positive rates regarding the automatic selection of regions of interest that may contain some suspicious content. In this work, we introduce a variant of the Optimum-Path Forest (OPF) classifier for breast mass identification, and we employ an ensemble-based approach that can enhance the effectiveness of the individual classifiers for this purpose. The experimental results also comprise the naïve OPF and a traditional neural network, with the most accurate results obtained by the ensemble of classifiers, which reached an accuracy of nearly 86%.
ERIC Educational Resources Information Center
Murphy, Sean
2013-01-01
The saxophone section of a wind ensemble can easily be one of the most frustrating to work with when it comes to producing a clear, characteristic tone. Sometimes, the road to an improved sound can be a long path of daily diligence and practice; however, there are many quicker solutions that will drastically improve a student's tone. This article…
2013-01-01
Background Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. Methods This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima. Results and conclusions We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface. PMID:24564970
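The role of the perturbation magnitude can be sketched with a toy basin-hopping loop that uses greedy acceptance, consistent with the finding above that Metropolis-based minimization adds little; the rugged one-dimensional energy is an illustrative stand-in for a protein energy surface.

```python
import numpy as np
from scipy.optimize import minimize

# Basin-hopping sketch: perturb, then greedily minimize. The perturbation
# magnitude `step` controls the distance between consecutively sampled
# minima, which is the knob examined in the text.
rng = np.random.default_rng(9)

def energy(v):                               # rugged toy "energy surface"
    x = v[0]
    return (x * x - 4.0) ** 2 / 8.0 + np.sin(5.0 * x)

def basin_hop(x0, step, n_hops=30):
    x = minimize(energy, [x0]).x[0]
    minima = [x]
    for _ in range(n_hops):
        trial = minimize(energy, [x + step * rng.standard_normal()]).x[0]
        if energy([trial]) <= energy([x]):   # greedy acceptance
            x = trial
        minima.append(x)
    return np.array(minima)

for step in (0.2, 1.0, 3.0):
    hops = np.abs(np.diff(basin_hop(-2.0, step)))
    print(f"step={step}: mean hop distance {hops.mean():.2f}")
```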
Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S
2017-10-01
The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, there still remains a challenge to improve classifiers for the diagnosis of such skin lesions. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied on a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model that generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.
Ren, Fulong; Cao, Peng; Li, Wei; Zhao, Dazhe; Zaiane, Osmar
2017-01-01
Diabetic retinopathy (DR) is a progressive disease, and its detection at an early stage is crucial for saving a patient's vision. An automated screening system for DR can help reduce the chances of complete blindness due to DR while lowering the workload on ophthalmologists. Among the earliest signs of DR are microaneurysms (MAs). However, current schemes for MA detection tend to report many false positives because detection algorithms have high sensitivity, so inevitably some non-MA structures are labeled as MAs in the initial MA identification step. This is a typical "class imbalance problem", and class-imbalanced data have detrimental effects on the performance of conventional classifiers. In this work, we propose an ensemble-based adaptive over-sampling algorithm for overcoming the class imbalance problem in false positive reduction, and we use Boosting, Bagging, and Random Subspace as the ensemble frameworks to improve microaneurysm detection. The proposed ensemble-based over-sampling methods combine the strengths of adaptive over-sampling and ensembles. The objective of the amalgamation of ensembles and adaptive over-sampling is to reduce the induction biases introduced by imbalanced data and to enhance the generalization classification performance of extreme learning machines (ELM). Experimental results show that our ASOBoost method achieves higher area under the ROC curve (AUC) and G-mean values than many existing class imbalance learning methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
A global sampling approach to designing and reengineering RNA secondary structures.
Levin, Alex; Lis, Mieszko; Ponty, Yann; O'Donnell, Charles W; Devadas, Srinivas; Berger, Bonnie; Waldispühl, Jérôme
2012-11-01
The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
Bayesian ensemble refinement by replica simulations and reweighting.
Hummer, Gerhard; Köfinger, Jürgen
2015-12-28
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
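The abstract's central quantitative prescription is that the replica-coupling restraint must scale linearly with the number of replicas N for the restrained ensemble to converge to the optimal Bayesian result. A minimal sketch of such a replica-averaged harmonic restraint follows; the quadratic form, the base constant k0, and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def restraint_energy(obs_per_replica, data, k0):
    """Harmonic restraint on the replica-averaged observable; the effective
    spring constant grows linearly with the number of replicas N."""
    n = len(obs_per_replica)
    avg = np.mean(obs_per_replica, axis=0)     # ensemble-averaged observable
    k = k0 * n                                 # linear-in-N scaling
    return 0.5 * k * np.sum((avg - data) ** 2)

def restraint_forces(obs_per_replica, d_obs_dx, data, k0):
    """Force on each replica's coordinates: the chain rule through the mean
    gives each replica 1/N of the gradient of the restraint on the average,
    so per-replica forces stay O(1) while the total restraint scales with N.
    d_obs_dx: (n_replicas, n_obs, n_dof) gradients of each replica's observables."""
    n = len(obs_per_replica)
    avg = np.mean(obs_per_replica, axis=0)
    k = k0 * n
    return -(k / n) * np.einsum('j,ijk->ik', avg - data, d_obs_dx)
```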
NASA Astrophysics Data System (ADS)
Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie
2015-08-01
The business failure of numerous companies results in financial crises. The high social costs associated with such crises have led people to search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among the different modelling means. We reviewed research on forecasting business risk with support vector machines, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machines, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with the support vector machine as the base predictive model, four specific predictive models were produced: a pure support vector machine, a hybrid support vector machine involving principal components analysis, a support vector machine ensemble built on random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate various hybrid support vector machines trained on variables produced by principal components analysis and samples drawn by random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines outperformed the pure support vector machine and the support vector machine ensemble.
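Two of the four APMF-WHAE variants, the PCA-SVM hybrid and the random-sampling ensemble combined by group decision (majority vote), can be sketched directly. The code below is an illustrative reading using scikit-learn; all parameter values are assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def hybrid_svm(X, y, n_components=5):
    """Hybrid model: PCA feature extraction feeding an SVM classifier."""
    pca = PCA(n_components=n_components).fit(X)
    clf = SVC(kernel="rbf").fit(pca.transform(X), y)
    return lambda Xq: clf.predict(pca.transform(Xq))

def ensemble_of_hybrids(X, y, n_members=9, subsample=0.8, rng=None):
    """Ensemble of hybrid SVMs trained on random subsamples, combined by
    group decision (majority vote over binary 0/1 labels)."""
    rng = rng or np.random.default_rng(0)
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(X), int(subsample * len(X)), replace=False)
        members.append(hybrid_svm(X[idx], y[idx]))
    def predict(Xq):
        votes = np.stack([m(Xq) for m in members])   # (n_members, n_query)
        return (votes.mean(axis=0) > 0.5).astype(int)
    return predict
```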
2017-06-01
Fabric samples were tested on a Sweating Guarded Hot Plate (SGHP) to measure fabric thermal and evaporative resistance, respectively; the ensembles were tested on a thermal manikin. (Report excerpt; the tables cover notation for fabric and ensemble resistances and the weight reduction of CB garments.)
Use of combined radar and radiometer systems in space for precipitation measurement: Some ideas
NASA Technical Reports Server (NTRS)
Moore, R. K.
1981-01-01
A brief survey is given of some fundamental physical concepts of optimal polarization characteristics of a transmission path or a scattering ensemble of hydrometeors. It is argued that, based on this optimization concept, definite advances in remote atmospheric sensing are to be expected. Basic properties of Kennaugh's optimal polarization theory are identified.
Observation of ground-state quantum beats in atomic spontaneous emission.
Norris, D G; Orozco, L A; Barberis-Blostein, P; Carmichael, H J
2010-09-17
We report ground-state quantum beats in spontaneous emission from a continuously driven atomic ensemble. Beats are visible only in an intensity autocorrelation and evidence spontaneously generated coherence in radiative decay. Our measurement realizes a quantum eraser where a first photon detection prepares a superposition and a second erases the "which path" information in the intermediate state.
Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi
2014-12-08
Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine whether a given individual has already appeared over the camera network. Individual recognition often relies on faces and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitations of the camera hardware and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem, which arises when the number of training samples is small compared to the dimensionality of the sample space. To overcome this problem, interest in combining multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for the ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.
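The paper's tailored 0-1 knapsack formulation is not given in the abstract; below is the generic dynamic-programming knapsack it presumably builds on, cast as selecting base classifiers that maximize total accuracy under a "diversity cost" budget. The cost and accuracy inputs are placeholders.

```python
def knapsack_select(accuracies, costs, budget):
    """0-1 knapsack over base classifiers: maximize summed accuracy subject
    to an integer total-cost budget. Standard DP with backtracking; the
    paper's tailored variant and its exact cost definition may differ."""
    n = len(accuracies)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        a, c = accuracies[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                 # skip classifier i-1
            if c <= b:                                   # or take it
                best[i][b] = max(best[i][b], best[i - 1][b - c] + a)
    chosen, b = [], budget                               # recover the subset
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return chosen[::-1], best[n][budget]

# Example: knapsack_select([0.8, 0.75, 0.9], [2, 1, 3], 4) -> ([1, 2], 1.65)
```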
Ensemble stump classifiers and gene expression signatures in lung cancer.
Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn
2007-01-01
Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing the data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension, we build ensembles of very simple classifiers known as decision stumps, i.e., decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
An ensemble predictive modeling framework for breast cancer classification.
Nagarajan, Radhakrishnan; Upreti, Meenakshi
2017-12-01
Molecular changes often precede the clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process, often resulting in a sparse, high-dimensional projection of the samples whose dimensionality is comparable to the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good- and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from a two-dimensional projection of the samples, in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen for maximal sensitivity and minimal redundancy by retaining only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. The performance of four different classification algorithms is shown to be better within the proposed ensemble framework than when they are used as traditional single-classifier systems. The significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
Ensemble Bayesian forecasting system Part I: Theory and algorithms
NASA Astrophysics Data System (ADS)
Herr, Henry D.; Krzysztofowicz, Roman
2015-05-01
The ensemble Bayesian forecasting system (EBFS), whose theory was published in 2001, is developed for the purpose of quantifying the total uncertainty about a discrete-time, continuous-state, non-stationary stochastic process such as a time series of stages, discharges, or volumes at a river gauge. The EBFS is built of three components: an input ensemble forecaster (IEF), which simulates the uncertainty associated with random inputs; a deterministic hydrologic model (of any complexity), which simulates physical processes within a river basin; and a hydrologic uncertainty processor (HUP), which simulates the hydrologic uncertainty (an aggregate of all uncertainties except input). It works as a Monte Carlo simulator: an ensemble of time series of inputs (e.g., precipitation amounts) generated by the IEF is transformed deterministically through a hydrologic model into an ensemble of time series of outputs, which is next transformed stochastically by the HUP into an ensemble of time series of predictands (e.g., river stages). Previous research indicated that in order to attain an acceptable sampling error, the ensemble size must be on the order of hundreds (for probabilistic river stage forecasts and probabilistic flood forecasts) or even thousands (for probabilistic stage transition forecasts). The computing time needed to run the hydrologic model this many times renders the straightforward simulations operationally infeasible. This motivates the development of the ensemble Bayesian forecasting system with randomization (EBFSR), which takes full advantage of the analytic meta-Gaussian HUP and generates multiple ensemble members after each run of the hydrologic model; this auxiliary randomization reduces the required size of the meteorological input ensemble and makes it operationally feasible to generate a Bayesian ensemble forecast of large size. Such a forecast quantifies the total uncertainty, is well calibrated against the prior (climatic) distribution of predictand, possesses a Bayesian coherence property, constitutes a random sample of the predictand, and has an acceptable sampling error, which makes it suitable for rational decision making under uncertainty.
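The EBFS chain described above (IEF input ensemble -> deterministic hydrologic model -> HUP) and the EBFSR trick of drawing several HUP members per model run are straightforward to express in sketch form; the three components below are placeholder callables, not the operational system.

```python
import numpy as np

def ebfs_ensemble(ief_sampler, hydro_model, hup_sampler, n_members=500):
    """One Monte Carlo sweep of the EBFS chain: sample an input series from
    the IEF, transform it deterministically through the hydrologic model,
    then transform it stochastically through the HUP into a predictand series."""
    members = []
    for _ in range(n_members):
        precip = ief_sampler()               # IEF: input uncertainty
        output = hydro_model(precip)         # deterministic hydrologic model
        members.append(hup_sampler(output))  # HUP: hydrologic uncertainty
    return np.asarray(members)

def ebfsr_ensemble(ief_sampler, hydro_model, hup_sampler, n_runs=50, m_per_run=10):
    """EBFSR-style auxiliary randomization: reuse each expensive model run to
    draw several members from the HUP, shrinking the required input ensemble."""
    members = []
    for _ in range(n_runs):
        output = hydro_model(ief_sampler())
        members.extend(hup_sampler(output) for _ in range(m_per_run))
    return np.asarray(members)
```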
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.
2013-01-01
A two-step ensemble recentering Kalman filter (ERKF) analysis scheme is introduced. The algorithm consists of a recentering step followed by an ensemble Kalman filter (EnKF) analysis step. The recentering step is formulated so as to adjust the prior distribution of an ensemble of model states: the deviations of individual samples from the sample mean are unchanged, but the original sample mean is shifted to the prior position of the most likely particle, where the likelihood of each particle is measured in terms of closeness to a chosen subset of the observations. The computational cost of the ERKF is essentially the same as that of an EnKF of the same size. The ERKF is applied to the assimilation of Argo temperature profiles into the OGCM component of an ensemble of NASA GEOS-5 coupled models. Unassimilated Argo salt data are used for validation. A surprisingly small number (16) of model trajectories is sufficient to significantly improve model estimates of salinity over estimates from an ensemble run without assimilation. The two-step algorithm also performs better than the EnKF, although its performance is degraded in poorly observed regions.
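The recentering step is concrete enough to sketch directly from the description: keep each member's deviation from the ensemble mean and move the mean onto the most likely particle. The Gaussian log-likelihood used to rank particles is an assumed stand-in for "closeness to a chosen subset of the observations".

```python
import numpy as np

def log_likelihoods(ensemble, H, obs, obs_var):
    """Rank members by closeness to (a chosen subset of) the observations,
    here with an assumed Gaussian observation error model.
    ensemble: (n_members, n_state); H: (n_obs, n_state); obs: (n_obs,)."""
    innovations = obs - ensemble @ H.T          # (n_members, n_obs)
    return -0.5 * np.sum(innovations ** 2, axis=1) / obs_var

def recenter(ensemble, loglik):
    """ERKF recentering: deviations from the mean are kept, but the mean is
    shifted onto the most likely particle; the EnKF step follows unchanged."""
    mean = ensemble.mean(axis=0)
    best = ensemble[np.argmax(loglik)]
    return best + (ensemble - mean)
```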
A new large initial condition ensemble to assess avoided impacts in a climate mitigation scenario
NASA Astrophysics Data System (ADS)
Sanderson, B. M.; Tebaldi, C.; Knutti, R.; Oleson, K. W.
2014-12-01
It has recently been demonstrated that when considering timescales of up to 50 years, natural variability may play an equal role to anthropogenic forcing on subcontinental trends for a variety of climate indicators. Thus, for many questions assessing climate impacts on such time and spatial scales, it has become clear that a significant number of ensemble members may be required to produce robust statistics (and especially so for extreme events). However, large ensemble experiments to date have considered the role of variability in a single scenario, leaving uncertain the relationship between the forced climate trajectory and the variability about that path. To address this issue, we present a new, publicly available, 15 member initial condition ensemble of 21st century climate projections for the RCP 4.5 scenario using the CESM1.1 Earth System Model, which we propose as a companion project to the existing 40 member CESM large ensemble which uses the higher greenhouse gas emission future of RCP8.5. This provides a valuable data set for assessing what societal and ecological impacts might be avoided through a moderate mitigation strategy in contrast to a fossil fuel intensive future. We present some early analyses of these combined ensembles to assess to what degree the climate variability can be considered to combine linearly with the underlying forced response. In regions where there is no detectable relationship between the mean state and the variability about the mean trajectory, then linear assumptions can be trivially exploited to utilize a single ensemble or control simulation to characterize the variability in any scenario of interest. We highlight regions where there is a detectable nonlinearity in extreme event frequency, how far in the future they will be manifested and propose mechanisms to account for these effects.
Metadynamic metainference: Enhanced sampling of the metainference ensemble using metadynamics
Bonomi, Massimiliano; Camilloni, Carlo; Vendruscolo, Michele
2016-01-01
Accurate and precise structural ensembles of proteins and macromolecular complexes can be obtained with metainference, a recently proposed Bayesian inference method that integrates experimental information with prior knowledge and deals with all sources of error in the data as well as with sample heterogeneity. The study of complex macromolecular systems, however, requires extensive conformational sampling, which represents a separate challenge. To address this challenge and to generate structural ensembles exhaustively and efficiently, we combine metainference with metadynamics and illustrate its application to the calculation of the free energy landscape of the alanine dipeptide. PMID:27561930
NASA Astrophysics Data System (ADS)
Livorati, André L. P.; Palmero, Matheus S.; Díaz-I, Gabriel; Dettmann, Carl P.; Caldas, Iberê L.; Leonel, Edson D.
2018-02-01
We study the dynamics of an ensemble of noninteracting particles constrained by two infinitely heavy walls, one of which moves periodically in time while the other is fixed. The system presents mixed dynamics, where the region accessible to chaotic diffusion is bordered by an invariant spanning curve. Statistical analysis of the root mean square velocity, for both high- and low-velocity initial ensembles, shows the dynamics reaching the same steady-state plateau at long times. A transport investigation of the dynamics via escape basins reveals that, depending on the initial velocity ensemble, the decay of the survival probability exhibits different shapes and bumps, in a mix of exponential, power-law, and stretched-exponential decays. After an analysis of step-size averages, we found that the stable manifolds act as a preferential path for faster escape and are responsible for the bumps and the different shapes of the survival probability.
Ensemble codes involving hippocampal neurons are at risk during delayed performance tests.
Hampson, R E; Deadwyler, S A
1996-11-26
Multielectrode recording techniques were used to record ensemble activity from 10 to 16 simultaneously active CA1 and CA3 neurons in the rat hippocampus during performance of a spatial delayed-nonmatch-to-sample task. Extracted sources of variance were used to assess the nature of two different types of errors that accounted for 30% of total trials. The two types of errors included ensemble "miscodes" of sample phase information and errors associated with delay-dependent corruption or disappearance of sample information at the time of the nonmatch response. Statistical assessment of trial sequences and associated "strength" of hippocampal ensemble codes revealed that miscoded error trials always followed delay-dependent error trials in which encoding was "weak," indicating that the two types of errors were "linked." It was determined that the occurrence of weakly encoded, delay-dependent error trials initiated an ensemble encoding "strategy" that increased the chances of being correct on the next trial and avoided the occurrence of further delay-dependent errors. Unexpectedly, the strategy involved "strongly" encoding response position information from the prior (delay-dependent) error trial and carrying it forward to the sample phase of the next trial. This produced a miscode type error on trials in which the "carried over" information obliterated encoding of the sample phase response on the next trial. Application of this strategy, irrespective of outcome, was sufficient to reorient the animal to the proper between trial sequence of response contingencies (nonmatch-to-sample) and boost performance to 73% correct on subsequent trials. The capacity for ensemble analyses of strength of information encoding combined with statistical assessment of trial sequences therefore provided unique insight into the "dynamic" nature of the role hippocampus plays in delay type memory tasks.
Optimized Free Energies from Bidirectional Single-Molecule Force Spectroscopy
NASA Astrophysics Data System (ADS)
Minh, David D. L.; Adib, Artur B.
2008-05-01
An optimized method for estimating path-ensemble averages using data from processes driven in opposite directions is presented. Based on this estimator, bidirectional expressions for reconstructing free energies and potentials of mean force from single-molecule force spectroscopy—valid for biasing potentials of arbitrary stiffness—are developed. Numerical simulations on a model potential indicate that these methods perform better than unidirectional strategies.
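The optimized bidirectional estimator of the paper is not reproduced here; as a hedged illustration of the two-sided idea it builds on, the sketch below implements the standard Bennett acceptance ratio (BAR) estimate of a free-energy difference from forward and reverse work samples, solving its self-consistency equation by bracketing (the residual is monotone in dF, so the root is unique).

```python
import numpy as np
from scipy.optimize import brentq

def fermi(x):
    """1/(1+e^x), clipped to avoid overflow warnings."""
    return 1.0 / (1.0 + np.exp(np.clip(x, -700, 700)))

def bar_delta_f(w_forward, w_reverse, beta=1.0):
    """Bennett acceptance ratio estimate of dF from forward works W_F
    (process 0 -> 1) and reverse works W_R (process 1 -> 0), both with the
    usual sign convention (quasistatically W_F -> dF, W_R -> -dF).
    Solves sum_F fermi(M + beta*(W_F - dF)) = sum_R fermi(-M + beta*(W_R + dF)),
    with M = ln(n_F/n_R) the sample-size term."""
    wf = np.asarray(w_forward, float)
    wr = np.asarray(w_reverse, float)
    M = np.log(len(wf) / len(wr))
    g = lambda df: (fermi(M + beta * (wf - df)).sum()
                    - fermi(-M + beta * (wr + df)).sum())
    span = np.abs(wf).max() + np.abs(wr).max() + 10.0
    return brentq(g, -span, span)
```

For a single forward and a single reverse sample this reduces to dF = (W_F - W_R)/2, the symmetric average one would expect.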
NASA Astrophysics Data System (ADS)
Akibue, Seiseki; Kato, Go
2018-04-01
For distinguishing quantum states sampled from a fixed ensemble, the gap between bipartite and single-party distinguishability can be interpreted as a nonlocality of the ensemble. In this paper, we consider bipartite state discrimination in a composite system consisting of N subsystems, where each subsystem is shared between two parties and the state of each subsystem is randomly sampled from a particular ensemble comprising the Bell states. We show that the probability of perfectly identifying the state converges to 1 as N →∞ if the entropy of the probability distribution associated with the ensemble is less than 1, even though the success probability is less than 1 for any finite N. In other words, the nonlocality of the N-fold ensemble asymptotically disappears if the probability distribution associated with each ensemble is sufficiently concentrated. Furthermore, we show that this disappearance of nonlocality can be regarded as a remarkable counterexample to a fundamental open question in theoretical computer science, the parallel repetition conjecture for interactive games with two classically communicating players. The measurements for the discrimination task include a projective measurement of one party represented by stabilizer states, which enables the other party to perfectly distinguish states that are sampled with high probability.
Ensemble transcript interaction networks: a case study on Alzheimer's disease.
Armañanzas, Rubén; Larrañaga, Pedro; Bielza, Concha
2012-10-01
Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependences that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampus subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Gao, Jiali; Major, Dan T; Fan, Yao; Lin, Yen-Lin; Ma, Shuhua; Wong, Kin-Yiu
2008-01-01
A method for incorporating quantum mechanics into enzyme kinetics modeling is presented. Three aspects are emphasized: 1) combined quantum mechanical and molecular mechanical methods are used to represent the potential energy surface for modeling bond forming and breaking processes, 2) instantaneous normal mode analyses are used to incorporate quantum vibrational free energies to the classical potential of mean force, and 3) multidimensional tunneling methods are used to estimate quantum effects on the reaction coordinate motion. Centroid path integral simulations are described to make quantum corrections to the classical potential of mean force. In this method, the nuclear quantum vibrational and tunneling contributions are not separable. An integrated centroid path integral-free energy perturbation and umbrella sampling (PI-FEP/UM) method along with a bisection sampling procedure was summarized, which provides an accurate, easily convergent method for computing kinetic isotope effects for chemical reactions in solution and in enzymes. In the ensemble-averaged variational transition state theory with multidimensional tunneling (EA-VTST/MT), these three aspects of quantum mechanical effects can be individually treated, providing useful insights into the mechanism of enzymatic reactions. These methods are illustrated by applications to a model process in the gas phase, the decarboxylation reaction of N-methyl picolinate in water, and the proton abstraction and reprotonation process catalyzed by alanine racemase. These examples show that the incorporation of quantum mechanical effects is essential for enzyme kinetics simulations.
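As a small concrete piece of the machinery named above, the bisection sampling step used in path-integral simulations can be sketched: for a free particle, the midpoint bead between two beads, each half-segment spanning imaginary time beta_seg, is Gaussian. This is only the kinetic part of a move, under assumed unit conventions; the full PI-FEP/UM procedure adds a Metropolis acceptance on the potential and the free-energy-perturbation estimator.

```python
import numpy as np

def bisect_midpoint(x_left, x_right, beta_seg, mass, hbar=1.0, rng=None):
    """Free-particle Levy bisection: the midpoint bead between two beads,
    with each half-segment spanning imaginary time beta_seg (inverse-energy
    units), is Gaussian with mean (x_left + x_right)/2 and variance
    hbar^2 * beta_seg / (2 * mass)."""
    rng = rng or np.random.default_rng(0)
    mid = 0.5 * (np.asarray(x_left) + np.asarray(x_right))
    sigma = np.sqrt(hbar ** 2 * beta_seg / (2.0 * mass))
    return rng.normal(mid, sigma)
```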
Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume
2013-01-01
Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
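The FAST construction is simple enough to sketch: treat states drawn from a moving time window along one trajectory as the ensemble from which background covariances are estimated. Window length and lag are tunable assumptions.

```python
import numpy as np

def fast_covariance(trajectory, window=100, lag=1):
    """FAST-style background covariance: states sampled from a moving window
    along a single model trajectory play the role of the ensemble.
    trajectory: (n_times, n_state) array of model state vectors."""
    snapshots = trajectory[-window::lag]               # most recent window
    anomalies = snapshots - snapshots.mean(axis=0)
    return anomalies.T @ anomalies / (len(snapshots) - 1)
```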
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.
Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel
2017-06-01
Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
NASA Astrophysics Data System (ADS)
Li, J. F.; Waliser, D. E.; Chen, W.; Deng, M.; Lebsock, M. D.; Stephens, G. L.; Guan, B.; Christensen, M.; Teixeira, J.
2013-12-01
Representing clouds and cloud climate feedbacks in global climate models (GCMs) remains a pressing challenge for reducing and quantifying uncertainties associated with climate change projection. Vertical structures of clouds simulated by present-day models have not been extensively examined using vertically resolved cloud hydrometeor fields such as cloud ice water (CIW) content and cloud liquid water (CLW) content. The gap in available observations for cloud mass was clearly evident from the wide disparity in the CIW path [Waliser et al., 2009] and CLW path [Li et al., 2008; 2011] values exhibited by the CMIP3 GCMs. We present an observationally based evaluation of the CIW and CLW of present-day GCMs, notably 20th century CMIP5 simulations, and compare these results to the CMIP3 and two recent reanalyses (ECMWF and MERRA). We use three different CloudSat+CALIPSO CIW products as well as three different observational CLW products, CloudSat, MODIS and AMSRE, and their combined product for CLW, with methods to remove the contribution from the convective core ice mass and/or precipitating cloud hydrometeors with variable sizes and falling speeds, so that a robust observational estimate with uncertainty can be obtained for model evaluations. Note that, considering CloudSat's limitations in CLW retrieval due to contamination from precipitation and from radar clutter near the surface, an alternative CLW is synergistically constructed using MODIS CLW and CloudSat CLW. The results show that for annual mean CIW path, there are factors of 2-10 differences between observations and models for a majority of the GCMs and for a number of regions. Based on a number of metrics, the ensemble behavior of CMIP5 has improved considerably relative to CMIP3 (~50%), although neither the CMIP5 ensemble mean nor any individual model performs particularly well, and there are still a number of models that exhibit very large biases despite the availability of relevant observations. For CLW, most of the CMIP3/CMIP5 annual mean CLW path values are overestimated by factors of 2-10 compared to observations globally. For the vertical structure of CIW/CLW content, significant systematic biases are found in many models. Based on the Taylor diagram, the ensemble performance of the CMIP5 CLW path simulations shows little or no improvement relative to CMIP3. The implications of these results for model representations of the earth radiation balance are discussed, along with caveats and uncertainties associated with the observational estimates, model and observation representations of the precipitating and cloudy ice components, and relevant physical processes and parameterizations.
NMR Studies of Dynamic Biomolecular Conformational Ensembles
Torchia, Dennis A.
2015-01-01
Multidimensional heteronuclear NMR approaches can provide nearly complete sequential signal assignments of isotopically enriched biomolecules. The availability of assignments together with measurements of spin relaxation rates, residual spin interactions, J-couplings and chemical shifts provides information at atomic resolution about internal dynamics on timescales ranging from ps to ms, both in solution and in the solid state. However, due to the complexity of biomolecules, it is not possible to extract a unique atomic-resolution description of biomolecular motions even from extensive NMR data when many conformations are sampled on multiple timescales. For this reason, powerful computational approaches are increasingly applied to large NMR data sets to elucidate conformational ensembles sampled by biomolecules. In the past decade, considerable attention has been directed at an important class of biomolecules that function by binding to a wide variety of target molecules. Questions of current interest are: “Does the free biomolecule sample a conformational ensemble that encompasses the conformations found when it binds to various targets; and if so, on what time scale is the ensemble sampled?” This article reviews recent efforts to answer these questions, with a focus on comparing ensembles obtained for the same biomolecules by different investigators. A detailed comparison of results obtained is provided for three biomolecules: ubiquitin, calmodulin and the HIV-1 trans-activation response RNA. PMID:25669739
The integrated process rates (IPR) estimated by the Eta-CMAQ model at grid cells along the trajectory of the air mass transport path were analyzed to quantitatively investigate the relative importance of physical and chemical processes for O3 formation and evolution ov...
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address the high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristic curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
Composite pulses for interferometry in a thermal cold atom cloud
NASA Astrophysics Data System (ADS)
Dunning, Alexander; Gregory, Rachel; Bateman, James; Cooper, Nathan; Himsworth, Matthew; Jones, Jonathan A.; Freegarde, Tim
2014-09-01
Atom interferometric sensors and quantum information processors must maintain coherence while the evolving quantum wave function is split, transformed, and recombined, but they suffer from experimental inhomogeneities and uncertainties in the speeds and paths of these operations. Several error-correction techniques have been proposed to isolate the variable of interest. Here we apply composite pulse methods to velocity-sensitive Raman state manipulation in a freely expanding thermal atom cloud. We compare several established pulse sequences, and follow the state evolution within them. The agreement between measurements and simple predictions shows the underlying coherence of the atom ensemble, and the inversion infidelity in a ~80 μK atom cloud is halved. Composite pulse techniques, especially if tailored for atom interferometric applications, should allow greater interferometer areas, larger atomic samples, and longer interaction times, and hence improve the sensitivity of quantum technologies from inertial sensing and clocks to quantum information processors and tests of fundamental physics.
A path integral approach to the full Dicke model with dipole-dipole interaction
NASA Astrophysics Data System (ADS)
Aparicio Alcalde, M.; Stephany, J.; Svaiter, N. F.
2011-12-01
We consider the full Dicke spin-boson model composed of a single bosonic mode and an ensemble of N identical two-level atoms, with different couplings for the resonant and anti-resonant interaction terms, and incorporate a dipole-dipole interaction between the atoms. Assuming that the system is in thermal equilibrium with a reservoir at temperature β⁻¹, we compute the free energy in the thermodynamic limit N → ∞ in the saddle-point approximation to the path integral and determine the critical temperature for the super-radiant phase transition. In the zero-temperature limit, we recover the critical coupling of the quantum phase transition presented in the literature.
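For orientation, one common way of writing the Hamiltonian this abstract describes, with separate rotating (g) and counter-rotating (g') couplings and a collective dipole-dipole term, is shown below; the precise form and normalization of the dipole-dipole term vary between papers, so the Λ J₊J₋ term is a generic stand-in, not necessarily the authors' exact model.

```latex
H = \omega\, a^{\dagger} a + \omega_0 J_z
    + \frac{g}{\sqrt{N}}\left(a^{\dagger} J_- + a J_+\right)   % resonant (rotating) terms
    + \frac{g'}{\sqrt{N}}\left(a^{\dagger} J_+ + a J_-\right)  % anti-resonant (counter-rotating) terms
    + \frac{\Lambda}{N}\, J_+ J_-                              % one common dipole-dipole form
```

Here J_z and J_± are collective pseudo-spin operators for the N two-level atoms; the equal-coupling case g = g' with Λ = 0 recovers the standard Dicke model.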
High-density amorphous ice: A path-integral simulation
NASA Astrophysics Data System (ADS)
Herrero, Carlos P.; Ramírez, Rafael
2012-09-01
Structural and thermodynamic properties of high-density amorphous (HDA) ice have been studied by path-integral molecular dynamics simulations in the isothermal-isobaric ensemble. Interatomic interactions were modeled by using the effective q-TIP4P/F potential for flexible water. Quantum nuclear motion is found to affect several observable properties of the amorphous solid. At low temperature (T = 50 K) the molar volume of HDA ice is found to increase by 6%, and the intramolecular O-H distance rises by 1.4% due to quantum motion. Peaks in the radial distribution function of HDA ice are broadened with respect to their classical expectation. The bulk modulus, B, is found to rise linearly with the pressure, with a slope ∂B/∂P = 7.1. Our results are compared with those derived earlier from classical and path-integral simulations of HDA ice. We discuss similarities and discrepancies with those earlier simulations.
NASA Astrophysics Data System (ADS)
Weber, Steven; Murch, K. W.; Chantasri, A.; Dressel, J.; Jordan, A. N.; Siddiqi, I.
2014-03-01
We use weak measurements to track individual quantum trajectories of a superconducting qubit embedded in a microwave cavity. Using a near-quantum-limited parametric amplifier, we selectively measure either the phase or the amplitude of the cavity field, and thereby confine trajectories to either the equator or a meridian of the Bloch sphere. We analyze ensembles of trajectories to determine statistical properties such as the most likely path and the most likely time connecting pre- and post-selected quantum states. We compare our results with theoretical predictions derived from an action principle for continuous quantum measurement. Furthermore, by introducing a qubit drive, we investigate the interplay between unitary state evolution and non-unitary measurement dynamics. This work was supported by the IARPA CSQ program and the ONR.
Dual-wavelength pump-probe microscopy analysis of melanin composition
NASA Astrophysics Data System (ADS)
Thompson, Andrew; Robles, Francisco E.; Wilson, Jesse W.; Deb, Sanghamitra; Calderbank, Robert; Warren, Warren S.
2016-11-01
Pump-probe microscopy is an emerging technique that provides detailed chemical information of absorbers with sub-micrometer spatial resolution. Recent work has shown that the pump-probe signals from melanin in human skin cancers correlate well with clinical concern, but it has been difficult to infer the molecular origins of these differences. Here we develop a mathematical framework to describe the pump-probe dynamics of melanin in human pigmented tissue samples, which treats the ensemble of individual chromophores that make up melanin as Gaussian absorbers with bandwidth related via Frenkel excitons. Thus, observed signals result from an interplay between the spectral bandwidths of the individual underlying chromophores and spectral proximity of the pump and probe wavelengths. The model is tested using a dual-wavelength pump-probe approach and a novel signal processing method based on gnomonic projections. Results show signals can be described by a single linear transition path with different rates of progress for different individual pump-probe wavelength pairs. Moreover, the combined dual-wavelength data shows a nonlinear transition that supports our mathematical framework and the excitonic model to describe the optical properties of melanin. The novel gnomonic projection analysis can also be an attractive generic tool for analyzing mixing paths in biomolecular and analytical chemistry.
NASA Astrophysics Data System (ADS)
Wu, Xiongwu; Brooks, Bernard R.
2011-11-01
The self-guided Langevin dynamics (SGLD) is a method to accelerate conformational searching. This method is unique in that it selectively enhances and suppresses molecular motions based on their frequency to accelerate conformational searching, without modifying energy surfaces or raising temperatures. It has been applied to studies of many long-time-scale events, such as protein folding. Recent progress in the understanding of the conformational distribution in SGLD simulations makes SGLD also an accurate method for quantitative studies. The SGLD partition function provides a way to convert the SGLD conformational distribution to the canonical ensemble distribution and to calculate ensemble average properties through reweighting. Based on the SGLD partition function, this work presents a force-momentum-based self-guided Langevin dynamics (SGLDfp) simulation method to directly sample the canonical ensemble. This method includes interaction forces in its guiding force to compensate for the perturbation caused by the momentum-based guiding force, so that it can approximately sample the canonical ensemble. Using several example systems, we demonstrate that SGLDfp simulations can approximately maintain the canonical ensemble distribution and significantly accelerate conformational searching. With optimal parameters, SGLDfp and SGLD simulations can cross energy barriers of more than 15 kT and 20 kT, respectively, at rates similar to those at which LD simulations cross energy barriers of 10 kT. The SGLDfp method is size extensive and works well for large systems. For studies where preserving the accessible conformational space is critical, such as free energy calculations and protein folding studies, SGLDfp is an efficient approach to search and sample the conformational space.
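A minimal sketch of the guiding idea follows, assuming a simple velocity-Verlet-style Langevin integrator: a momentum-based guiding force lam * p_avg (a running local average of the momentum) is added to the systematic force. The SGLDfp compensation term and the reweighting to the canonical ensemble are omitted, and all parameter choices are illustrative.

```python
import numpy as np

def guided_langevin_step(x, p, p_avg, force, dt, gamma, kT, mass, lam, rng):
    """One Langevin step with an added momentum-based guiding force lam*p_avg.
    Splitting: half kick (with guiding force), exact friction/noise on p,
    drift, half kick; then update the running momentum average."""
    p = p + 0.5 * dt * (force(x) + lam * p_avg)
    c = np.exp(-gamma * dt)                        # exact OU damping factor
    p = c * p + np.sqrt(kT * mass * (1.0 - c ** 2)) * rng.normal(size=np.shape(p))
    x = x + dt * p / mass
    p = p + 0.5 * dt * (force(x) + lam * p_avg)
    p_avg = 0.9 * p_avg + 0.1 * p                  # local (running) momentum average
    return x, p, p_avg
```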
A New Look into the Effect of Large Drops on Radiative Transfer Process
NASA Technical Reports Server (NTRS)
Marshak, Alexander
2003-01-01
Recent studies indicate that a cloudy atmosphere absorbs more solar radiation than any current 1D or 3D radiation model can predict. The excess absorption is not large, perhaps 10-15 W/sq m or less, but any such systematic bias is of concern since radiative transfer models are assumed to be sufficiently accurate for remote sensing applications and climate modeling. The most natural explanation would be that models do not capture real 3D cloud structure and, as a consequence, their photon path lengths are too short. However, extensive calculations, using increasingly realistic 3D cloud structures, failed to produce photon paths long enough to explain the excess absorption. Other possible explanations have also been unsuccessful so, at this point, conventional models seem to offer no solution to this puzzle. The weakest link in conventional models is the way a size distribution of cloud particles is mathematically handled. Basically, real particles are replaced with a single average particle. This "ensemble assumption" assumes that all particle sizes are well represented in any given elementary volume. But the concentration of larger particles can be so low that this assumption is significantly violated. We show how a different mathematical route, using the concept of a cumulative distribution, avoids the ensemble assumption. The cumulative distribution has jumps, or steps, corresponding to the rarer sizes. These jumps result in an additional term, a kind of Green's function, in the solution of the radiative transfer equation. Solving the cloud radiative transfer equation with the measured particle distributions, described in a cumulative rather than an ensemble fashion, may lead to increased cloud absorption of the magnitude observed.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech-based data in the classification of Parkinson disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has been seldom examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied to select optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained models from the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the largest improvement in classification accuracy (29.44%) compared with the other algorithms examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method can improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
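The instance-selection step lends itself to a sketch. Classic multi-editing rotates the data through blocks; the simplified version below uses a leave-one-out k-NN edit (closer to Wilson's editing rule), repeatedly dropping training samples misclassified by their neighbors until the retained set is self-consistent. Parameters and the binary 0/1 label assumption are illustrative.

```python
import numpy as np

def menn_like_edit(X, y, k=3, max_iter=20):
    """Leave-one-out k-NN editing: drop samples misclassified by the majority
    vote of their k nearest retained neighbours, repeat until stable.
    Returns the indices of the retained samples. Assumes binary 0/1 labels."""
    keep = np.arange(len(X))
    for _ in range(max_iter):
        if len(keep) <= k + 1:
            break
        Xk, yk = X[keep], y[keep]
        d = ((Xk[:, None, :] - Xk[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)               # exclude self-matches
        nn = np.argsort(d, axis=1)[:, :k]
        votes = (yk[nn].mean(axis=1) > 0.5).astype(int)
        consistent = votes == yk
        if consistent.all():
            break
        keep = keep[consistent]
    return keep
```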
Enhanced Sampling in the Well-Tempered Ensemble
NASA Astrophysics Data System (ADS)
Bonomi, M.; Parrinello, M.
2010-05-01
We introduce the well-tempered ensemble (WTE) which is the biased ensemble sampled by well-tempered metadynamics when the energy is used as collective variable. WTE can be designed so as to have approximately the same average energy as the canonical ensemble but much larger fluctuations. These two properties lead to an extremely fast exploration of phase space. An even greater efficiency is obtained when WTE is combined with parallel tempering. Unbiased Boltzmann averages are computed on the fly by a recently developed reweighting method [M. Bonomi, J. Comput. Chem. 30, 1615 (2009), DOI: 10.1002/jcc.21305]. We apply WTE and its parallel tempering variant to the 2d Ising model and to a Gō model of HIV protease, demonstrating in these two representative cases that convergence is accelerated by orders of magnitude.
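The construction behind the WTE is well-tempered metadynamics with the potential energy itself as the collective variable; one bias-deposition step can be sketched on a grid as below. The grid representation and parameter names are implementation choices, but the tempered Gaussian height follows the standard well-tempered update, with the fluctuation boost gamma = (T + DeltaT)/T.

```python
import numpy as np

def wte_deposit(bias, e_grid, e_now, h0, sigma, delta_T, kB=1.0):
    """One well-tempered metadynamics deposition with the potential energy E
    as the collective variable. The Gaussian height is damped by
    exp(-V(E)/(kB*DeltaT)), which tempers the bias over time."""
    i = np.argmin(np.abs(e_grid - e_now))             # nearest grid node
    h = h0 * np.exp(-bias[i] / (kB * delta_T))        # tempered height
    bias += h * np.exp(-0.5 * ((e_grid - e_now) / sigma) ** 2)
    return bias
```

Unbiased canonical averages are then recovered by reweighting configurations with the accumulated bias, as in the reweighting method cited in the abstract.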
NASA Astrophysics Data System (ADS)
Pribram-Jones, Aurora
Warm dense matter (WDM) is a high energy phase between solids and plasmas, with characteristics of both. It is present in the centers of giant planets, within the earth's core, and on the path to ignition of inertial confinement fusion. The high temperatures and pressures of warm dense matter lead to complications in its simulation, as both classical and quantum effects must be included. One of the most successful simulation methods is density functional theory-molecular dynamics (DFT-MD). Despite great success in a diverse array of applications, DFT-MD remains computationally expensive and it neglects the explicit temperature dependence of electron-electron interactions known to exist within exact DFT. Finite-temperature density functional theory (FT DFT) is an extension of the wildly successful ground-state DFT formalism via thermal ensembles, broadening its quantum mechanical treatment of electrons to include systems at non-zero temperatures. Exact mathematical conditions have been used to predict the behavior of approximations in limiting conditions and to connect FT DFT to the ground-state theory. An introduction to FT DFT is given within the context of ensemble DFT and the larger field of DFT is discussed for context. Ensemble DFT is used to describe ensembles of ground-state and excited systems. Exact conditions in ensemble DFT and the performance of approximations depend on ensemble weights. Using an inversion method, exact Kohn-Sham ensemble potentials are found and compared to approximations. The symmetry eigenstate Hartree-exchange approximation is in good agreement with exact calculations because of its inclusion of an ensemble derivative discontinuity. Since ensemble weights in FT DFT are temperature-dependent Fermi weights, this insight may help develop approximations well-suited to both ground-state and FT DFT. A novel, highly efficient approach to free energy calculations, finite-temperature potential functional theory, is derived, which has the potential to transform the simulation of warm dense matter. As a semiclassical method, it connects the normally disparate regimes of cold condensed matter physics and hot plasma physics. This orbital-free approach captures the smooth classical density envelope and quantum density oscillations that are both crucial to accurate modeling of materials where temperature and pressure effects are influential.
Scanless nonlinear optical microscope for image reconstruction and space-time correlation analysis
NASA Astrophysics Data System (ADS)
Ceffa, N. G.; Radaelli, F.; Pozzi, P.; Collini, M.; Sironi, L.; D'alfonso, L.; Chirico, G.
2017-06-01
Optical microscopy has been applied to the life sciences since its birth and has reached widespread application due to its major advantages: limited perturbation of the biological tissue and the easy accessibility of light sources. However, as the demands on spatial and temporal resolution and on the stability of microscopes increase, researchers are struggling against some of its limitations: the limited transparency and the refractivity of living tissue, and the field perturbations induced by the optical path through the tissue. We have developed a compact, stand-alone, completely scan-less optical setup that allows us to acquire nonlinear-excitation images and to measure the sample dynamics simultaneously on an ensemble of arbitrarily chosen regions of interest. The image is obtained by shining a square array of spots, produced by a spatial light modulator, on the sample and shifting it (10 ms refresh time) across the sample. The final image is computed from the superposition of 100-1000 such images. Filtering procedures can be applied to the raw images of the excitation array before building the final image. We discuss results that show how this setup can be used to correct wave-front aberrations induced by turbid samples (such as living tissues) and to compute space-time cross-correlations in complex networks.
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-05-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (entry-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables has been seldom considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
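As a sketch of the Schur-product idea, the snippet below damps spurious long-range covariances in a small-ensemble sample covariance. A Gaussian with a hard cutoff stands in for the compactly supported correlation functions (such as Gaspari-Cohn) used in practice, which are constructed to keep the localized matrix positive semidefinite; all numbers are synthetic.

```python
import numpy as np

def sample_cov(ensemble):
    """Sample covariance from an (n_members, n_state) ensemble array."""
    x = ensemble - ensemble.mean(axis=0)
    return x.T @ x / (ensemble.shape[0] - 1)

def localization_matrix(coords, L):
    """Distance-dependent correlation; a Gaussian with cutoff stands in for
    the compactly supported functions (e.g. Gaspari-Cohn) used in practice."""
    d = np.abs(coords[:, None] - coords[None, :])
    rho = np.exp(-0.5 * (d / L) ** 2)
    rho[d > 2 * L] = 0.0          # enforce compact support
    return rho

rng = np.random.default_rng(1)
coords = np.arange(40.0)
ens = rng.normal(size=(10, 40))   # only 10 members: heavy sampling noise
B = sample_cov(ens)
B_loc = B * localization_matrix(coords, L=4.0)   # Schur (entry-wise) product
print("spurious long-range cov:", abs(B[0, 30]), "->", abs(B_loc[0, 30]))
```

The multivariate question studied in the paper is how to choose this correlation matrix when the state vector mixes several physical variables, for which a single scalar localization function is no longer obviously adequate.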
Giuliani, Alessandro; Tomita, Masaru
2010-01-01
Cell fate decision remarkably generates a specific cell differentiation path among the multiple possibilities that can arise through the complex interplay of high-dimensional genome activities. The coordinated action of thousands of genes to switch cell fate has indicated the existence of stable attractors guiding the process. However, the origins of the intracellular mechanisms that create the “cellular attractor” still remain unknown. Here, we examined the collective behavior of genome-wide expressions for neutrophil differentiation through two different stimuli, dimethyl sulfoxide (DMSO) and all-trans-retinoic acid (atRA). To overcome the difficulties of dealing with single gene expression noises, we grouped genes into ensembles and analyzed their expression dynamics in correlation space defined by Pearson correlation and mutual information. The standard deviation of correlation distributions of gene ensembles reduces when the ensemble size is increased, following the inverse square root law, both for ensembles chosen randomly from the whole genome and for ensembles ranked according to expression variances across time. Choosing an ensemble size of 200 genes, we show that the two probability distributions of correlations of randomly selected genes for the atRA and DMSO responses overlapped after 48 hours, defining the neutrophil attractor. Next, tracking the ranked ensembles' trajectories, we noticed that only certain ensembles, not all, fall into the attractor in a fractal-like manner. The removal of these genome elements from the whole genome, for both the atRA and DMSO responses, destroys the attractor, providing evidence for the existence of specific genome elements (named the “genome vehicle”) responsible for the neutrophil attractor. Notably, within the genome vehicles, genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components for the creation of the neutrophil attractor. Further investigations along with our findings might provide a comprehensive mechanistic view of cell fate decision. PMID:20725638
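The inverse-square-root narrowing reported here is at heart a central-limit effect, which the following synthetic sketch reproduces. The construction (correlating random gene ensembles against a fixed reference profile) is a simplification of the paper's analysis, and all data are randomly generated.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes, n_times = 5000, 12
expr = rng.normal(size=(n_genes, n_times))      # synthetic expression matrix
ref = rng.normal(size=n_times)                  # reference profile

# Pearson correlation of every gene with the reference profile
z = (expr - expr.mean(1, keepdims=True)) / expr.std(1, keepdims=True)
zr = (ref - ref.mean()) / ref.std()
per_gene_r = z @ zr / n_times

def ensemble_corr_sd(size, n_trials=2000):
    """SD across random gene ensembles of their mean correlation with the reference."""
    means = [per_gene_r[rng.choice(n_genes, size, replace=False)].mean()
             for _ in range(n_trials)]
    return np.std(means)

for size in (50, 200, 800):
    print(size, round(ensemble_corr_sd(size), 4))  # shrinks roughly as 1/sqrt(size)
```

Quadrupling the ensemble size roughly halves the spread, which is why grouping genes into ensembles of a few hundred members suppresses single-gene noise enough for the attractor to become visible.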
Quantum chaos inside black holes
NASA Astrophysics Data System (ADS)
Addazi, Andrea
2017-06-01
We show how semiclassical black holes can be reinterpreted as an effective geometry, composed of a large ensemble of horizonless naked singularities (eventually smoothed at the Planck scale). We call these new objects frizzy-balls, which can be rigorously defined by a Euclidean path integral approach. This leads to interesting implications for information paradoxes. We demonstrate that infalling information will chaotically propagate inside this system before reaching the full quantum gravity regime (Planck scale).
Path statistics, memory, and coarse-graining of continuous-time random walks on networks
Kion-Crosby, Willow; Morozov, Alexandre V.
2015-01-01
Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868
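For exponential waiting times, the memoryless special case, first-passage statistics reduce to linear algebra on the rate matrix. The sketch below computes mean first-passage times on a toy four-state network; it illustrates the underlying formalism rather than PathMAN itself, which additionally handles non-exponential waiting-time distributions, memory, and higher moments.

```python
import numpy as np

# Generator matrix Q for a toy 4-state chain 0 - 1 - 2 - 3;
# rates[i, j] is the rate of the jump i -> j, rows of Q sum to zero.
rates = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 0.3],
    [0.0, 0.0, 0.2, 0.0],
])
Q = rates - np.diag(rates.sum(axis=1))

target = 3
keep = [s for s in range(4) if s != target]
# Mean first-passage times t satisfy sum_j Q[i, j] t[j] = -1 on non-target states
t = np.linalg.solve(Q[np.ix_(keep, keep)], -np.ones(len(keep)))
for s, mfpt in zip(keep, t):
    print(f"MFPT {s} -> {target}: {mfpt:.2f}")
```

Higher moments obey analogous linear systems with the lower moments as source terms, which is essentially the recursion structure the paper exploits for general networks.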
Ensemble coding of face identity is present but weaker in congenital prosopagnosia.
Robson, Matthew K; Palermo, Romina; Jeffery, Linda; Neumann, Markus F
2018-03-01
Individuals with congenital prosopagnosia (CP) are impaired at identifying individual faces but do not appear to show impairments in extracting the average identity from a group of faces (known as ensemble coding). However, possible deficits in ensemble coding in a previous study (CPs n = 4) may have been masked because CPs relied on pictorial (image) cues rather than identity cues. Here we asked whether a larger sample of CPs (n = 11) would show intact ensemble coding of identity when the availability of image cues was minimised. Participants viewed a "set" of four faces and then judged whether a subsequent individual test face, either an exemplar or a "set average", was in the preceding set. Ensemble coding occurred when matching (vs. mismatching) averages were mistakenly endorsed as set members. We assessed both image- and identity-based ensemble coding by varying whether test faces were the same or different images of the identities in the set. CPs showed significant ensemble coding in both tasks, indicating that their performance was independent of image cues. As a group, CPs' ensemble coding was weaker than that of controls in both tasks, consistent with evidence that perceptual processing of face identity is disrupted in CP. This effect was driven by CPs (n = 3) who, in addition to having impaired face memory, also performed particularly poorly on a measure of face perception (CFPT). Future research, using larger samples, should examine whether deficits in ensemble coding may be restricted to CPs who also have substantial face perception deficits. Copyright © 2018 Elsevier Ltd. All rights reserved.
Visualization and classification of physiological failure modes in ensemble hemorrhage simulation
NASA Astrophysics Data System (ADS)
Zhang, Song; Pruett, William Andrew; Hester, Robert
2015-01-01
In an emergency situation such as hemorrhage, doctors need to predict which patients need immediate treatment and care. This task is difficult because of the diverse responses to hemorrhage in the human population. Ensemble physiological simulations provide a means to sample a diverse range of subjects and may have a better chance of containing the correct solution. However, revealing the patterns and trends in an ensemble simulation is a challenging task. We have developed a visualization framework for ensemble physiological simulations. The visualization helps users identify trends among ensemble members, classify ensemble members into subpopulations for analysis, and provide predictions of future events by matching a new patient's data to existing ensembles. We demonstrated the effectiveness of the visualization on simulated physiological data. The lessons learned here can be applied to clinically-collected physiological data in the future.
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve the diagnosis of cardiac arrhythmia. Random Forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset, with multiple classes of small sample sizes, and it is therefore a suitable test case for our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes of less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to assess the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrate the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
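A minimal sketch of this two-part strategy is shown below, using scikit-learn on synthetic data. Univariate ANOVA selection stands in for the paper's correlation-based feature selection, and simple random over-sampling of the minority class stands in for its resampling step; the dataset, feature counts, and parameters are illustrative, not those of the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the arrhythmia data: many features, unbalanced classes
X, y = make_classification(n_samples=452, n_features=100, n_informative=15,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# (i) univariate feature selection (stand-in for correlation-based selection)
sel = SelectKBest(f_classif, k=20).fit(X_tr, y_tr)
X_tr_s, X_te_s = sel.transform(X_tr), sel.transform(X_te)

# (ii) simple random resampling of the minority class before training the RF
idx_min = np.where(y_tr == 1)[0]
idx_maj = np.where(y_tr == 0)[0]
idx_up = resample(idx_min, n_samples=len(idx_maj), random_state=0)
idx_bal = np.concatenate([idx_maj, idx_up])

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_tr_s[idx_bal], y_tr[idx_bal])
print("test accuracy:", rf.score(X_te_s, y_te))
```

The balancing step matters most for the eleven rare classes mentioned in the abstract, where an unweighted forest would otherwise be dominated by the majority classes.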
On the generation of climate model ensembles
NASA Astrophysics Data System (ADS)
Haughton, Ned; Abramowitz, Gab; Pitman, Andy; Phipps, Steven J.
2014-10-01
Climate model ensembles are used to estimate uncertainty in future projections, typically by interpreting the ensemble distribution for a particular variable probabilistically. There are, however, different ways to produce climate model ensembles that yield different results, and therefore different probabilities for a future change in a variable. Perhaps equally importantly, there are different approaches to interpreting the ensemble distribution that lead to different conclusions. Here we use a reduced-resolution climate system model to compare three common ways to generate ensembles: initial conditions perturbation, physical parameter perturbation, and structural changes. Despite these three approaches conceptually representing very different categories of uncertainty within a modelling system, when comparing simulations to observations of surface air temperature they can be very difficult to separate. Using the twentieth century CMIP5 ensemble for comparison, we show that initial conditions ensembles, in theory representing internal variability, significantly underestimate observed variance. Structural ensembles, perhaps less surprisingly, exhibit over-dispersion in simulated variance. We argue that future climate model ensembles may need to include parameter or structural perturbation members in addition to perturbed initial conditions members to ensure that they sample uncertainty due to internal variability more completely. We note that where ensembles are over- or under-dispersive, such as for the CMIP5 ensemble, estimates of uncertainty need to be treated with care.
Program for narrow-band analysis of aircraft flyover noise using ensemble averaging techniques
NASA Technical Reports Server (NTRS)
Gridley, D.
1982-01-01
A package of computer programs was developed for analyzing acoustic data from an aircraft flyover. The package assumes the aircraft is flying at constant altitude and constant velocity in a fixed attitude over a linear array of ground microphones. Aircraft position is provided by radar and an option exists for including the effects of the aircraft's rigid-body attitude relative to the flight path. Time synchronization between radar and acoustic recording stations permits ensemble averaging techniques to be applied to the acoustic data thereby increasing the statistical accuracy of the acoustic results. Measured layered meteorological data obtained during the flyovers are used to compute propagation effects through the atmosphere. Final results are narrow-band spectra and directivities corrected for the flight environment to an equivalent static condition at a specified radius.
Quasi-most unstable modes: a window to 'À la carte' ensemble diversity?
NASA Astrophysics Data System (ADS)
Homar Santaner, Victor; Stensrud, David J.
2010-05-01
The atmospheric scientific community is nowadays facing the ambitious challenge of providing useful forecasts of atmospheric events that produce high societal impact. The low level of social resilience to false alarms creates tremendous pressure on forecasting offices to issue accurate, timely and reliable warnings. Currently, no operational numerical forecasting system is able to respond to the societal demand for high-resolution (in time and space) predictions in the 12-72 h time span. The main reasons for such deficiencies are the lack of adequate observations and the high non-linearity of the numerical models that are currently used. The whole weather forecasting problem is intrinsically probabilistic and current methods aim at coping with the various sources of uncertainty and the error propagation throughout the forecasting system. This probabilistic perspective is often created by generating ensembles of deterministic predictions that are aimed at sampling the most important sources of uncertainty in the forecasting system. The ensemble generation/sampling strategy is a crucial aspect of their performance and various methods have been proposed. Although global forecasting offices have been using ensembles of perturbed initial conditions for medium-range operational forecasts since 1994, no consensus exists regarding the optimum sampling strategy for high-resolution short-range ensemble forecasts. Bred vectors, however, have been hypothesized to better capture the growing modes in the highly nonlinear mesoscale dynamics of severe episodes than singular vectors or observation perturbations. Yet even this technique is not able to produce enough diversity in the ensembles to accurately and routinely predict extreme phenomena such as severe weather. Thus, we propose a new method to generate ensembles of initial-condition perturbations that is based on the breeding technique. Given a standard bred mode, a set of customized perturbations is derived with specified amplitudes and horizontal scales. This allows the ensemble to excite growing modes across a wider range of scales. Results show that this approach produces significantly more spread in the ensemble prediction than standard bred modes alone. Several examples that illustrate the benefits of this approach for severe weather forecasts will be provided.
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research adopts single-clustering algorithms to perform tumor clustering from biomolecular data, and these lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on the subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV combine the characteristics of both: HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. The HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results. Experiments on real data sets from the UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
Using simulation to interpret experimental data in terms of protein conformational ensembles.
Allison, Jane R
2017-04-01
In their biological environment, proteins are dynamic molecules, necessitating an ensemble structural description. Molecular dynamics simulations and solution-state experiments provide complementary information in the form of atomically detailed coordinates and averages or distributions of structural properties or related quantities. Recently, increases in the temporal and spatial scale of conformational sampling, and comparison of the more diverse conformational ensembles thus generated, have revealed the importance of sampling rare events. Excitingly, new methods based on maximum entropy and Bayesian inference promise to provide a statistically sound mechanism for combining experimental data with molecular dynamics simulations. Copyright © 2016 Elsevier Ltd. All rights reserved.
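To give a flavor of the maximum-entropy idea, the sketch below reweights a simulated ensemble so that the weighted average of a single observable matches an experimental value, while staying as close as possible (in relative entropy) to the original uniform weights. It is a minimal illustration with synthetic data; published methods handle multiple observables, experimental error models, and Bayesian priors.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
s = rng.normal(1.0, 0.5, size=10000)   # observable value per ensemble conformer
target = 1.2                           # "experimental" ensemble average

def weighted_mean(lam):
    # Maximum-entropy weights w_i ~ exp(-lambda * s_i); the shift by the
    # sample mean is only for numerical stability.
    w = np.exp(-lam * (s - s.mean()))
    w /= w.sum()
    return w @ s

# One Lagrange multiplier enforces <s>_w = target
lam = brentq(lambda l: weighted_mean(l) - target, -50, 50)
w = np.exp(-lam * (s - s.mean()))
w /= w.sum()
print("lambda:", round(lam, 3), "reweighted mean:", round(w @ s, 3))
```

The exponential form of the weights is not an arbitrary choice: it is the unique solution that matches the constraint while minimally perturbing the prior ensemble, which is what makes the approach statistically sound.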
Zheng, Weihua; Gallicchio, Emilio; Deng, Nanjie; Andrec, Michael; Levy, Ronald M.
2011-01-01
We present a new approach to study a multitude of folding pathways and different folding mechanisms for the 20-residue mini-protein Trp-Cage using the combined power of replica exchange molecular dynamics (REMD) simulations for conformational sampling, Transition Path Theory (TPT) for constructing folding pathways and stochastic simulations for sampling the pathways in a high dimensional structure space. REMD simulations of Trp-Cage with 16 replicas at temperatures between 270K and 566K are carried out with an all-atom force field (OPLSAA) and an implicit solvent model (AGBNP). The conformations sampled from all temperatures are collected. They form a discretized state space that can be used to model the folding process. The equilibrium population for each state at a target temperature can be calculated using the Weighted-Histogram-Analysis Method (WHAM). By connecting states with similar structures and creating edges satisfying detailed balance conditions, we construct a kinetic network that preserves the equilibrium population distribution of the state space. After defining the folded and unfolded macrostates, committor probabilities (Pfold) are calculated by solving a set of linear equations for each node in the network and pathways are extracted together with their fluxes using the TPT algorithm. By clustering the pathways into folding “tubes”, a more physically meaningful picture of the diversity of folding routes emerges. Stochastic simulations are carried out on the network and a procedure is developed to project sampled trajectories onto the folding tubes. The fluxes through the folding tubes calculated from the stochastic trajectories are in good agreement with the corresponding values obtained from the TPT analysis. The temperature dependence of the ensemble of Trp-Cage folding pathways is investigated. Above the folding temperature, a large number of diverse folding pathways with comparable fluxes flood the energy landscape. At low temperature, however, the folding transition is dominated by only a few localized pathways. PMID:21254767
Zheng, Weihua; Gallicchio, Emilio; Deng, Nanjie; Andrec, Michael; Levy, Ronald M
2011-02-17
We present a new approach to study a multitude of folding pathways and different folding mechanisms for the 20-residue mini-protein Trp-Cage using the combined power of replica exchange molecular dynamics (REMD) simulations for conformational sampling, transition path theory (TPT) for constructing folding pathways, and stochastic simulations for sampling the pathways in a high dimensional structure space. REMD simulations of Trp-Cage with 16 replicas at temperatures between 270 and 566 K are carried out with an all-atom force field (OPLSAA) and an implicit solvent model (AGBNP). The conformations sampled from all temperatures are collected. They form a discretized state space that can be used to model the folding process. The equilibrium population for each state at a target temperature can be calculated using the weighted-histogram-analysis method (WHAM). By connecting states with similar structures and creating edges satisfying detailed balance conditions, we construct a kinetic network that preserves the equilibrium population distribution of the state space. After defining the folded and unfolded macrostates, committor probabilities (P(fold)) are calculated by solving a set of linear equations for each node in the network and pathways are extracted together with their fluxes using the TPT algorithm. By clustering the pathways into folding "tubes", a more physically meaningful picture of the diversity of folding routes emerges. Stochastic simulations are carried out on the network, and a procedure is developed to project sampled trajectories onto the folding tubes. The fluxes through the folding tubes calculated from the stochastic trajectories are in good agreement with the corresponding values obtained from the TPT analysis. The temperature dependence of the ensemble of Trp-Cage folding pathways is investigated. Above the folding temperature, a large number of diverse folding pathways with comparable fluxes flood the energy landscape. At low temperature, however, the folding transition is dominated by only a few localized pathways.
2012-01-01
Background: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T
2012-12-08
Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
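The two aggregation rules are easy to state in code. Below is an illustrative numpy sketch (function names and numbers are hypothetical, not from the paper); note how a low vote threshold favors sensitivity, consistent with the trade-off reported above.

```python
import numpy as np

def average_probability(probs, cutoff=0.5):
    """Ensemble call: mean of member probabilities compared to a cutoff."""
    return (np.mean(probs, axis=0) >= cutoff).astype(int)

def vote_threshold(probs, cutoff=0.5, min_votes=1):
    """Ensemble call: positive if at least min_votes members exceed the cutoff."""
    votes = (np.asarray(probs) >= cutoff).sum(axis=0)
    return (votes >= min_votes).astype(int)

# Probabilities of acute rejection from 5 hypothetical classifiers x 4 patients
p = np.array([[0.9, 0.2, 0.6, 0.4],
              [0.8, 0.1, 0.4, 0.5],
              [0.7, 0.3, 0.5, 0.2],
              [0.6, 0.2, 0.3, 0.6],
              [0.9, 0.4, 0.2, 0.3]])
print("average prob :", average_probability(p))
print("vote (>=2)   :", vote_threshold(p, min_votes=2))
```

Requiring only a few positive votes flags a patient whenever a minority of classifiers is alarmed, which raises sensitivity but admits more false positives, exactly the specificity cost the authors observe.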
Molecular dynamics simulations using temperature-enhanced essential dynamics replica exchange.
Kubitzki, Marcus B; de Groot, Bert L
2007-06-15
Today's standard molecular dynamics simulations of moderately sized biomolecular systems at full atomic resolution are typically limited to the nanosecond timescale and therefore suffer from limited conformational sampling. Efficient ensemble-preserving algorithms like replica exchange (REX) may alleviate this problem somewhat but are still computationally prohibitive due to the large number of degrees of freedom involved. Aiming at increased sampling efficiency, we present a novel simulation method combining the ideas of essential dynamics and REX. Unlike standard REX, in each replica only a selection of essential collective modes of a subsystem of interest (essential subspace) is coupled to a higher temperature, with the remainder of the system staying at a reference temperature, T(0). This selective excitation along with the replica framework permits efficient approximate ensemble-preserving conformational sampling and allows much larger temperature differences between replicas, thereby considerably enhancing sampling efficiency. Ensemble properties and sampling performance of the method are discussed using dialanine and guanylin test systems, with multi-microsecond molecular dynamics simulations of these test systems serving as references.
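The replica-exchange machinery underneath TEE-REX is the standard Metropolis swap criterion between neighbouring temperatures. The sketch below shows that criterion in isolation, with hypothetical energies and units; the essential-dynamics coupling that distinguishes TEE-REX, in which only selected collective modes feel the elevated temperature, is not reproduced here.

```python
import numpy as np

kB = 0.0083145  # Boltzmann constant, kJ/(mol K)

def swap_probability(E_i, E_j, T_i, T_j):
    """Metropolis acceptance for exchanging configurations between replicas
    at T_i and T_j: min(1, exp[(1/kT_i - 1/kT_j) * (E_i - E_j)])."""
    delta = (1.0 / (kB * T_i) - 1.0 / (kB * T_j)) * (E_i - E_j)
    return min(1.0, np.exp(delta))

# Hypothetical potential energies (kJ/mol) of two neighbouring replicas
print(swap_probability(E_i=-500.0, E_j=-480.0, T_i=300.0, T_j=320.0))
```

Because only a subspace is heated in TEE-REX, the effective energy entering this criterion spans a much smaller range, which is what permits the large temperature gaps between replicas mentioned in the abstract.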
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-12-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has been seldom considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
Algorithms that Defy the Gravity of Learning Curve
2017-04-28
Excerpt: the report evaluates three nearest neighbour-based anomaly detectors, including an ensemble of nearest neighbours and a recent nearest neighbour-based ensemble method called iNNE, on data streams; it notes that changes in sample size do not alter the geometrical data characteristics discussed, and compares the proposed approach with conventional ensemble methods.
Model dependence and its effect on ensemble projections in CMIP5
NASA Astrophysics Data System (ADS)
Abramowitz, G.; Bishop, C.
2013-12-01
Conceptually, the notion of model dependence within climate model ensembles is relatively simple: modelling groups share a literature base, parametrisations, data sets and even model code, so the potential for dependence in sampling different climate futures is clear. How, though, can this conceptual problem inform a practical solution that demonstrably improves the ensemble mean and ensemble variance as an estimate of system uncertainty? While some research has already focused on error correlation or error covariance as a candidate to improve ensemble mean estimates, a complete definition of independence must at least implicitly subscribe to an ensemble interpretation paradigm, such as the 'truth-plus-error', 'indistinguishable', or more recently 'replicate Earth' paradigm. Using a definition of model dependence based on error covariance within the replicate Earth paradigm, this presentation will show that accounting for dependence in surface air temperature gives cooler projections in CMIP5 - by as much as 20% globally in some RCPs - although results differ significantly for each RCP, especially regionally. The fact that accounting for dependence changes the projections by different amounts for different RCPs is not an inconsistent result. Different numbers of submissions to each RCP by different modelling groups mean that differences in projections from different RCPs are not entirely about RCP forcing conditions - they also reflect different sampling strategies.
Rethinking the Default Construction of Multimodel Climate Ensembles
Rauser, Florian; Gleckler, Peter; Marotzke, Jochem
2015-07-21
Here, we discuss the current code of practice in the climate sciences to routinely create climate model ensembles as ensembles of opportunity from the newest phase of the Coupled Model Intercomparison Project (CMIP). We give a two-step argument to rethink this process. First, the differences between generations of ensembles corresponding to different CMIP phases in key climate quantities are not large enough to warrant an automatic separation into generational ensembles for CMIP3 and CMIP5. Second, we suggest that climate model ensembles cannot continue to be mere ensembles of opportunity but should always be based on a transparent scientific decision process. If ensembles can be constrained by observation, then they should be constructed as target ensembles that are specifically tailored to a physical question. If model ensembles cannot be constrained by observation, then they should be constructed as cross-generational ensembles, including all available model data to enhance structural model diversity and to better sample the underlying uncertainties. To facilitate this, CMIP should guide the necessarily ongoing process of updating experimental protocols for the evaluation and documentation of coupled models. Finally, with an emphasis on easy access to model data and facilitating the filtering of climate model data across all CMIP generations and experiments, our community could return to the underlying idea of using model data ensembles to improve uncertainty quantification, evaluation, and cross-institutional exchange.
Yang, Shan; Al-Hashimi, Hashim M.
2016-01-01
A growing number of studies employ time-averaged experimental data to determine dynamic ensembles of biomolecules. While it is well known that different ensembles can satisfy experimental data to within error, the extent and nature of these degeneracies, and their impact on the accuracy of the ensemble determination remains poorly understood. Here, we use simulations and a recently introduced metric for assessing ensemble similarity to explore degeneracies in determining ensembles using NMR residual dipolar couplings (RDCs) with specific application to A-form helices in RNA. Various target ensembles were constructed representing different domain-domain orientational distributions that are confined to a topologically restricted (<10%) conformational space. Five independent sets of ensemble averaged RDCs were then computed for each target ensemble and a ‘sample and select’ scheme used to identify degenerate ensembles that satisfy RDCs to within experimental uncertainty. We find that ensembles with different ensemble sizes and that can differ significantly from the target ensemble (by as much as ΣΩ ~ 0.4 where ΣΩ varies between 0 and 1 for maximum and minimum ensemble similarity, respectively) can satisfy the ensemble averaged RDCs. These deviations increase with the number of unique conformers and breadth of the target distribution, and result in significant uncertainty in determining conformational entropy (as large as 5 kcal/mol at T = 298 K). Nevertheless, the RDC-degenerate ensembles are biased towards populated regions of the target ensemble, and capture other essential features of the distribution, including the shape. Our results identify ensemble size as a major source of uncertainty in determining ensembles and suggest that NMR interactions such as RDCs and spin relaxation, on their own, do not carry the necessary information needed to determine conformational entropy at a useful level of precision. The framework introduced here provides a general approach for exploring degeneracies in ensemble determination for different types of experimental data. PMID:26131693
Methodology for Augmenting Existing Paths with Additional Parallel Transects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, John E.
2013-09-30
Visual Sample Plan (VSP) is sample planning software that is used, among other purposes, to plan transect sampling paths to detect areas that were potentially used for munition training. This module was developed for application on a large site where existing roads and trails were to be used as primary sampling paths. Gap areas between these primary paths needed to be found and covered with parallel transect paths. These gap areas represent areas on the site that are more than a specified distance from a primary path. These added parallel paths needed to optionally be connected together into a single path—the shortest path possible. The paths also needed to optionally be attached to existing primary paths, again with the shortest possible path. Finally, the process must be repeatable and predictable so that the same inputs (primary paths, specified distance, and path options) will result in the same set of new paths every time. This methodology was developed to meet those specifications.
NASA Astrophysics Data System (ADS)
Yuvchenko, S. A.; Ushakova, E. V.; Pavlova, M. V.; Alonova, M. V.; Zimnyakov, D. A.
2018-04-01
We consider the practical realization of a new optical probe method for random media, defined as reference-free path-length interferometry with intensity-moments analysis. A peculiarity in the statistics of the spectrally selected fluorescence radiation in a laser-pumped dye-doped random medium is discussed. Previously established correlations between the second- and third-order moments of the intensity fluctuations in the random interference patterns, the coherence function of the probe radiation, and the path-difference probability density for the interfering partial waves in the medium are confirmed. The correlations were verified using statistical analysis of the spectrally selected fluorescence radiation emitted by a laser-pumped dye-doped random medium. An aqueous solution of Rhodamine 6G was applied as the doping fluorescent agent for the ensembles of densely packed silica grains, which were pumped by the 532 nm radiation of a solid-state laser. The spectrum of the mean path length for the random medium was reconstructed.
On Certain Wronskians of Multiple Orthogonal Polynomials
NASA Astrophysics Data System (ADS)
Zhang, Lun; Filipuk, Galina
2014-11-01
We consider determinants of Wronskian type whose entries are multiple orthogonal polynomials associated with a path connecting two multi-indices. By assuming that the weight functions form an algebraic Chebyshev (AT) system, we show that the polynomials represented by the Wronskians keep a constant sign in some cases, while in some other cases oscillatory behavior appears, which generalizes classical results for orthogonal polynomials due to Karlin and Szegő. There are two applications of our results. The first application arises from the observation that the m-th moment of the average characteristic polynomials for multiple orthogonal polynomial ensembles can be expressed as a Wronskian of the type II multiple orthogonal polynomials. Hence, it is straightforward to obtain the distinct behavior of the moments for odd and even m in a special multiple orthogonal ensemble - the AT ensemble. As the second application, we derive some Turán type inequalities for multiple Hermite and multiple Laguerre polynomials (of two kinds). Finally, we study numerically the geometric configuration of zeros for the Wronskians of these multiple orthogonal polynomials. We observe that the zeros have regular configurations in the complex plane, which might be of independent interest.
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-01-01
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
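The stacking idea can be sketched with scikit-learn as below. An RBF-kernel SVM and a small neural network stand in for the paper's SVM/HMM/RBF trio (hidden Markov models are not available in scikit-learn), and the data are synthetic stand-ins for harmonic force features; everything here is illustrative rather than the authors' pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for harmonic force features vs. tool-wear state (3 classes)
X, y = make_classification(n_samples=300, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)

base = [("svm", SVC(kernel="rbf", probability=True, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                              random_state=0))]
# The meta-learner maps base-classifier outputs to the final wear state,
# which is the role the stacking strategy plays in the paper.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(stack, X, y, cv=5).mean().round(3))
```

Training the meta-learner on cross-validated base outputs, as StackingClassifier does internally, is what lets the ensemble exploit complementary strengths of heterogeneous base models without overfitting to their training-set predictions.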
The NRL relocatable ocean/acoustic ensemble forecast system
NASA Astrophysics Data System (ADS)
Rowley, C.; Martin, P.; Cummings, J.; Jacobs, G.; Coelho, E.; Bishop, C.; Hong, X.; Peggion, G.; Fabre, J.
2009-04-01
A globally relocatable regional ocean nowcast/forecast system has been developed to support rapid implementation of new regional forecast domains. The system is in operational use at the Naval Oceanographic Office for a growing number of regional and coastal implementations. The new system is the basis for an ocean acoustic ensemble forecast and adaptive sampling capability. We present an overview of the forecast system and the ocean ensemble and adaptive sampling methods. The forecast system consists of core ocean data analysis and forecast modules, software for domain configuration, surface and boundary condition forcing processing, and job control, and global databases for ocean climatology, bathymetry, tides, and river locations and transports. The analysis component is the Navy Coupled Ocean Data Assimilation (NCODA) system, a 3D multivariate optimum interpolation system that produces simultaneous analyses of temperature, salinity, geopotential, and vector velocity using remotely-sensed SST, SSH, and sea ice concentration, plus in situ observations of temperature, salinity, and currents from ships, buoys, XBTs, CTDs, profiling floats, and autonomous gliders. The forecast component is the Navy Coastal Ocean Model (NCOM). The system supports one-way nesting and multiple assimilation methods. The ensemble system uses the ensemble transform technique with error variance estimates from the NCODA analysis to represent initial condition error. Perturbed surface forcing or an atmospheric ensemble is used to represent errors in surface forcing. The ensemble transform Kalman filter is used to assess the impact of adaptive observations on future analysis and forecast uncertainty for both ocean and acoustic properties.
Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-11-14
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.
Zerbino, Daniel R.; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P.; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul
2016-01-01
New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907
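As an example of the programmatic access mentioned above, the snippet below queries the public Ensembl REST service for a gene record. The endpoint and returned fields reflect the public API at the time of writing and may change; error handling is minimal.

```python
import requests

# Look up a gene by symbol via the public Ensembl REST service.
url = "https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRCA2"
resp = requests.get(url, headers={"Content-Type": "application/json"})
resp.raise_for_status()
gene = resp.json()
# Print the stable ID and genomic coordinates of the gene
print(gene["id"], gene["seq_region_name"], gene["start"], gene["end"])
```

Regulation data (regulatory features, methylation, transcription factor binding) are reachable through analogous REST endpoints and through the public MySQL server at ensembldb.ensembl.org noted in the abstract.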
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-04-26
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance.
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-01-01
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance. PMID:29701668
Genetic programming based ensemble system for microarray data classification.
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
Genetic Programming Based Ensemble System for Microarray Data Classification
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
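The three combination operators at the heart of GPES are simple reductions over base-classifier outputs. The snippet below illustrates them on a hypothetical score matrix; in GPES itself these operators appear as internal nodes of evolved GP trees over decision-tree outputs, not as a flat one-step combination.

```python
import numpy as np

# Scores from base decision trees (rows) for each sample (columns);
# the numbers are illustrative, not from the paper.
scores = np.array([[0.9, 0.1, 0.6],
                   [0.7, 0.4, 0.2],
                   [0.8, 0.2, 0.7]])

combine = {
    "Min":     scores.min(axis=0),      # conservative: all trees must agree
    "Max":     scores.max(axis=0),      # permissive: any tree can trigger
    "Average": scores.mean(axis=0),     # compromise between the two
}
for name, out in combine.items():
    print(f"{name:7s} -> predictions {(out >= 0.5).astype(int)}")
```

Letting evolution choose where Min, Max, and Average sit in the tree is what allows GPES to tune the ensemble between conservative and permissive behavior per dataset.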
NASA Astrophysics Data System (ADS)
Saleh, F.; Ramaswamy, V.; Georgas, N.; Blumberg, A. F.; Wang, Y.
2016-12-01
Advances in computational resources and modeling techniques are opening the path to effectively integrating existing complex models. In the context of flood prediction, recent extreme events have demonstrated the importance of integrating components of the hydrosystem to better represent the interactions amongst different physical processes and phenomena. As such, there is a pressing need to develop holistic and cross-disciplinary modeling frameworks that effectively integrate existing models and better represent the operative dynamics. This work presents a novel Hydrologic-Hydraulic-Hydrodynamic Ensemble (H3E) flood prediction framework that operationally integrates existing predictive models representing coastal (New York Harbor Observing and Prediction System, NYHOPS), hydrologic (US Army Corps of Engineers Hydrologic Modeling System, HEC-HMS) and hydraulic (2-dimensional River Analysis System, HEC-RAS) components. The state-of-the-art framework is forced with 125 ensemble meteorological inputs from numerical weather prediction models including the Global Ensemble Forecast System, the European Centre for Medium-Range Weather Forecasts (ECMWF), the Canadian Meteorological Centre (CMC), the Short Range Ensemble Forecast (SREF) and the North American Mesoscale Forecast System (NAM). The framework produces, within a 96-hour forecast horizon, on-the-fly Google Earth flood maps that provide critical information for decision makers and emergency preparedness managers. The utility of the framework was demonstrated by retrospectively forecasting an extreme flood event, hurricane Sandy, in the Passaic and Hackensack watersheds (New Jersey, USA). Hurricane Sandy caused significant damage to a number of critical facilities in this area including the New Jersey Transit's main storage and maintenance facility. The results of this work demonstrate that ensemble-based frameworks provide improved flood predictions and useful information about associated uncertainties, thus improving the assessment of risks when compared to a deterministic forecast. The work offers perspectives for short-term flood forecasts, flood mitigation strategies and best management practices for climate change scenarios.
Quantum storage of orbital angular momentum entanglement in an atomic ensemble.
Ding, Dong-Sheng; Zhang, Wei; Zhou, Zhi-Yuan; Shi, Shuai; Xiang, Guo-Yong; Wang, Xi-Shi; Jiang, Yun-Kun; Shi, Bao-Sen; Guo, Guang-Can
2015-02-06
Constructing a quantum memory for photonic entanglement is vital for realizing quantum communication and networking. Because orbital angular momentum (OAM) is inherently infinite-dimensional, a photon's OAM has the potential to encode a photon in a high-dimensional space, enabling the realization of high-channel-capacity communication. Photons entangled in orthogonal polarizations or optical paths have previously been stored in different systems, but there have been no reports of storing a photon pair entangled in OAM space. Here, we report the first experimental realization of storing an entangled OAM state through the Raman protocol in a cold atomic ensemble. We reconstruct the density matrix of an OAM entangled state with a fidelity of 90.3%±0.8% and obtain a Clauser-Horne-Shimony-Holt inequality parameter S of 2.41±0.06 after a programmed storage time. All results clearly show the preservation of entanglement during storage.
Saglam, Ali S; Chong, Lillian T
2016-01-14
An essential baseline for determining the extent to which electrostatic interactions enhance the kinetics of protein-protein association is the "basal" kon, the rate constant for association in the absence of electrostatic interactions. However, since such association events occur beyond the millisecond timescale, it has not been practical to compute the basal kon by directly simulating the association with flexible models. Here, we computed the basal kon for barnase and barstar, two of the most rapidly associating proteins, using highly efficient, flexible molecular simulations. These simulations involved (a) pseudoatomic protein models that reproduce the molecular shapes, electrostatic properties, and diffusion properties of all-atom models, and (b) application of the weighted ensemble path sampling strategy, which enhanced the efficiency of generating association events by >130-fold. We also examined the extent to which the computed basal kon is affected by the inclusion of intermolecular hydrodynamic interactions in the simulations.
Wu, Xiongwu; Damjanovic, Ana; Brooks, Bernard R.
2013-01-01
This review provides a comprehensive description of the self-guided Langevin dynamics (SGLD) and self-guided molecular dynamics (SGMD) methods and their applications. Example systems are included to provide guidance on optimal application of these methods in simulation studies. SGMD/SGLD have an enhanced ability to overcome energy barriers and accelerate rare events to affordable time scales. It has been demonstrated that with moderate parameters, SGLD can routinely cross energy barriers of 20 kT at the rate at which molecular dynamics (MD) or Langevin dynamics (LD) cross 10 kT barriers. The core of these methods is the use of local averages of forces and momenta in a direct manner that can preserve the canonical ensemble. The use of such local averages results in methods where low-frequency motion “borrows” energy from high-frequency degrees of freedom when a barrier is approached and then returns that excess energy after the barrier is crossed. This self-guiding effect also results in accelerated diffusion, which enhances conformational sampling efficiency. The resulting ensemble with SGLD deviates slightly from the canonical ensemble, and that deviation can be corrected with either an on-the-fly or a post-processing reweighting procedure that provides an excellent canonical ensemble for systems with a limited number of accelerated degrees of freedom. Since reweighting procedures are generally not size extensive, a newer method, SGLDfp, uses local averages of both momenta and forces to preserve the ensemble without reweighting. The SGLDfp approach is size extensive and can be used to accelerate low-frequency motion in large systems, or in systems with explicit solvent where solvent diffusion is also to be enhanced. Since these methods are direct and straightforward, they can be used in conjunction with many other sampling methods or free energy methods by simply replacing the integration of degrees of freedom that are normally sampled by MD or LD. PMID:23913991
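To make the local-averaging idea concrete, here is a toy one-dimensional Langevin integrator in which a running (local) average of the force supplies the guiding term. The guiding factor and averaging time are illustrative assumptions; the production SGLD/SGMD formulations (and their reweighting corrections) differ in detail.

    # Toy 1-D sketch of self-guiding: a local (moving) average of the force
    # is added back as a guiding force. Parameters are illustrative only;
    # this is not the production SGLD algorithm.
    import numpy as np

    rng = np.random.default_rng(0)
    dt, gamma, kT = 0.005, 1.0, 1.0
    lam, tau = 0.3, 0.5              # guiding factor and local-averaging time
    force = lambda x: -4.0 * x * (x**2 - 1.0)   # double well, minima at +/-1

    x, v, f_avg = -1.0, 0.0, 0.0
    for _ in range(200_000):
        f = force(x)
        f_avg += (dt / tau) * (f - f_avg)       # exponential local average
        f_guided = f + lam * f_avg              # self-guiding term
        v += dt * (f_guided - gamma * v) \
             + np.sqrt(2.0 * gamma * kT * dt) * rng.normal()
        x += dt * v                             # barrier crossings are enhanced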
Enzymatic reaction paths as determined by transition path sampling
NASA Astrophysics Data System (ADS)
Masterson, Jean Emily
Enzymes are biological catalysts capable of enhancing the rates of chemical reactions by many orders of magnitude compared to solution chemistry. Since the catalytic power of enzymes routinely exceeds that of the best artificial catalysts available, there is much interest in understanding the complete nature of chemical barrier crossing in enzymatic reactions. Two specific questions pertaining to the source of enzymatic rate enhancements are investigated in this work. The first is the issue of how fast protein motions of an enzyme contribute to chemical barrier crossing. Our group has previously identified sub-picosecond protein motions, termed promoting vibrations (PVs), that dynamically modulate chemical transformation in several enzymes. In the case of human heart lactate dehydrogenase (hhLDH), prior studies have shown that a specific axis of residues undergoes a compressional fluctuation towards the active site, decreasing a hydride and a proton donor-acceptor distance on a sub-picosecond timescale to promote particle transfer. To more thoroughly understand the contribution of this dynamic motion to the enzymatic reaction coordinate of hhLDH, we conducted transition path sampling (TPS) using four versions of the enzymatic system: a wild-type enzyme with natural isotopic abundance; a heavy enzyme in which all the carbons, nitrogens, and non-exchangeable hydrogens were replaced with heavy isotopes; and two versions of the enzyme with mutations in the axis of PV residues. We generated four separate ensembles of reaction paths and analyzed each in terms of the reaction mechanism, time of barrier crossing, dynamics of the PV, and residues involved in the enzymatic reaction coordinate. We found that heavy isotopic substitution of hhLDH altered the sub-picosecond dynamics of the PV, changed the favored reaction mechanism, and dramatically increased the time of barrier crossing, but did not have an effect on the specific residues involved in the PV. In the mutant systems, we observed changes in the reaction mechanism and altered contributions of the mutated residues to the enzymatic reaction coordinate, but we did not detect a substantial change in the time of barrier crossing. These results confirm the importance of maintaining the dynamics and structural scaffolding of the hhLDH PV in order to facilitate barrier passage. We also utilized TPS to investigate the possible role of fast protein dynamics in the enzymatic reaction coordinate of human dihydrofolate reductase (hsDHFR). We found that sub-picosecond dynamics of hsDHFR do contribute to the reaction coordinate, whereas this is not the case in the E. coli version of the enzyme. This result indicates a shift in the DHFR family to a more dynamic version of catalysis. The second inquiry we addressed in this thesis regarding enzymatic barrier passage concerns the variability of paths through reactive phase space for a given enzymatic reaction. We further investigated the hhLDH-catalyzed reaction using a high-perturbation TPS algorithm. Though we saw that alternate reaction paths were possible, the dominant reaction path we observed corresponded to that previously elucidated in prior hhLDH TPS studies. Since the additional reaction paths we observed were likely high-energy, these results indicate that only the dominant reaction path contributes significantly to the overall reaction rate.
In conclusion, we show that the enzymes hhLDH and hsDHFR exhibit paths through reactive phase space in which fast protein motions are involved in the enzymatic reaction coordinate and make a non-negligible contribution to chemical barrier crossing.
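For readers unfamiliar with TPS, the sketch below shows the core shooting move on a toy one-dimensional double well: perturb the momentum at a random time slice, re-integrate in both time directions, and keep the trial path only if it still connects the reactant and product basins. The acceptance step is simplified to this connectivity check (a common approximation for small, symmetric momentum perturbations), and all parameters are illustrative rather than taken from the enzymatic studies.

    # Bare-bones TPS shooting move on a 1-D double well; acceptance is
    # reduced to the A-to-B connectivity test for simplicity.
    import numpy as np

    rng = np.random.default_rng(0)
    dt = 0.01
    force = lambda x: -4.0 * x * (x**2 - 1.0)   # minima near x = -1 and x = +1
    in_A = lambda x: x < -0.8                   # reactant basin
    in_B = lambda x: x > 0.8                    # product basin

    def integrate(x, v, n):
        """Velocity-Verlet trajectory of n steps, returned as (x, v) slices."""
        path = []
        for _ in range(n):
            v += 0.5 * dt * force(x)
            x += dt * v
            v += 0.5 * dt * force(x)
            path.append((x, v))
        return path

    def shooting_move(path):
        i = rng.integers(1, len(path) - 1)
        x, v = path[i]
        v += 0.2 * rng.normal()                 # momentum perturbation
        # Backward segment: integrate with reversed velocity, then flip back.
        back = integrate(x, -v, i)
        trial = [(bx, -bv) for bx, bv in reversed(back)] + [(x, v)] \
                + integrate(x, v, len(path) - i - 1)
        if in_A(trial[0][0]) and in_B(trial[-1][0]):
            return trial                        # reactive: accept
        return path                             # non-reactive: keep old path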
Polarized ensembles of random pure states
NASA Astrophysics Data System (ADS)
Deelan Cunden, Fabio; Facchi, Paolo; Florio, Giuseppe
2013-08-01
A new family of polarized ensembles of random pure states is presented. These ensembles are obtained by linear superposition of two random pure states with suitable distributions, and are quite manageable. We use the results obtained for two purposes: on the one hand, we derive an efficient strategy for sampling states from isopurity manifolds; on the other, we characterize the deviation of a pure quantum state from separability under the influence of noise.
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
2017-02-14
We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of the two techniques provides sufficient statistical accuracy that meaningful estimates of the density of states and the partition sum can be obtained. From these estimates, several thermodynamic observables, such as the heat capacity or reaction free energies, can be calculated. We demonstrate that the computation times for calculating titration curves with high statistical accuracy are significantly decreased compared with the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.
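The Wang-Landau half of the combination can be sketched compactly: a random walk over discretized energies whose acceptance rule uses the running density-of-states estimate, with a histogram-flatness test that shrinks the modification factor. The bin count, flatness threshold, and stopping tolerance below are illustrative choices, and the explicit reaction moves of the combined method are omitted.

    # Hedged sketch of the Wang-Landau update rule on a discrete energy
    # ladder; the explicit (de)protonation moves of the combined reaction
    # ensemble method are not modeled here.
    import numpy as np

    rng = np.random.default_rng(1)
    n_bins = 20
    ln_g = np.zeros(n_bins)      # running estimate of ln(density of states)
    hist = np.zeros(n_bins)
    ln_f = 1.0                   # modification factor, halved on flatness
    state = 0

    while ln_f > 1e-6:
        trial = (state + rng.choice([-1, 1])) % n_bins
        # Accept with prob. g(old)/g(new): flattens visits across energies.
        if np.log(rng.random()) < ln_g[state] - ln_g[trial]:
            state = trial
        ln_g[state] += ln_f
        hist[state] += 1
        if hist.min() > 0.8 * hist.mean():   # flatness criterion
            hist[:] = 0.0
            ln_f *= 0.5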
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach to enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimal combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely adopted majority-voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used to classify class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmark datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
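A compressed illustration of the search loop: binary masks over a small classifier pool are evolved with truncation selection and bit-flip mutation, with 10-fold cross-validated majority-vote accuracy as fitness. The pool members, population size, and GA operators here are simplified stand-ins for the actual GA-EoC design.

    # Toy GA over a pool of base classifiers, scored by 10-fold CV accuracy
    # of a majority vote; a simplified stand-in for GA-EoC.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
    pool = [("nb", GaussianNB()), ("knn", KNeighborsClassifier()),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("lr", LogisticRegression(max_iter=500))]
    rng = np.random.default_rng(0)

    def fitness(mask):
        chosen = [pool[i] for i in range(len(pool)) if mask[i]]
        if not chosen:
            return 0.0
        vote = VotingClassifier(chosen, voting="hard")
        return cross_val_score(vote, X, y, cv=10).mean()

    popn = rng.integers(0, 2, size=(6, len(pool)))      # random initial masks
    for _ in range(5):                                  # a few GA generations
        scores = np.array([fitness(m) for m in popn])
        parents = popn[np.argsort(scores)[-3:]]         # truncation selection
        children = parents[rng.integers(0, 3, size=6)].copy()
        flips = rng.random(children.shape) < 0.1        # bit-flip mutation
        popn = np.where(flips, 1 - children, children)
    best = max(popn, key=fitness)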
Equilibrium energy spectrum of point vortex motion with remarks on ensemble choice and ergodicity
NASA Astrophysics Data System (ADS)
Esler, J. G.
2017-01-01
The dynamics and statistical mechanics of N chaotically evolving point vortices in the doubly periodic domain are revisited. The selection of the correct microcanonical ensemble for the system is first investigated. The numerical results of Weiss and McWilliams [Phys. Fluids A 3, 835 (1991), 10.1063/1.858014], who argued that the point vortex system with N =6 is nonergodic because of an apparent discrepancy between ensemble averages and dynamical time averages, are shown to be due to an incorrect ensemble definition. When the correct microcanonical ensemble is sampled, accounting for the vortex momentum constraint, time averages obtained from direct numerical simulation agree with ensemble averages within the sampling error of each calculation, i.e., there is no numerical evidence for nonergodicity. Further, in the N →∞ limit it is shown that the vortex momentum no longer constrains the long-time dynamics and therefore that the correct microcanonical ensemble for statistical mechanics is that associated with the entire constant energy hypersurface in phase space. Next, a recently developed technique is used to generate an explicit formula for the density of states function for the system, including for arbitrary distributions of vortex circulations. Exact formulas for the equilibrium energy spectrum, and for the probability density function of the energy in each Fourier mode, are then obtained. Results are compared with a series of direct numerical simulations with N =50 and excellent agreement is found, confirming the relevance of the results for interpretation of quantum and classical two-dimensional turbulence.
Shallow cumuli ensemble statistics for development of a stochastic parameterization
NASA Astrophysics Data System (ADS)
Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs
2014-05-01
According to the conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large-scale forcing produces a unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection: as is known from empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to a given large-scale forcing. Moreover, this distribution gets broader with increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the spread and shape of the sub-grid state distribution on the model horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study the shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation (LES) model and a cloud-tracking algorithm, followed by conditional sampling of clouds at the cloud-base level, to retrieve information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of a shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on these empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory. The stochastic model simulates a compound random process, with the number of convective elements drawn from a Poisson distribution and cloud properties sub-sampled from a generalized ensemble distribution. We study the role of the different cloud subtypes in a shallow convective ensemble and how the diverse cloud properties and cloud lifetimes affect the system macro-state. To what extent does the cloud-base mass flux distribution deviate from the simple Boltzmann distribution, and how does this affect the results from the stochastic model? Is the memory provided by the finite lifetime of individual clouds of importance for the ensemble statistics? We also test for the minimal information, given as input to the stochastic model, that is able to reproduce the ensemble mean statistics and the variability in a convective ensemble. An important property of the resulting distribution of the sub-grid convective states is its scale-adaptivity: the smaller the grid size, the broader the compound distribution of the sub-grid states.
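The compound stochastic process at the heart of such a parameterization is easy to prototype: draw the number of clouds from a Poisson distribution and their cloud-base mass fluxes from an exponential (Boltzmann-like) distribution, so the grid-box total becomes scale-adaptive. The mean cloud count and mean mass flux below are invented example values, not LES-derived statistics.

    # Compound-Poisson draw of a sub-grid shallow-cloud ensemble; <N> and
    # <m> are hypothetical example values, not LES-derived statistics.
    import numpy as np

    rng = np.random.default_rng(0)
    mean_clouds = 40.0      # <N> per reference grid box (assumed)
    mean_flux = 2.0e6       # <m>, mean cloud-base mass flux in kg/s (assumed)

    def gridbox_mass_flux(area_fraction=1.0):
        """Total mass flux in one grid box; smaller fractions mimic finer grids."""
        n = rng.poisson(mean_clouds * area_fraction)
        return rng.exponential(mean_flux, size=n).sum()

    coarse = np.array([gridbox_mass_flux(1.0) for _ in range(5000)])
    fine = np.array([gridbox_mass_flux(0.1) for _ in range(5000)])
    # Relative spread grows as the grid box shrinks (scale-adaptivity).
    print(coarse.std() / coarse.mean(), fine.std() / fine.mean())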
Monte Carlo replica-exchange based ensemble docking of protein conformations.
Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin
2017-05-01
A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in the translation and orientation of binding partners, possible conformational changes upon binding are included based on Monte Carlo selection of protein conformations stored as ordered, pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure, with a range spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling selects appropriate partner conformations on the fly, the approach is not limited by the number of conformations in the ensemble, in contrast to ensemble cross-docking of each conformer pair. Although only a fraction of the generated conformers was in closer agreement with the bound structure, the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2017 Wiley Periodicals, Inc.
Modality-Driven Classification and Visualization of Ensemble Variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bensema, Kevin; Gosink, Luke; Obermaier, Harald
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers, as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that is paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.
Liquid Water from First Principles: Validation of Different Sampling Approaches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mundy, C J; Kuo, W; Siepmann, J
2004-05-20
A series of first principles molecular dynamics and Monte Carlo simulations were carried out for liquid water to assess the validity and reproducibility of different sampling approaches. These simulations include Car-Parrinello molecular dynamics simulations using the program CPMD with different values of the fictitious electron mass in the microcanonical and canonical ensembles, Born-Oppenheimer molecular dynamics using the programs CPMD and CP2K in the microcanonical ensemble, and Metropolis Monte Carlo using CP2K in the canonical ensemble. With the exception of one simulation for 128 water molecules, all other simulations were carried out for systems consisting of 64 molecules. It is found that the structural and thermodynamic properties of these simulations are in excellent agreement with each other as long as adiabatic sampling is maintained in the Car-Parrinello molecular dynamics simulations, either by choosing a sufficiently small fictitious mass in the microcanonical ensemble or by Nosé-Hoover thermostats in the canonical ensemble. Using the Becke-Lee-Yang-Parr exchange and correlation energy functionals and norm-conserving Troullier-Martins or Goedecker-Teter-Hutter pseudopotentials, simulations at a fixed density of 1.0 g/cm³ and a temperature close to 315 K yield a height of the first peak in the oxygen-oxygen radial distribution function of about 3.0, a classical constant-volume heat capacity of about 70 J K⁻¹ mol⁻¹, and a self-diffusion constant of about 0.1 Å²/ps.
Self-averaging and weak ergodicity breaking of diffusion in heterogeneous media
NASA Astrophysics Data System (ADS)
Russian, Anna; Dentz, Marco; Gouze, Philippe
2017-08-01
Diffusion in natural and engineered media is quantified in terms of stochastic models for the heterogeneity-induced fluctuations of particle motion. However, fundamental properties such as ergodicity and self-averaging and their dependence on the disorder distribution are often not known. Here, we investigate these questions for diffusion in quenched disordered media characterized by spatially varying retardation properties, which account for particle retention due to physical or chemical interactions with the medium. We link self-averaging and ergodicity to the disorder sampling efficiency Rn, which quantifies the number of disorder realizations a noise ensemble may sample in a single disorder realization. Diffusion for disorder scenarios characterized by a finite mean transition time is ergodic and self-averaging in any dimension. The strength of the sample-to-sample fluctuations decreases with increasing spatial dimension. For an infinite mean transition time, particle motion is weakly ergodicity breaking in any dimension because single particles cannot sample the heterogeneity spectrum in finite time. However, even though the noise ensemble is not representative of the single-particle time statistics, subdiffusive motion in q ≥ 2 dimensions is self-averaging, which means that the noise ensemble in a single realization samples a representative part of the heterogeneity spectrum.
Locci, Antonio Mario; Cincotti, Alberto; Todde, Sara; Orrù, Roberto; Cao, Giacomo
2010-01-01
A novel methodology is proposed for investigating the effect of the pulsed electric current during the spark plasma sintering (SPS) of electrically conductive powders without potential misinterpretation of experimental results. First, ensemble configurations (geometry, size and material of the powder sample, die, plunger and spacers) are identified where the electric current is forced to flow only through either the sample or the die, so that the sample is heated either through the Joule effect or by thermal conduction, respectively. These ensemble configurations are selected using a recently proposed mathematical model of an SPS apparatus, which, once suitably modified, makes it possible to carry out detailed electrical and thermal analysis. Next, SPS experiments are conducted using the ensemble configurations theoretically identified. Using aluminum powders as a case study, we find that the temporal profiles of sample shrinkage, which indicate densification behavior, as well as the final density of the sample are clearly different when the electric current flows only through the sample or through the die containing it, whereas the temperature cycle and mechanical load are the same in both cases. PMID:27877354
Frictional behaviour of sandstone: A sample-size dependent triaxial investigation
NASA Astrophysics Data System (ADS)
Roshan, Hamid; Masoumi, Hossein; Regenauer-Lieb, Klaus
2017-01-01
Frictional behaviour of rocks from the initial stage of loading to final shear displacement along the formed shear plane has been widely investigated in the past. However, the effect of sample size on such frictional behaviour has not attracted much attention. This is mainly related to limitations in rock testing facilities as well as the complex mechanisms involved in the sample-size-dependent frictional behaviour of rocks. In this study, a suite of advanced triaxial experiments was performed on Gosford sandstone samples of different sizes and at different confining pressures. The post-peak response of the rock along the formed shear plane was captured for analysis, with particular interest in sample-size dependency. Several important phenomena were observed in the results of this study: a) the rate of transition from brittleness to ductility in rock is sample-size dependent, with relatively smaller samples showing a faster transition toward ductility at any confining pressure; b) the sample size influences the angle of the formed shear band; and c) the friction coefficient of the formed shear plane is sample-size dependent, with relatively smaller samples exhibiting a lower friction coefficient than larger samples. We interpret our results in terms of a thermodynamic approach in which the frictional properties for finite deformation are viewed as encompassing a multitude of ephemeral slipping surfaces prior to the formation of the through-going fracture. The final fracture itself is seen as a result of the self-organisation of a sufficiently large ensemble of micro-slip surfaces and is therefore consistent with thermodynamic theory. This assumption vindicates the use of classical rock mechanics experiments to constrain failure of pressure-sensitive rocks, and the future imaging of these micro-slips opens an exciting path for research into rock failure mechanisms.
SAChES: Scalable Adaptive Chain-Ensemble Sampling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah
We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted at Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces the per-chain sampling burden and enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently hones in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors, which may be unavoidable in the extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that resulted from the project, and discusses some additional results.
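To convey the differential-evolution stage, the sketch below proposes a move for each chain from the scaled difference of two other randomly chosen chains and accepts it with a Metropolis test. The stand-in target density, jitter size, and the 2.38/√(2d) scaling are standard textbook choices rather than the SAChES defaults, and the Adaptive Metropolis refinement stage is omitted.

    # Minimal differential-evolution Monte Carlo proposal across parallel
    # chains (the DE stage only; Adaptive Metropolis refinement omitted).
    import numpy as np

    rng = np.random.default_rng(0)
    log_post = lambda x: -0.5 * np.sum(x**2)   # stand-in posterior (std normal)

    n_chains, dim = 16, 4
    gamma = 2.38 / np.sqrt(2 * dim)            # common DE-MC scaling choice
    chains = rng.normal(size=(n_chains, dim))

    for _ in range(5000):
        for i in range(n_chains):
            r1, r2 = rng.choice([j for j in range(n_chains) if j != i],
                                size=2, replace=False)
            prop = chains[i] + gamma * (chains[r1] - chains[r2]) \
                   + 1e-6 * rng.normal(size=dim)     # jittered DE proposal
            if np.log(rng.random()) < log_post(prop) - log_post(chains[i]):
                chains[i] = prop                     # Metropolis acceptance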
Exploring diversity in ensemble classification: Applications in large area land cover mapping
NASA Astrophysics Data System (ADS)
Mellor, Andrew; Boukir, Samia
2017-07-01
Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing and have been shown to perform better than single-classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance, whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning, and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll-dominated public forests in Victoria, Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin, two key concepts in ensemble learning. The main novelty of our work is boosting diversity by emphasizing the contribution of lower-margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity and, through the ensemble margin, demonstrate how inducing diversity by targeting lower-margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large-area remote sensing applications, for which training data are costly and resource-intensive to collect.
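The ensemble margin at the center of such analyses can be computed directly from the votes of a trained forest. The sketch below uses the unsupervised form (top vote fraction minus second vote fraction) and flags the lowest-margin samples; it assumes integer class labels 0..K-1 so that the per-tree outputs can be compared to class indices directly, and it is an illustration rather than the authors' pipeline.

    # Unsupervised ensemble margin of a random forest: gap between the two
    # largest per-sample vote fractions; low margin = difficult instance.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                               random_state=0)   # labels are 0, 1, 2
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    votes = np.stack([t.predict(X) for t in rf.estimators_])  # (trees, samples)
    frac = np.stack([(votes == k).mean(axis=0)
                     for k in range(len(rf.classes_))], axis=1)
    top2 = np.sort(frac, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]        # in [0, 1]
    hard = np.argsort(margin)[:50]          # lowest-margin training samples
    print(f"mean margin {margin.mean():.2f}; hardest sample index {hard[0]}")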
Classroom Environment as Related to Contest Ratings among High School Performing Ensembles.
ERIC Educational Resources Information Center
Hamann, Donald L.; And Others
1990-01-01
Examines the influence of classroom environments, measured by the Classroom Environment Scale, Form R (CESR), on vocal and instrumental ensembles' musical achievement at festival contests. Using a random sample, reveals that subjects with higher scores on the CESR scales of involvement, affiliation, teacher support, and organization received better contest…
Tiered Evaluation in Large Ensemble Settings.
ERIC Educational Resources Information Center
Scott, David
1998-01-01
Discusses the use of a tiered evaluation system (TES) that allows students to work at different levels, enables teachers to assess progress objectively, and presents students with appropriate challenges in the music ensembles. Focuses on how TES works and its advantages, considers the challenges and flexibility of TES, and provides samples. (CMK)
NASA Astrophysics Data System (ADS)
Annan, James; Hargreaves, Julia
2016-04-01
In order to perform any Bayesian processing of a model ensemble, we need a prior over the ensemble members. In the case of multimodel ensembles such as CMIP, the historical approach of "model democracy" (i.e., equal weight for all models in the sample) is no longer credible (if it ever was) due to model duplication and inbreeding. The question of "model independence" is central to the question of prior weights. However, although this question has been repeatedly raised, it has not yet been satisfactorily addressed. Here I will discuss the issue of independence and present a theoretical foundation for understanding and analysing the ensemble in this context. I will also present some simple examples showing how these ideas may be applied and developed.
Chodera, John D; Shirts, Michael R
2011-11-21
The widespread popularity of replica exchange and expanded ensemble algorithms for simulating complex molecular systems in chemistry and biophysics has generated much interest in discovering new ways to enhance the phase-space mixing of these protocols in order to improve sampling of uncorrelated configurations. Here, we demonstrate how both of these classes of algorithms can be considered as special cases of Gibbs sampling within a Markov chain Monte Carlo framework. Gibbs sampling is a well-studied scheme in the field of statistical inference in which different random variables are alternately updated from conditional distributions. While the update of the conformational degrees of freedom by Metropolis Monte Carlo or molecular dynamics unavoidably generates correlated samples, we show how judicious updating of the thermodynamic state indices (corresponding to thermodynamic parameters such as temperature or alchemical coupling variables) can substantially increase mixing while still sampling from the desired distributions. We show how state update methods in common use can lead to suboptimal mixing, and present some simple, inexpensive alternatives that can increase mixing of the overall Markov chain, reducing the simulation times necessary to obtain estimates of the desired precision. These improved schemes are demonstrated for several common applications, including an alchemical expanded ensemble simulation, parallel tempering, and multidimensional replica exchange umbrella sampling.
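As a concrete example of the state-index update, the sketch below redraws a replica's temperature index from its full conditional distribution given the current configuration's energy, instead of proposing only neighbor swaps; the temperature ladder and weight terms are illustrative assumptions.

    # Gibbs ("independence") update of the thermodynamic state index k:
    # redraw k ~ p(k | x) proportional to exp(-beta_k * E(x) + g_k), rather
    # than proposing only neighbor swaps. Ladder and weights are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    betas = 1.0 / np.linspace(0.5, 2.0, 8)   # inverse-temperature ladder
    g = np.zeros(8)                          # expanded-ensemble log-weights

    def gibbs_state_update(energy):
        logp = -betas * energy + g
        logp -= logp.max()                   # stabilize the exponentials
        p = np.exp(logp)
        return rng.choice(len(betas), p=p / p.sum())

    k = gibbs_state_update(energy=3.7)       # new state index for this replica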
Ovis: A Framework for Visual Analysis of Ocean Forecast Ensembles.
Höllt, Thomas; Magdy, Ahmed; Zhan, Peng; Chen, Guoning; Gopalakrishnan, Ganesh; Hoteit, Ibrahim; Hansen, Charles D; Hadwiger, Markus
2014-08-01
We present a novel integrated visualization system that enables interactive visual analysis of ensemble simulations of the sea surface height that is used in ocean forecasting. The position of eddies can be derived directly from the sea surface height, and our visualization approach enables their interactive exploration and analysis. The behavior of eddies is important in different application settings, of which we present two in this paper. First, we show an application for interactive planning of the placement as well as the operation of off-shore structures, using real-world ensemble simulation data of the Gulf of Mexico. Off-shore structures, such as those used for oil exploration, are vulnerable to hazards caused by eddies, and the oil and gas industry relies on ocean forecasts for efficient operations. We enable analysis of the spatial domain, as well as the temporal evolution, for planning the placement and operation of structures. Eddies are also important for marine life. They transport water over large distances, and with it heat and other physical properties as well as biological organisms. In the second application we present the usefulness of our tool, which could be used for planning the paths of autonomous underwater vehicles, so-called gliders, for marine scientists to study simulation data of the largely unexplored Red Sea.
Nanoscale Electronic Conditioning for Improvement of Nanowire Light-Emitting-Diode Efficiency.
May, Brelon J; Belz, Matthew R; Ahamed, Arshad; Sarwar, A T M G; Selcu, Camelia M; Myers, Roberto C
2018-04-24
Commercial III-Nitride LEDs and lasers spanning visible and ultraviolet wavelengths are based on epitaxial films. Alternatively, nanowire-based III-Nitride optoelectronics offer the advantage of strain compliance and high crystalline quality growth on a variety of inexpensive substrates. However, nanowire LEDs exhibit an inherent property distribution, resulting in uneven current spreading through macroscopic devices that consist of millions of individual nanowire diodes connected in parallel. Despite being electrically connected, only a small fraction of nanowires, sometimes <1%, contribute to the electroluminescence (EL). Here, we show that a population of electrical shorts exists in the devices, consisting of a subset of low-resistance nanowires that pass a large portion of the total current in the ensemble devices. Burn-in electronic conditioning is performed by applying a short-term overload voltage; the nanoshorts experience very high current density, sufficient to render them open circuits, thereby forcing a new current path through more nanowire LEDs in an ensemble device. Current-voltage measurements of individual nanowires are acquired using conductive atomic force microscopy to observe the removal of nanoshorts using burn-in. In macroscopic devices, this results in a 33× increase in peak EL and reduced leakage current. Burn-in conditioning of nanowire ensembles therefore provides a straightforward method to mitigate nonuniformities inherent to nanowire devices.
NASA Astrophysics Data System (ADS)
Vogel, Thomas; Perez, Danny; Junghans, Christoph
2014-03-01
We show direct formal relationships between the Wang-Landau iteration [PRL 86, 2050 (2001)], metadynamics [PNAS 99, 12562 (2002)] and statistical temperature molecular dynamics [PRL 97, 050601 (2006)], the major Monte Carlo and molecular dynamics workhorses for sampling from a generalized, multicanonical ensemble. We aim to help consolidate the developments in the different areas by indicating how methodological advancements can be transferred in a straightforward way, avoiding the parallel, largely independent development tracks observed in the past.
Kinetics and reaction coordinates of the reassembly of protein fragments via forward flux sampling.
Borrero, Ernesto E; Contreras Martínez, Lydia M; DeLisa, Matthew P; Escobedo, Fernando A
2010-05-19
We studied the mechanism of the reassembly and folding process of two fragments of a split lattice protein by using forward flux sampling (FFS). Our results confirmed previous thermodynamic and kinetic analyses that suggested that disruption of the critical core (of an unsplit protein that folds by a nucleation mechanism) plays a key role in the reassembly mechanism of the split system. For several split systems derived from a parent 48-mer model, we estimated the reaction coordinates in terms of collective variables by using the FFS least-squares estimation method and found that the reassembly transition is best described by a combination of the total number of native contacts, the number of interchain native contacts, and the total conformational energy of the split system. We also analyzed the transition path ensemble obtained from FFS simulations using the estimated reaction coordinates as order parameters to identify the microscopic features that differentiate the reassembly of the different split systems studied. We found that in the fastest-folding split system, a balanced distribution of the original core amino acids (of the unsplit system) between protein fragments promotes interchain interactions at early stages of the folding process. Only this system exhibits a different reassembly mechanism from that of the unsplit protein, involving the formation of a different folding nucleus. In the slowest-folding system, the concentration of the folding nucleus in one fragment causes its early prefolding, whereas the second fragment tends to remain as a detached random coil. We also show that the reassembly rate can be either increased or decreased by tuning interchain cooperativity via the introduction of a single point mutation that either strengthens or weakens one of the native interchain contacts (prevalent in the transition state ensemble). Copyright (c) 2010 Biophysical Society. Published by Elsevier Inc. All rights reserved.
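The FFS machinery itself can be summarized in a few lines: seed configurations at the first interface, shoot trial runs toward each successive interface of the order parameter, and accumulate the conditional crossing probabilities. The toy dynamics, interface placement, and trial counts below are placeholders, not the lattice-protein move set used in the study.

    # Skeleton of direct forward flux sampling over interfaces of an order
    # parameter lambda (e.g., number of native contacts); the dynamics are
    # a toy placeholder.
    import numpy as np

    rng = np.random.default_rng(0)
    interfaces = [0.1, 0.3, 0.5, 0.7, 0.9]   # lambda_0 < ... < lambda_n

    def run_to_crossing(x, lam_next, lam_basin):
        """Propagate until lam_next is reached (success) or the trajectory
        falls back into the initial basin (failure)."""
        while True:
            x = x + 0.05 * rng.normal()      # toy stochastic dynamics
            if x >= lam_next:
                return x, True
            if x <= lam_basin:
                return x, False

    configs = [0.1 + 0.01 * rng.normal() for _ in range(100)]  # lambda_0 hits
    rate_factor = 1.0
    for lam in interfaces[1:]:
        hits = []
        for x0 in rng.choice(configs, size=200):
            x, ok = run_to_crossing(x0, lam, lam_basin=0.0)
            if ok:
                hits.append(x)
        rate_factor *= len(hits) / 200.0     # P(lambda_{i+1} | lambda_i)
        configs = hits or configs            # guard against extinction
    print(rate_factor)                       # product of crossing probabilities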
NASA Astrophysics Data System (ADS)
Taniguchi, Kenji
2018-04-01
To investigate future variations in high-impact weather events, numerous samples are required. For detailed assessment in a specific region, a high spatial resolution is also required. A simple ensemble simulation technique is proposed in this paper. In the proposed technique, new ensemble members are generated from one basic state vector and two perturbation vectors, which are obtained by lagged average forecasting simulations. Sensitivity experiments with different numbers of ensemble members, different simulation lengths, and different perturbation magnitudes were performed. An experimental application to a global warming study was also implemented for a typhoon event. Ensemble-mean results and ensemble spreads of total precipitation and atmospheric conditions showed similar characteristics across the sensitivity experiments. The frequencies of the maximum total and hourly precipitation also showed similar distributions. These results indicate the robustness of the proposed technique. On the other hand, considerable ensemble spread was found in each ensemble experiment. In addition, the results of the application to a global warming study showed possible variations in the future. These results indicate that the proposed technique is useful for investigating various meteorological phenomena and the impacts of global warming. The results of the ensemble simulations also enable the stochastic evaluation of differences in high-impact weather events. In addition, the impacts of a spectral nudging technique were also examined. The tracks of a typhoon were quite different between cases with and without spectral nudging; however, the ranges of the tracks among ensemble members were comparable. This indicates that spectral nudging does not necessarily suppress ensemble spread.
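A minimal sketch of the member-generation step, under the assumption (stated in the abstract) that each new member is built from one basic state vector plus a combination of two lagged-forecast perturbation vectors; the random mixing coefficients and the normalization that fixes the perturbation magnitude are illustrative choices, not necessarily those of the paper.

    # Generate ensemble members from one basic state and two perturbation
    # vectors (e.g., lagged forecast minus basic state); the normalization
    # of the combined perturbation is an illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(0)
    x_base = rng.normal(size=10_000)       # basic state (flattened model fields)
    p1 = 0.1 * rng.normal(size=10_000)     # perturbation vector 1
    p2 = 0.1 * rng.normal(size=10_000)     # perturbation vector 2

    def new_member(magnitude=1.0):
        a, b = rng.normal(size=2)          # random mixing coefficients
        combo = a * p1 + b * p2
        combo *= magnitude / np.hypot(a, b)  # keep perturbation size fixed
        return x_base + combo

    ensemble = [new_member() for _ in range(20)]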
NASA Astrophysics Data System (ADS)
Fernández, J.; Frías, M. D.; Cabos, W. D.; Cofiño, A. S.; Domínguez, M.; Fita, L.; Gaertner, M. A.; García-Díez, M.; Gutiérrez, J. M.; Jiménez-Guerrero, P.; Liguori, G.; Montávez, J. P.; Romera, R.; Sánchez, E.
2018-03-01
We present an unprecedented ensemble of 196 future climate projections arising from different global and regional model intercomparison projects (MIPs): CMIP3, CMIP5, ENSEMBLES, ESCENA, EURO- and Med-CORDEX. This multi-MIP ensemble includes all regional climate model (RCM) projections publicly available to date, along with their driving global climate models (GCMs). We illustrate consistent and conflicting messages using continental Spain and the Balearic Islands as the target region. The study considers near-future (2021-2050) changes and their dependence on several uncertainty sources sampled in the multi-MIP ensemble: GCM, future scenario, internal variability, RCM, and spatial resolution. This initial work focuses on mean seasonal precipitation and temperature changes. The results show that the potential GCM-RCM combinations have been explored very unevenly, with favoured GCMs and large ensembles of a few RCMs that do not respond to any ensemble design. Therefore, the grand ensemble is weighted towards a few models. The selection of a balanced, credible sub-ensemble is challenged in this study by illustrating several conflicting responses between the RCM and its driving GCM and among different RCMs. Sub-ensembles from different initiatives are dominated by different uncertainty sources, with the driving GCM being the main contributor to uncertainty in the grand ensemble. For this analysis of near-future changes, the emission scenario does not lead to strong uncertainty. Despite the extra computational effort, for mean seasonal changes, the increase in resolution does not lead to important changes.
Calculating ensemble averaged descriptions of protein rigidity without sampling.
González, Luis C; Wang, Hui; Livesay, Dennis R; Jacobs, Donald J
2012-01-01
Previous works have demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer body-bar Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations to accurately account for fluctuations in network topology. We have developed a mean-field Virtual Pebble Game (VPG) that represents the ensemble of networks by a single effective network. That is, the number of distance constraints (or bars) that can form between a pair of rigid bodies is replaced by its average. The resulting effective network is viewed as having weighted edges, where the weight of an edge quantifies its capacity to absorb degrees of freedom. The VPG is interpreted as a flow problem on this effective network, which eliminates the need to sample. Across a nonredundant dataset of 272 protein structures, we apply the VPG to proteins for the first time. Our results show numerically and visually that the rigidity characterizations of the VPG accurately reflect the ensemble averaged [Formula: see text] properties. This result positions the VPG as an efficient alternative for understanding the mechanical role that chemical interactions play in maintaining protein stability.
Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation
Iida, Shinji; Nakamura, Haruki; Higo, Junichi
2016-01-01
We introduce various, recently developed, generalized ensemble methods, which are useful to sample various molecular configurations emerging in the process of protein–protein or protein–ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks. PMID:27288028
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
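The model families compared in the study are all available in scikit-learn, so a rough re-creation takes only a few lines. This sketch uses cross-validated AUC on synthetic imbalanced data rather than the actual cohort data, and it omits the restricted cubic splines of the strongest logistic model.

    # Hedged re-creation of the comparison on synthetic data: bagged trees,
    # random forest, boosting, and a spline-free logistic baseline.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    models = {
        "tree": DecisionTreeClassifier(max_depth=4),
        "bagging": BaggingClassifier(DecisionTreeClassifier(max_depth=4)),
        "forest": RandomForestClassifier(),
        "boosting": GradientBoostingClassifier(),
        "logistic": LogisticRegression(max_iter=1000),
    }
    for name, m in models.items():
        auc = cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{name}: AUC = {auc:.3f}")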
NASA Astrophysics Data System (ADS)
Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.
2012-04-01
In the context of a national energy company (EDF: Electricité de France), hydro-meteorological forecasts are necessary to ensure the safety and security of installations, meet environmental standards, and improve water resources management and decision making. Hydrological ensemble forecasts allow a better representation of the uncertainties of meteorological and hydrological forecasts and improve the human expertise of hydrological forecasts, which is essential to synthesize the available information coming from different meteorological and hydrological models and from human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and has been used since 2010 on more than 30 watersheds in France. This ensemble forecasting chain is characterized by ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where considerable human expertise is solicited. The aim of this paper is to compare two hydrological ensemble post-processing methods developed at EDF in order to improve ensemble forecast reliability (similar to Montanari & Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called the empirical approach) is based on a statistical modelling of the empirical error of perfect forecasts, using streamflow sub-samples by quantile class and lead time. The second method (called the dynamical approach) is based on streamflow sub-samples by quantile class, streamflow variation, and lead time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure good post-processing of the hydrological ensemble, allowing a good improvement in the reliability, skill and sharpness of ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach, which is not able to take into account hydrological dynamics and processes, i.e., sample heterogeneity. The same streamflow range can correspond to different processes, such as rising limbs or recessions, with different uncertainties. The dynamical approach improves the reliability, skill and sharpness of forecasts and globally reduces confidence-interval widths. When compared in detail, the dynamical approach allows a noticeable reduction of confidence intervals during recessions, where uncertainty is relatively lower, and a slight increase of confidence intervals during rising limbs or snowmelt, where uncertainty is greater. The dynamical approach, validated by forecasters' experience (the empirical approach was considered not discriminative enough), improved forecasters' confidence and the communication of uncertainties. Montanari, A. and Brath, A. (2004). A stochastic approach for assessing the uncertainty of rainfall-runoff simulations. Water Resources Research, 40, W01106, doi:10.1029/2003WR002540. Schaefli, B., Balin Talamba, D. and Musy, A. (2007). Quantifying hydrological modeling errors through a mixture of normal distributions. Journal of Hydrology, 332, 303-315.
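A toy version of the empirical dressing step, assuming multiplicative errors: past perfect forecasts are binned by streamflow quantile class, the observed-to-forecast ratio quantiles are stored per bin, and a new point forecast is dressed with the matching error distribution. The synthetic data and the multiplicative error model are assumptions, and the lead-time stratification is omitted.

    # Toy empirical post-processing: error quantiles per streamflow quantile
    # class; the synthetic data and multiplicative errors are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    past_fcst = rng.gamma(2.0, 50.0, size=5000)            # synthetic forecasts
    past_obs = past_fcst * rng.lognormal(0.0, 0.2, 5000)   # synthetic observations

    edges = np.quantile(past_fcst, [0.0, 0.25, 0.5, 0.75, 1.0])
    ratio_q = []                             # 5/50/95% error ratios per class
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (past_fcst >= lo) & (past_fcst <= hi)
        ratio_q.append(np.quantile(past_obs[sel] / past_fcst[sel],
                                   [0.05, 0.5, 0.95]))

    def dress(forecast):
        """Dress a new point forecast with its class's error quantiles."""
        k = int(np.clip(np.searchsorted(edges, forecast, side="right") - 1, 0, 3))
        return forecast * ratio_q[k]

    print(dress(120.0))                      # dressed 5/50/95% quantiles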
The Influence of Internal Model Variability in GEOS-5 on Interhemispheric CO2 Exchange
NASA Technical Reports Server (NTRS)
Allen, Melissa; Erickson, David; Kendall, Wesley; Fu, Joshua; Ott, Leslie; Pawson, Steven
2012-01-01
An ensemble of eight atmospheric CO2 simulations was completed employing the National Aeronautics and Space Administration (NASA) Goddard Earth Observation System, Version 5 (GEOS-5) for the years 2000-2001, each with initial meteorological conditions corresponding to different days in January 2000, to examine internal model variability. Globally, the model runs show similar concentrations of CO2 for the two years, but in regions of high CO2 concentrations due to fossil fuel emissions, large differences among the simulations appear. The phasing and amplitude of the CO2 cycle at Northern Hemisphere locations in all of the ensemble members are similar to those of surface observations. In several Southern Hemisphere locations, however, some of the GEOS-5 model CO2 cycles are out of phase by as much as four months, and large variations occur between the ensemble members. This result indicates that there is large sensitivity to transport in these regions. The differences vary by latitude: the most extreme differences occur in the Tropics and the smallest at the South Pole. Examples of these differences among the ensemble members with regard to CO2 uptake and respiration by the terrestrial biosphere and CO2 emissions from fossil fuels are shown at Cape Grim, Tasmania. Integration-based flow analysis of the atmospheric circulation in the model runs shows widely varying paths of flow into the Tasmania region among the ensemble members, including sources from North America, South America, South Africa, South Asia and Indonesia. These results suggest that interhemispheric transport can be strongly influenced by internal model variability.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics
NASA Astrophysics Data System (ADS)
Iyer, V.; Shetty, S.; Iyengar, S. S.
2015-07-01
Rare-event learning has until recently seen little active research, owing to the unavailability of algorithms that can deal with big samples. This research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from the perspective of real-time algorithms. The computing framework is independent of the number of input samples, the application domain, and whether streams are labelled or label-less. A sampling-overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using an ensemble of trees with bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove using Hoeffding bounds that temporal-window-based sampling from sensor data streams converges after n samples, which can be used for fast prediction of new samples in real time. The Data-Cleaning tree model uses a nonparametric node-splitting technique that can be learned iteratively and scales linearly in memory consumption for an input stream of any size. The improved task-based ensemble extraction is compared with nonlinear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets that the explicit rule-learning computation is linear in time and depends only on the number of leaves present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields a minimum number (m) of leaves, keeping the pre-processing computation at n × t log m, compared to N² for the Gram matrix. We also show that the task-based feature induction yields higher Quality of Data (QoD) in the feature space compared to kernel methods using the Gram matrix.
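The Hoeffding-bound argument translates into a one-line sample-size formula: for values bounded in [0, 1], the empirical mean of n samples is within ε of the true mean with probability at least 1 − δ once 2·exp(−2nε²) ≤ δ. The tolerance values below are arbitrary examples.

    # Smallest n with 2*exp(-2*n*eps^2) <= delta (Hoeffding bound for
    # [0, 1]-bounded samples); eps and delta are example values only.
    import math

    def hoeffding_n(eps: float, delta: float) -> int:
        return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

    print(hoeffding_n(0.05, 0.01))   # about 1060 samples per temporal window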
Multiple-instance ensemble learning for hyperspectral images
NASA Astrophysics Data System (ADS)
Ergul, Ugur; Bilgin, Gokhan
2017-10-01
An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use with hyperspectral images (HSIs), inspired by the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed with a small percentage of the training samples, and MI bags are formed by a local windowing process with variable window sizes on the selected instances. In addition to bootstrap aggregation, random subspace is another method used to diversify the base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model-learning phase is carried out with MI bags, and the estimation phase is performed over single test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are presented against state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equivalent non-MIL algorithms.
NASA Astrophysics Data System (ADS)
Fernández, J.; Primo, C.; Cofiño, A. S.; Gutiérrez, J. M.; Rodríguez, M. A.
2009-08-01
In a recent paper, Gutiérrez et al. (Nonlinear Process Geophys 15(1):109-114, 2008) introduced a new characterization of spatiotemporal error growth, the so-called mean-variance logarithmic (MVL) diagram, and applied it to study ensemble prediction systems (EPS); in particular, they analyzed single-model ensembles obtained by perturbing the initial conditions. In the present work, the MVL diagram is applied to multi-model ensembles, also analyzing the effect of differences in model formulation. To this aim, the MVL diagram is systematically applied to the multi-model ensemble produced in the EU-funded DEMETER project. It is shown that the shared building blocks (atmospheric and ocean components) impose similar dynamics among different models and thus contribute to poor sampling of model formulation uncertainty. This dynamical similarity should be taken into account, at least as a pre-screening process, before applying any objective weighting method.
Characterizing RNA ensembles from NMR data with kinematic models
Fonseca, Rasmus; Pachov, Dimitar V.; Bernauer, Julie; van den Bedem, Henry
2014-01-01
Functional mechanisms of biomolecules often manifest themselves precisely in transient conformational substates. Researchers have long sought to structurally characterize dynamic processes in non-coding RNA, combining experimental data with computer algorithms. However, adequate exploration of conformational space for these highly dynamic molecules, starting from static crystal structures, remains challenging. Here, we report a new conformational sampling procedure, KGSrna, which can efficiently probe the native ensemble of RNA molecules in solution. We found that KGSrna ensembles accurately represent the conformational landscapes of 3D RNA encoded by NMR proton chemical shifts. KGSrna resolves motionally averaged NMR data into structural contributions; when coupled with residual dipolar coupling data, a KGSrna ensemble revealed a previously uncharacterized transient excited state of the HIV-1 trans-activation response element stem–loop. Ensemble-based interpretations of averaged data can aid in formulating and testing dynamic, motion-based hypotheses of functional mechanisms in RNAs with broad implications for RNA engineering and therapeutic intervention. PMID:25114056
NASA Astrophysics Data System (ADS)
Booth, B. B. B.; Bernie, D.; McNeall, D.; Hawkins, E.; Caesar, J.; Boulton, C.; Friedlingstein, P.; Sexton, D.
2012-09-01
We compare future changes in global mean temperature in response to different future scenarios which, for the first time, arise from emission-driven rather than concentration-driven perturbed-parameter ensembles of a Global Climate Model (GCM). These new GCM simulations sample uncertainties in atmospheric feedbacks, the land carbon cycle, ocean physics, and aerosol sulphur-cycle processes. We find broader ranges of projected temperature responses when considering emission-driven rather than concentration-driven simulations (with 10-90 percentile ranges of 1.7 K for the aggressive mitigation scenario up to 3.9 K for the high-end business-as-usual scenario). A small minority of simulations resulting from combinations of strong atmospheric feedbacks and carbon-cycle responses show temperature increases in excess of 9 K under RCP8.5, and temperatures in excess of 4 K even under aggressive mitigation (RCP2.6). While the simulations point to much larger temperature ranges for emission-driven experiments, they do not change existing expectations (based on previous concentration-driven experiments) about the timescales on which different sources of uncertainty become important. The new simulations sample a range of future atmospheric concentrations for each emission scenario. For both SRES A1B and the Representative Concentration Pathways (RCPs), the concentration pathways used to drive GCM ensembles lie towards the lower end of our simulated distribution. This design decision (a legacy of previous assessments) is likely to lead concentration-driven experiments to under-sample strong feedback responses. Our ensemble of emission-driven simulations spans the global temperature response of other multi-model frameworks except at the low end, where combinations of low climate sensitivity and low carbon-cycle feedbacks lead to responses outside our ensemble range. The ensemble simulates a number of high-end responses which lie above the CMIP5 carbon-cycle range; these can be linked to sampling stronger carbon-cycle feedbacks and climate sensitivities above 4.5 K. The latter aspect highlights the priority of identifying real-world constraints on climate sensitivity which, if achieved, would lower the upper bound of projected global mean temperature change. The ensemble of simulations presented here provides a framework for exploring relationships between present-day observables and future changes, while the large spread of projected future changes highlights the ongoing need for such work.
Generalized ensemble method applied to study systems with strong first order transitions
Malolepsza, E.; Kim, J.; Keyes, T.
2015-09-28
At strong first-order phase transitions, the entropy versus energy or, at constant pressure, enthalpy, exhibits convex behavior, and the statistical temperature curve correspondingly exhibits an S-loop or back-bending. In the canonical and isothermal-isobaric ensembles, with temperature as the control variable, the probability density functions become bimodal with peaks localized outside of the S-loop region. Inside, states are unstable, and as a result simulation of equilibrium phase coexistence becomes impossible. To overcome this problem, a method was proposed by Kim, Keyes and Straub, where optimally designed generalized ensemble sampling was combined with replica exchange, and denoted the generalized replica exchange method (gREM). This new technique uses parametrized effective sampling weights that lead to a unimodal energy distribution, transforming unstable states into stable ones. In the present study, the gREM, originally developed as a Monte Carlo algorithm, was implemented to work with molecular dynamics in an isobaric ensemble and coded into LAMMPS, a highly optimized open source molecular simulation package. Lastly, the method is illustrated in a study of the very strong solid/liquid transition in water.
Using Support Vector Machine Ensembles for Target Audience Classification on Twitter
Lo, Siaw Ling; Chiong, Raymond; Cornforth, David
2015-01-01
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space. PMID:25874768
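For illustration only — not the authors' code — a bootstrap-resampled SVM ensemble with majority voting might look like the following sketch (scikit-learn's SVC is assumed available; names like bootstrap_svm_ensemble are ours):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def bootstrap_svm_ensemble(X, y, n_members=7):
    """Train each SVM on a bootstrap resample (sampling with replacement),
    as opposed to plain random subsampling, and combine by majority vote."""
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(X), size=len(X), replace=True)
        members.append(SVC(kernel="linear").fit(X[idx], y[idx]))
    return members

def majority_vote(members, X):
    votes = np.stack([m.predict(X) for m in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# toy two-class data standing in for Twitter-LDA topic features
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.repeat([0, 1], 50)
print(majority_vote(bootstrap_svm_ensemble(X, y), X[:5]))
```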
Phipps, Eric T.; D'Elia, Marta; Edwards, Harold C.; ...
2017-04-18
In this study, quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often the simulation processes from sample to sample are similar, and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).
Nuclear Ensemble Approach with Importance Sampling.
Kossoski, Fábris; Barbatti, Mario
2018-06-12
We show that the importance sampling technique can effectively augment the range of problems where the nuclear ensemble approach can be applied. A sampling probability distribution function initially determines the collection of initial conditions for which calculations are performed, as usual. Then, results for a distinct target distribution are computed by introducing compensating importance sampling weights for each sampled point. This mapping between the two probability distributions can be performed whenever they are both explicitly constructed. Perhaps most notably, this procedure allows for the computation of temperature dependent observables. As a test case, we investigated the UV absorption spectra of phenol, which has been shown to have a marked temperature dependence. Application of the proposed technique to a range that covers 500 K provides results that converge to those obtained with conventional sampling. We further show that an overall improved rate of convergence is obtained when sampling is performed at intermediate temperatures. The comparison between calculated and the available measured cross sections is very satisfactory, as the main features of the spectra are correctly reproduced. As a second test case, one of Tully's classical models was revisited, and we show that the computation of dynamical observables also profits from the importance sampling technique. In summary, the strategy developed here can be employed to assess the role of temperature for any property calculated within the nuclear ensemble method, with the same computational cost as doing so for a single temperature.
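A hedged sketch of the reweighting step described above (our own toy 1-D example; reweighted_mean is a hypothetical name): samples drawn under one explicit distribution are mapped to averages under another via weights w_i = p_target(x_i)/p_sampling(x_i):

```python
import numpy as np

rng = np.random.default_rng(2)

def reweighted_mean(samples, f, p_sampling, p_target):
    """Estimate E_target[f] from points drawn under p_sampling by attaching
    importance weights w_i = p_target(x_i) / p_sampling(x_i)."""
    w = p_target(samples) / p_sampling(samples)
    return np.sum(w * f(samples)) / np.sum(w)   # self-normalised estimator

# toy example: sample a broad Gaussian, target a narrower ("colder") one
sigma_s, sigma_t = 2.0, 1.0
x = rng.normal(0.0, sigma_s, size=200_000)
gauss = lambda s: lambda x: np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2 * np.pi))
est = reweighted_mean(x, lambda x: x ** 2, gauss(sigma_s), gauss(sigma_t))
print(est)   # ~= sigma_t**2 = 1.0
```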
Evolutionary Ensemble for In Silico Prediction of Ames Test Mutagenicity
NASA Astrophysics Data System (ADS)
Chen, Huanhuan; Yao, Xin
Driven by new regulations and animal welfare concerns, the need for in silico models as alternative approaches to chemical safety assessment without animal testing has grown recently. This paper describes a novel machine learning ensemble approach to building an in silico model for the prediction of Ames test mutagenicity, one of a battery of the most commonly used experimental in vitro and in vivo genotoxicity tests for safety evaluation of chemicals. Evolutionary random neural ensemble with negative correlation learning (ERNE) [1] was developed based on neural networks and evolutionary algorithms. ERNE combines bootstrap sampling on training data with random subspace feature selection to ensure diversity in creating individuals within an initial ensemble. Furthermore, while evolving individuals within the ensemble, it makes use of negative correlation learning, enabling individual NNs to be trained as accurately as possible while keeping them as diverse as possible. Therefore, the resulting individuals in the final ensemble are capable of cooperating collectively to achieve better generalization of prediction. The empirical experiments suggest that ERNE is an effective ensemble approach for predicting the Ames test mutagenicity of chemicals.
Electrophoretic sample insertion. [device for uniformly distributing samples in flow path
NASA Technical Reports Server (NTRS)
Mccreight, L. R. (Inventor)
1974-01-01
Two conductive screens located in the flow path of an electrophoresis sample separation apparatus are charged electrically. The sample is introduced between the screens, and the charge is sufficient to disperse and hold the samples across the screens. When the charge is terminated, the samples are uniformly distributed in the flow path. Additionally, a first separation by charged properties has been accomplished.
NASA Astrophysics Data System (ADS)
Flores, A. N.; Entekhabi, D.; Bras, R. L.
2007-12-01
Soil hydraulic and thermal properties (SHTPs) affect both the rate of moisture redistribution in the soil column and the volumetric soil water capacity. Adequately constraining these properties through field and lab analysis to parameterize spatially-distributed hydrology models is often prohibitively expensive. Because SHTPs vary significantly at small spatial scales, individual soil samples are only reliably indicative of local conditions, and these properties remain a significant source of uncertainty in soil moisture and temperature estimation. In ensemble-based soil moisture data assimilation, uncertainty in the model-produced prior estimate due to associated uncertainty in SHTPs must be taken into account to avoid under-dispersive ensembles. To treat SHTP uncertainty for purposes of supplying inputs to a distributed watershed model, we use the restricted pairing (RP) algorithm, an extension of Latin Hypercube (LH) sampling. The RP algorithm generates an arbitrary number of SHTP combinations by sampling the appropriate marginal distributions of the individual soil properties using the LH approach, while imposing a target rank correlation among the properties. A previously-published meta-database of 1309 soils representing 12 textural classes is used to fit appropriate marginal distributions to the properties and compute the target rank correlation structure, conditioned on soil texture. Given categorical soil textures, our implementation of the RP algorithm generates an arbitrarily-sized ensemble of realizations of the SHTPs required as input to the TIN-based Realtime Integrated Basin Simulator with vegetation dynamics (tRIBS+VEGGIE) distributed-parameter ecohydrology model. Soil moisture ensembles simulated with RP-generated SHTPs exhibit less variance than ensembles simulated with SHTPs generated by a scheme that neglects correlation among properties. Neglecting correlation among SHTPs can lead to physically unrealistic combinations of parameters that exhibit implausible hydrologic behavior when input to the tRIBS+VEGGIE model.
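A simplified sketch of the Latin Hypercube step (our own illustration; the restricted-pairing reordering that imposes the target rank correlation is omitted, and the toy marginals are assumptions, not the study's fitted distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def latin_hypercube(n_samples, ppfs):
    """One stratified draw per [0,1) bin and per property, bins shuffled
    independently per column, then mapped through each marginal's inverse
    CDF. (Restricted pairing would additionally reorder columns to impose
    a target rank correlation.)"""
    d = len(ppfs)
    u = (rng.random((n_samples, d)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(d):
        rng.shuffle(u[:, j])
    return np.column_stack([ppf(u[:, j]) for j, ppf in enumerate(ppfs)])

# two toy soil-property marginals: log-normal conductivity, uniform porosity
samples = latin_hypercube(100, [stats.lognorm(s=1.0, scale=1e-6).ppf,
                                stats.uniform(loc=0.3, scale=0.2).ppf])
print(samples.shape, samples.mean(axis=0))
```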
Smith, Kyle K G; Poulsen, Jens Aage; Nyman, Gunnar; Rossky, Peter J
2015-06-28
We develop two classes of quasi-classical dynamics that are shown to conserve the initial quantum ensemble when used in combination with the Feynman-Kleinert approximation of the density operator. These dynamics are used to improve the Feynman-Kleinert implementation of the classical Wigner approximation for the evaluation of quantum time correlation functions known as Feynman-Kleinert linearized path-integral. As shown, both classes of dynamics are able to recover the exact classical and high temperature limits of the quantum time correlation function, while a subset is able to recover the exact harmonic limit. A comparison of the approximate quantum time correlation functions obtained from both classes of dynamics is made with the exact results for the challenging model problems of the quartic and double-well potentials. It is found that these dynamics provide a great improvement over the classical Wigner approximation, in which purely classical dynamics are used. In a special case, our first method becomes identical to centroid molecular dynamics.
Unimodular lattice triangulations as small-world and scale-free random graphs
NASA Astrophysics Data System (ADS)
Krüger, B.; Schmidt, E. M.; Mecke, K.
2015-02-01
Real-world networks, e.g., the social relations or world-wide-web graphs, exhibit both small-world and scale-free behaviour. We interpret lattice triangulations as planar graphs by identifying triangulation vertices with graph nodes and one-dimensional simplices with edges. Since these triangulations are ergodic with respect to a certain Pachner flip, applying different Monte Carlo simulations enables us to calculate average properties of random triangulations, as well as canonical ensemble averages, using an energy functional that is approximately the variance of the degree distribution. All considered triangulations have clustering coefficients comparable with real-world graphs; for the canonical ensemble there are inverse temperatures with small shortest path length independent of system size. Tuning the inverse temperature to a quasi-critical value leads to an indication of scale-free behaviour for degrees k ≥ 5. Using triangulations as a random graph model can improve the understanding of real-world networks, especially if the actual distance of the embedded nodes becomes important.
NASA Astrophysics Data System (ADS)
Wang, Yuanbing; Min, Jinzhong; Chen, Yaodeng; Huang, Xiang-Yu; Zeng, Mingjian; Li, Xin
2017-01-01
This study evaluates the performance of three-dimensional variational (3DVar) assimilation and a hybrid data assimilation system using time-lagged ensembles in a heavy rainfall event. The time-lagged ensembles are constructed by sampling from a moving 3 h time window along a model trajectory, which is economical and easy to implement. The proposed hybrid data assimilation system introduces flow-dependent error covariance derived from the time-lagged ensemble into the variational cost function without significantly increasing computational cost. Single-observation tests are performed to document the characteristics of the hybrid system. The sensitivity of precipitation forecasts to the ensemble covariance weight and localization scale is investigated. Additionally, the TLEn-Var is evaluated and compared to the ETKF (ensemble transform Kalman filter)-based hybrid assimilation within a continuously cycling framework, through which new hybrid analyses are produced every 3 h over 10 days. The 24 h accumulated precipitation, moisture, and wind are compared between 3DVar and the hybrid assimilation using time-lagged ensembles. Results show that model states and precipitation forecast skill are improved by the hybrid assimilation using time-lagged ensembles compared with 3DVar. Simulation of the precipitable water and the structure of the wind are also improved. Cyclonic wind increments are generated near the rainfall center, leading to an improved precipitation forecast. This study indicates that hybrid data assimilation using time-lagged ensembles is a viable alternative or supplement within complex models for weather service agencies that have limited computing resources for running large ensembles.
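A schematic of the hybrid covariance idea, assuming the common convex blend of a static matrix with a lagged-ensemble sample covariance (our sketch; the paper's actual weighting and localization are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(4)

def hybrid_covariance(lagged_states, B_static, beta=0.5):
    """Blend a static background-error covariance with a flow-dependent
    sample covariance built from model states taken at nearby times along
    one trajectory (e.g. 3-hourly lags), instead of from a costly ensemble."""
    X = np.asarray(lagged_states)          # (n_members, n_state)
    Xp = X - X.mean(axis=0)                # time-lagged perturbations
    P_ens = Xp.T @ Xp / (len(X) - 1)
    return (1.0 - beta) * B_static + beta * P_ens

# toy 10-variable state with 4 time-lagged snapshots
states = rng.normal(size=(4, 10))
B = np.eye(10)
print(hybrid_covariance(states, B).shape)
```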
Selecting a climate model subset to optimise key ensemble properties
NASA Astrophysics Data System (ADS)
Herger, Nadja; Abramowitz, Gab; Knutti, Reto; Angélil, Oliver; Lehmann, Karsten; Sanderson, Benjamin M.
2018-02-01
End users studying impacts and risks caused by human-induced climate change are often presented with large multi-model ensembles of climate projections whose composition and size are arbitrarily determined. An efficient and versatile method that finds a subset which maintains certain key properties from the full ensemble is needed, but very little work has been done in this area. Therefore, users typically make their own somewhat subjective subset choices and commonly use the equally weighted model mean as a best estimate. However, different climate model simulations cannot necessarily be regarded as independent estimates due to the presence of duplicated code and shared development history. Here, we present an efficient and flexible tool that makes better use of the ensemble as a whole by finding a subset with improved mean performance compared to the multi-model mean while at the same time maintaining the spread and addressing the problem of model interdependence. Out-of-sample skill and reliability are demonstrated using model-as-truth experiments. This approach is illustrated with one set of optimisation criteria but we also highlight the flexibility of cost functions, depending on the focus of different users. The technique is useful for a range of applications that, for example, minimise present-day bias to obtain an accurate ensemble mean, reduce dependence in ensemble spread, maximise future spread, ensure good performance of individual models in an ensemble, reduce the ensemble size while maintaining important ensemble characteristics, or optimise several of these at the same time. As in any calibration exercise, the final ensemble is sensitive to the metric, observational product, and pre-processing steps used.
NASA Astrophysics Data System (ADS)
Janicka, Lucja; Szczepanik, Dominika; Borek, Karolina; Heese, Birgit; Stachlewska, Iwona S.
2018-04-01
Aerosol layers of different origin, suspended in the atmosphere on 9-11 August 2015, were observed with the PollyXT-UW lidar in Warsaw, Poland. HYSPLIT ensemble backward trajectories indicate that the observed air masses can be attributed to several different sources, including possible transport paths from Ukraine, Slovakia, and Africa. In this paper, we analyse and discuss the properties of aerosol particles of different origin that were suspended over Warsaw during this event.
Otero, Cassi L.
2007-01-01
The U.S. Geological Survey, in cooperation with the San Antonio Water System, conducted a 4-year study during 2002–06 to identify major flow paths in the Edwards aquifer in northeastern Bexar and southern Comal Counties (study area). In the study area, faulting directs ground water into three hypothesized flow paths that move water, generally, from the southwest to the northeast. These flow paths are identified as the southern Comal flow path, the central Comal flow path, and the northern Comal flow path. Statistical correlations between water levels for six observation wells and between the water levels and discharges from Comal Springs and Hueco Springs yielded evidence for the hypothesized flow paths. Strong linear correlations were evident between the datasets from wells and springs within the same flow path and the datasets from wells in areas where flow between flow paths was suspected. Geochemical data (major ions, stable isotopes, sulfur hexafluoride, and tritium and helium) were used in graphical analyses to obtain evidence of the flow path from which wells or springs derive water. Major-ion geochemistry in samples from selected wells and springs showed relatively little variation. Samples from the southern Comal flow path were characterized by relatively high sulfate and chloride concentrations, possibly indicating that the water in the flow path was mixing with small amounts of saline water from the freshwater/saline-water transition zone. Samples from the central Comal flow path yielded the most varied major-ion geochemistry of the three hypothesized flow paths. Central Comal flow path samples were characterized, in general, by high calcium concentrations and low magnesium concentrations. Samples from the northern Comal flow path were characterized by relatively low sulfate and chloride concentrations and high magnesium concentrations. The high magnesium concentrations characteristic of northern Comal flow path samples from the recharge zone in Comal County might indicate that water from the Trinity aquifer is entering the Edwards aquifer in the subsurface. A graph of the relation between the stable isotopes deuterium and delta-18 oxygen showed that, except for samples collected following an unusually intense rain storm, there was not much variation in stable isotope values among the flow paths. In the study area deuterium ranged from -36.00 to -20.89 per mil and delta-18 oxygen ranged from -6.03 to -3.70 per mil. Excluding samples collected following the intense rain storm, the deuterium range in the study area was -33.00 to -20.89 per mil and the delta-18 oxygen range was -4.60 to -3.70 per mil. Two ground-water age-dating techniques, sulfur hexafluoride concentrations and tritium/helium-3 isotope ratios, were used to compute apparent ages (time since recharge occurred) of water samples collected in the study area. In general, the apparent ages computed by the two methods do not seem to indicate direction of flow. Apparent ages computed for water samples in northeastern Bexar and southern Comal Counties do not vary greatly except for some very young water in the recharge zone in central Comal County.
A comparison of resampling schemes for estimating model observer performance with small ensembles
NASA Astrophysics Data System (ADS)
Elshahaby, Fatma E. A.; Jha, Abhinav K.; Ghaly, Michael; Frey, Eric C.
2017-09-01
In objective assessment of image quality, an ensemble of images is used to compute the 1st and 2nd order statistics of the data. Often, only a finite number of images is available, leading to the issue of statistical variability in numerical observer performance. Resampling-based strategies can help overcome this issue. In this paper, we compared different combinations of resampling schemes (the leave-one-out (LOO) and the half-train/half-test (HT/HT)) and model observers (the conventional channelized Hotelling observer (CHO), channelized linear discriminant (CLD) and channelized quadratic discriminant). Observer performance was quantified by the area under the ROC curve (AUC). For a binary classification task and for each observer, the AUC value for an ensemble size of 2000 samples per class served as a gold standard for that observer. Results indicated that each observer yielded a different performance depending on the ensemble size and the resampling scheme. For a small ensemble size, the combination [CHO, HT/HT] had more accurate rankings than the combination [CHO, LOO]. Using the LOO scheme, the CLD and CHO had similar performance for large ensembles. However, the CLD outperformed the CHO and gave more accurate rankings for smaller ensembles. As the ensemble size decreased, the performance of the [CHO, LOO] combination seriously deteriorated as opposed to the [CLD, LOO] combination. Thus, it might be desirable to use the CLD with the LOO scheme when smaller ensemble size is available.
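For orientation, a toy version of one such combination (a Hotelling-type linear observer with a half-train/half-test split, AUC via the Mann-Whitney statistic) — our simplification, not the study's CHO implementation, with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(5)

def hotelling_auc(train0, train1, test0, test1):
    """Linear (Hotelling-type) observer: template from training statistics,
    AUC from the Mann-Whitney statistic of test-set scores."""
    S = 0.5 * (np.cov(train0.T) + np.cov(train1.T))      # pooled covariance
    w = np.linalg.solve(S, train1.mean(0) - train0.mean(0))
    t0, t1 = test0 @ w, test1 @ w
    return (t1[:, None] > t0[None, :]).mean()

def ht_ht_auc(c0, c1):
    """Half-train/half-test: first half estimates the template, second half scores."""
    h = len(c0) // 2
    return hotelling_auc(c0[:h], c1[:h], c0[h:], c1[h:])

# toy channelized data: 40 samples/class, 10 channels, small class shift
c0 = rng.normal(0.0, 1.0, (40, 10))
c1 = rng.normal(0.3, 1.0, (40, 10))
print(ht_ht_auc(c0, c1))
```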
Creating ensembles of oblique decision trees with evolutionary algorithms and sampling
Cantu-Paz, Erick [Oakland, CA; Kamath, Chandrika [Tracy, CA
2006-06-13
A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object-oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.
Classification of Odours for Mobile Robots Using an Ensemble of Linear Classifiers
NASA Astrophysics Data System (ADS)
Trincavelli, Marco; Coradeschi, Silvia; Loutfi, Amy
2009-05-01
This paper investigates the classification of odours using an electronic nose mounted on a mobile robot. The samples are collected as the robot explores the environment. Under such conditions, the sensor response differs from typical three phase sampling processes. In this paper, we focus particularly on the classification problem and how it is influenced by the movement of the robot. To cope with these influences, an algorithm consisting of an ensemble of classifiers is presented. Experimental results show that this algorithm increases classification performance compared to other traditional classification methods.
The Relationship between English Language Learner Status and Music Ensemble Participation
ERIC Educational Resources Information Center
Lorah, Julie A.; Sanders, Elizabeth A.; Morrison, Steven J.
2014-01-01
Authors of previous research have reported that U.S. English language learner (ELL) students participate in school-sponsored music ensembles (band, orchestra, and choir) at a lower rate than their native-English-speaking peers (non-ELLs). The current study examined this phenomenon using a nationally representative sample of U.S. 10th graders (14-…
NASA Astrophysics Data System (ADS)
Douglas, Jack
2014-03-01
One of the things that puzzled me when I was a PhD student working under Karl Freed was the curious unity between the theoretical descriptions of excluded volume interactions in polymers, the hydrodynamic properties of polymers in solution, and the critical properties of fluid mixtures, gases and diverse other materials (magnets, superfluids, etc.) when these problems were formally expressed in terms of Wiener path integration and the interactions treated through a combination of epsilon expansion and renormalization group (RG) theory. It seemed that only the interaction labels changed from one problem to the other. What do these problems have in common? Essential clues to these interrelations became apparent when Karl Freed, myself and Shi-Qing Wang together began to study polymers interacting with hyper-surfaces of continuously variable dimension, where the Feynman perturbation expansions could be performed through infinite order so that we could really understand what the RG theory was doing. It is evidently simply a particular method for resumming perturbation theory, and former ambiguities no longer existed. An integral equation extension of this type of exact calculation to "surfaces" of arbitrary fixed shape finally revealed the central mathematical object that links these diverse physical models: the capacity of polymer chains, whose value vanishes at the critical dimension of 4 and whose magnitude is linked to the friction coefficient of polymer chains, the virial coefficient of polymers and the 4-point function of the phi-4 field theory. Once this central object was recognized, it became possible to solve diverse problems in materials science through the calculation of capacity, and related "virial" properties, through Monte Carlo sampling of random walk paths. The essential ideas of this computational method are discussed and some applications given to non-trivial problems: nanotubes treated as either rigid rods or ensembles of worm-like chains having finite cross-section, DNA, nanoparticles with grafted chain layers, and knotted polymers. The path-integration method, which grew out of research in Karl Freed's group, is evidently a powerful tool for computing basic transport properties of complex-shaped objects and should find increasing application in polymer science, nanotechnological applications and biology.
Hybrid Data Assimilation without Ensemble Filtering
NASA Technical Reports Server (NTRS)
Todling, Ricardo; Akkraoui, Amal El
2014-01-01
The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional variational system to a hybrid approach in which the ensemble is generated using a square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the Grid-point Statistical Interpolation system. As in most EnKF applications, we found it necessary to employ a combination of multiplicative and additive inflations to compensate for sampling and modeling errors, respectively, and to maintain the small-member ensemble solution close to the variational solution; we also found it necessary to re-center the members of the ensemble about the variational analysis. During tuning of the filter we found re-centering and additive inflation to play a considerably larger role than expected, particularly in a dual-resolution context when the variational analysis is run at higher resolution than the ensemble. This led us to consider a hybrid strategy in which the members of the ensemble are generated by simply converting the variational analysis to the resolution of the ensemble and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called filter-free hybrid procedure with an EnKF-based hybrid procedure and a control non-hybrid, traditional scheme show both hybrid strategies to provide equally significant improvement over the control; more interestingly, the filter-free procedure was found to give qualitatively similar results to the EnKF-based procedure.
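A minimal sketch of the filter-free member generation described above (our reading of the procedure; function and parameter names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)

def filter_free_members(x_var, n_members, add_infl_std):
    """Generate ensemble members by re-centring on the (interpolated)
    variational analysis and adding additive-inflation perturbations,
    bypassing the EnKF entirely."""
    pert = rng.normal(0.0, add_infl_std, size=(n_members, x_var.size))
    pert -= pert.mean(axis=0)       # keep the ensemble mean on the analysis
    return x_var[None, :] + pert

members = filter_free_members(np.zeros(100), n_members=32, add_infl_std=0.5)
print(members.shape, np.abs(members.mean(axis=0)).max())
```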
Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation.
Iida, Shinji; Nakamura, Haruki; Higo, Junichi
2016-06-15
We introduce various recently developed generalized ensemble methods, which are useful for sampling the molecular configurations emerging in the process of protein-protein or protein-ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks.
Bayesian quantitative precipitation forecasts in terms of quantiles
NASA Astrophysics Data System (ADS)
Bentzien, Sabrina; Friederichs, Petra
2014-05-01
Ensemble prediction systems (EPS) for numerical weather prediction on the mesoscale are developed in particular to obtain probabilistic guidance for high-impact weather. An EPS issues not a single deterministic future state of the atmosphere but a sample of possible future states. Ensemble postprocessing then translates such a sample of forecasts into probabilistic measures. This study focuses on probabilistic quantitative precipitation forecasts in terms of quantiles. Quantiles are particularly suitable for describing precipitation at various locations, since no assumption is required about the distribution of precipitation. The focus is on prediction during high-impact events, within the Volkswagen Stiftung funded project WEX-MOP (Mesoscale Weather Extremes - Theory, Spatial Modeling and Prediction). Quantile forecasts are derived from the raw ensemble and via quantile regression. Neighborhood methods and time-lagging are effective tools to inexpensively increase the ensemble spread, which results in more reliable forecasts, especially for extreme precipitation events. Since an EPS provides a large number of potentially informative predictors, variable selection is required in order to obtain a stable statistical model. A Bayesian formulation of quantile regression allows for inference about the selection of predictive covariates through the use of appropriate prior distributions. Moreover, the implementation of an additional process layer for the regression parameters accounts for spatial variation of the parameters. Bayesian quantile regression and its spatially adaptive extension are illustrated for the German-focused mesoscale weather prediction ensemble COSMO-DE-EPS, which has run (pre)operationally since December 2010 at the German Meteorological Service (DWD). Objective out-of-sample verification uses the quantile score (QS), a weighted absolute error between quantile forecasts and observations. The QS is a proper scoring function and can be decomposed into reliability, resolution, and uncertainty parts. A quantile reliability plot gives detailed insight into the predictive performance of the quantile forecasts.
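The quantile score referred to above is the pinball loss; a small self-check (our own, with hypothetical names) that it is minimized by the true quantile:

```python
import numpy as np

def quantile_score(q_forecast, y_obs, tau):
    """Pinball loss: an asymmetrically weighted absolute error that is a
    proper scoring function for the tau-quantile."""
    u = y_obs - q_forecast
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

# toy check: for data ~ N(0,1), the 0.9-quantile forecast 1.2816 scores best
rng = np.random.default_rng(7)
y = rng.normal(size=100_000)
for q in (0.8, 1.2816, 1.8):
    print(q, quantile_score(q, y, tau=0.9))
```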
Force-field functor theory: classical force-fields which reproduce equilibrium quantum distributions
Babbush, Ryan; Parkhill, John; Aspuru-Guzik, Alán
2013-01-01
Feynman and Hibbs were the first to variationally determine an effective potential whose associated classical canonical ensemble approximates the exact quantum partition function. We examine the existence of a map between the local potential and an effective classical potential which matches the exact quantum equilibrium density and partition function. The usefulness of such a mapping rests in its ability to readily improve Born-Oppenheimer potentials for use with classical sampling. We show that such a map is unique and must exist. To explore the feasibility of using this result to improve classical molecular mechanics, we numerically produce a map from a library of randomly generated one-dimensional potential/effective potential pairs then evaluate its performance on independent test problems. We also apply the map to simulate liquid para-hydrogen, finding that the resulting radial pair distribution functions agree well with path integral Monte Carlo simulations. The surprising accessibility and transferability of the technique suggest a quantitative route to adapting Born-Oppenheimer potentials, with a motivation similar in spirit to the powerful ideas and approximations of density functional theory. PMID:24790954
Sampling-based ensemble segmentation against inter-operator variability
NASA Astrophysics Data System (ADS)
Huo, Jing; Okada, Kazunori; Pope, Whitney; Brown, Matthew
2011-03-01
Inconsistency and a lack of reproducibility are commonly associated with semi-automated segmentation methods. In this study, we developed an ensemble approach to improve reproducibility and applied it to glioblastoma multiforme (GBM) brain tumor segmentation on T1-weighted contrast-enhanced MR volumes. The proposed approach combines sampling-based simulations and ensemble segmentation into a single framework; it generates a set of segmentations by perturbing user initialization and user-specified internal parameters, then fuses the set of segmentations into a single consensus result. Three combination algorithms were applied: majority voting, averaging, and expectation-maximization (EM). The reproducibility of the proposed framework was evaluated in a controlled experiment on 16 tumor cases from a multicenter drug trial. The ensemble framework had significantly better reproducibility than the individual base Otsu thresholding method (p < .001).
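A toy sketch of the fusion step (our own; majority voting and averaging only — the EM-based fusion is omitted, and fuse_segmentations is a hypothetical name):

```python
import numpy as np

rng = np.random.default_rng(8)

def fuse_segmentations(masks, method="vote", threshold=0.5):
    """Combine a set of binary segmentations produced under perturbed
    initialisations/parameters into one consensus mask."""
    masks = np.asarray(masks, dtype=float)     # (n_runs, H, W)
    mean_map = masks.mean(axis=0)              # per-voxel agreement in [0, 1]
    if method == "average":
        return mean_map                        # soft consensus
    return mean_map >= threshold               # majority vote ("vote")

masks = rng.random((9, 64, 64)) > 0.4          # nine perturbed toy segmentations
print(fuse_segmentations(masks).mean())
```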
Quantifying selective alignment of ensemble nitrogen-vacancy centers in (111) diamond
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tahara, Kosuke; Ozawa, Hayato; Iwasaki, Takayuki
2015-11-09
Selective alignment of nitrogen-vacancy (NV) centers in diamond is an important technique for its applications. Quantification of the alignment ratio is necessary to design optimized diamond samples; however, this is not a straightforward problem for a dense ensemble of NV centers. We estimate the alignment ratio of ensemble NV centers along the [111] direction in (111) diamond by optically detected magnetic resonance measurements. Diamond films deposited by N₂-doped chemical vapor deposition have NV center densities over 1 × 10¹⁵ cm⁻³ and alignment ratios over 75%. Although the spin coherence time (T₂) is limited to a few μs by electron spins of nitrogen impurities, the combination of selective alignment and high density can be a possible way to optimize NV-containing diamond samples for sensing applications.
Detection of eardrum abnormalities using ensemble deep learning approaches
NASA Astrophysics Data System (ADS)
Senaras, Caglar; Moberly, Aaron C.; Teknos, Theodoros; Essig, Garth; Elmaraghy, Charles; Taj-Schaal, Nazhat; Yua, Lianbo; Gurcan, Metin N.
2018-02-01
In this study, we proposed an approach to report the condition of the eardrum as "normal" or "abnormal" by ensembling two different deep learning architectures. In the first network (Network 1), we applied transfer learning to the Inception V3 network using 409 labeled samples. As the second network (Network 2), we designed a convolutional neural network that takes advantage of auto-encoders, using an additional 673 unlabeled eardrum samples. The individual classification accuracies of Network 1 and Network 2 were 84.4% (±12.1%) and 82.6% (±11.3%), respectively. Only 32% of the errors of the two networks were the same, making it possible to combine the two approaches to achieve better classification accuracy. The proposed ensemble method allows us to achieve robust classification because it has high accuracy (84.4%) with the lowest standard deviation (±10.3%).
A Stochastic Diffusion Process for the Dirichlet Distribution
Bakosi, J.; Ristorcelli, J. R.
2013-03-01
The method of potential solutions of Fokker-Planck equations is used to develop a transport equation for the joint probability of N coupled stochastic variables with the Dirichlet distribution as its asymptotic solution. To ensure a bounded sample space, a coupled nonlinear diffusion process is required: the Wiener processes in the equivalent system of stochastic differential equations are multiplicative with coefficients dependent on all the stochastic variables. Individual samples of a discrete ensemble, obtained from the stochastic process, satisfy a unit-sum constraint at all times. The process may be used to represent realizations of a fluctuating ensemble of N variables subject to a conservation principle. Similar to the multivariate Wright-Fisher process, whose invariant is also Dirichlet, the univariate case yields a process whose invariant is the beta distribution. As a test of the results, Monte Carlo simulations are used to evolve numerical ensembles toward the invariant Dirichlet distribution.
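As an illustrative analogue (ours, not the paper's process — we do not reproduce its drift and diffusion coefficients), the univariate Wright-Fisher-type diffusion with a Beta invariant can be integrated by Euler-Maruyama:

```python
import numpy as np

rng = np.random.default_rng(9)

def beta_invariant_diffusion(theta1, theta2, n_paths=5_000, dt=2e-3, n_steps=2_000):
    """Euler-Maruyama for the univariate Wright-Fisher-type diffusion
    dX = 0.5*(theta1*(1-X) - theta2*X) dt + sqrt(X*(1-X)) dW,
    whose invariant density is Beta(theta1, theta2); the multivariate
    analogue in the paper has the Dirichlet as its invariant."""
    x = np.full(n_paths, 0.5)
    for _ in range(n_steps):
        drift = 0.5 * (theta1 * (1.0 - x) - theta2 * x)
        noise = np.sqrt(np.clip(x * (1.0 - x), 0.0, None) * dt) * rng.normal(size=n_paths)
        x = np.clip(x + drift * dt + noise, 0.0, 1.0)   # crude boundary handling
    return x

x = beta_invariant_diffusion(2.0, 5.0)
print(x.mean(), 2.0 / (2.0 + 5.0))   # sample mean vs the Beta(2,5) mean
```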
Deng, Nan-jie; Dai, Wei
2013-01-01
Understanding how kinetics in the unfolded state affects protein folding is a fundamentally important yet less well-understood issue. Here we employ three different models to analyze the unfolded landscape and folding kinetics of the miniprotein Trp-cage. The first is a 208 μs explicit solvent molecular dynamics (MD) simulation from D. E. Shaw Research containing tens of folding events. The second is a Markov state model (MSM-MD) constructed from the same ultra-long MD simulation; MSM-MD can be used to generate thousands of folding events. The third is a Markov state model built from temperature replica exchange MD simulations in implicit solvent (MSM-REMD). All the models exhibit multiple folding pathways, and there is a good correspondence between the folding pathways from direct MD and those computed from the MSMs. The unfolded populations interconvert rapidly between extended and collapsed conformations on time scales ≤ 40 ns, compared with the folding time of ≈ 5 μs. The folding rates are independent of where the folding is initiated from within the unfolded ensemble. About 90% of the unfolded states are sampled within the first 40 μs of the ultra-long MD trajectory, which on average explores ~27% of the unfolded state ensemble between consecutive folding events. We clustered the folding pathways according to structural similarity into "tubes", and kinetically partitioned the unfolded state into populations that fold along different tubes. From our analysis of the simulations and a simple kinetic model, we find that when the mixing within the unfolded state is comparable to or faster than folding, the folding waiting times for all the folding tubes are similar and the folding kinetics is essentially single exponential despite the presence of heterogeneous folding paths with non-uniform barriers. When the mixing is much slower than folding, different unfolded populations fold independently, leading to non-exponential kinetics. A kinetic partition of the Trp-cage unfolded state is constructed which reveals that different unfolded populations have almost the same probability to fold along any of the multiple folding paths. We are investigating whether the results for the kinetics in the unfolded state of the twenty-residue Trp-cage are representative of larger single-domain proteins. PMID:23705683
Kim, Jung-Hyun; Powell, Jeffery B; Roberge, Raymond J; Shepherd, Angie; Coca, Aitor
2014-01-01
The purpose of this study was to evaluate the capability of fabric Total Heat Loss (THL) values to predict the thermal stress that Personal Protective Equipment (PPE) ensemble wearers may encounter while performing work. A series of three tests, consisting of the Sweating Hot Plate (SHP) test on two sample fabrics and the Sweating Thermal Manikin (STM) and human performance tests on two single-layer encapsulating ensembles (fabric/ensemble A = low THL and B = high THL), was conducted to compare THL values between the SHP and STM methods along with human thermophysiological responses to wearing the ensembles. In human testing, ten male subjects performed a treadmill exercise at 4.8 km/h and 3% incline for 60 min in two environmental conditions (mild = 22°C, 50% relative humidity (RH) and hot/humid = 35°C, 65% RH). The thermal and evaporative resistances were significantly higher on the fabric level as measured in the SHP test than on the ensemble level as measured in the STM test. Consequently, the THL values also differed significantly for both fabric types (SHP vs. STM: 191.3 vs. 81.5 W/m² for fabric/ensemble A, and 909.3 vs. 149.9 W/m² for fabric/ensemble B; p < 0.001). Body temperature and heart rate responses between ensembles A and B were consistently different in both environmental conditions (p < 0.001), which is attributed to significantly higher sweat evaporation in ensemble B than in A (p < 0.05), despite greater sweat production in ensemble A (p < 0.001) in both environmental conditions. Further, the elevation of microclimate temperature (p < 0.001) and humidity (p < 0.01) was significantly greater in ensemble A than in B. It was concluded that: (1) THL values determined by the SHP test differ significantly from the actual THL potential of the PPE ensemble tested on the STM; (2) physiological benefits from wearing a more breathable PPE ensemble may not be attainable at incremental THL values (SHP test) below approximately 150-200 W/m²; and (3) the effects of the thermal environment on the level of heat stress in PPE ensemble wearers are greater than those of ensemble thermal characteristics.
pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins.
Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A Keith; Felli, Isabella C; Forman-Kay, Julie D; Kriwacki, Richard W; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I; Uversky, Vladimir N; Vendruscolo, Michele; Wishart, David; Wright, Peter E; Tompa, Peter
2014-01-01
The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in a Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
Total probabilities of ensemble runoff forecasts
NASA Astrophysics Data System (ADS)
Olav Skøien, Jon; Bogner, Konrad; Salamon, Peter; Smith, Paul; Pappenberger, Florian
2016-04-01
Ensemble forecasting has long been used in meteorological modelling to indicate the uncertainty of the forecasts. However, as the ensembles often exhibit both bias and dispersion errors, it is necessary to calibrate and post-process them. Two of the most common methods for this are Bayesian Model Averaging (Raftery et al., 2005) and Ensemble Model Output Statistics (EMOS) (Gneiting et al., 2005). There are also methods for regionalizing these approaches (Berrocal et al., 2007) and for incorporating the correlation between lead times (Hemri et al., 2013). Engeland and Steinsland (2014) developed a framework which can estimate post-processing parameters that differ in space and time but still give spatially and temporally consistent output. However, their method is computationally complex for our larger number of stations and cannot directly be regionalized in the way we would like, so we suggest a different path below. The target of our work is to create a mean forecast with uncertainty bounds for a large number of locations in the framework of the European Flood Awareness System (EFAS - http://www.efas.eu). We are therefore more interested in improving the forecast skill for high flows than the forecast skill at lower runoff levels. EFAS uses a combination of ensemble forecasts and deterministic forecasts from different forecasters to force a distributed hydrologic model and to compute runoff ensembles for each river pixel within the model domain. Instead of showing the mean and the variability of each forecast ensemble individually, we now post-process all model outputs to find a total probability, the post-processed mean, and the uncertainty of all ensembles. The post-processing parameters are first calibrated for each calibration location, while ensuring that they have some spatial correlation by adding a spatial penalty in the calibration process. This can in some cases have a slight negative impact on the calibration error, but it makes it easier to interpolate the post-processing parameters to uncalibrated locations. We also look into different methods for handling the non-normal distributions of runoff data and the effect of different data transformations on forecast skill in general and for floods in particular. Berrocal, V. J., Raftery, A. E. and Gneiting, T.: Combining Spatial Statistical and Ensemble Information in Probabilistic Weather Forecasts, Mon. Weather Rev., 135(4), 1386-1402, doi:10.1175/MWR3341.1, 2007. Engeland, K. and Steinsland, I.: Probabilistic postprocessing models for flow forecasts for a system of catchments and several lead times, Water Resour. Res., 50(1), 182-197, doi:10.1002/2012WR012757, 2014. Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T.: Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133(5), 1098-1118, doi:10.1175/MWR2904.1, 2005. Hemri, S., Fundel, F. and Zappa, M.: Simultaneous calibration of ensemble river flow predictions over an entire range of lead times, Water Resour. Res., 49(10), 6744-6755, doi:10.1002/wrcr.20542, 2013. Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133(5), 1155-1174, doi:10.1175/MWR2906.1, 2005.
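A compact sketch of one standard post-processing option mentioned above — Gaussian EMOS fitted by minimum CRPS estimation (the closed-form normal CRPS is standard; the spatial penalty discussed in the abstract is not included, and all names and the toy data are ours):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(10)

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(ens_mean, ens_var, y):
    """EMOS-style post-processing: predictive N(a + b*mean, c + d*var),
    coefficients found by minimum CRPS estimation."""
    def loss(p):
        a, b, c, d = p
        sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-6))
        return crps_normal(a + b * ens_mean, sigma, y).mean()
    return minimize(loss, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead").x

# toy biased, under-dispersive ensemble of a synthetic "runoff" truth
truth = rng.normal(5.0, 2.0, size=800)
ens = truth[:, None] + 1.0 + rng.normal(0.0, 0.7, size=(800, 20))  # bias +1, spread too small
print(fit_emos(ens.mean(1), ens.var(1, ddof=1), truth))            # a ~ -1, b ~ 1
```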
NASA Astrophysics Data System (ADS)
Iachimciuc, Igor
The dissertation is in two parts, a theoretical study and a musical composition. In Part I the music of Gyorgy Kurtag is analyzed from the point of view of sound color. A brief description of what is understood by the term sound color, and various ways of achieving specific coloristic effects, are presented in the Introduction. An examination of Kurtag's approaches to the domain of sound color occupies the chapters that follow. The musical examples that are analyzed are selected from Kurtag's different compositional periods, showing a certain consistency in sound color techniques, the most important of which are already present in the String Quartet, Op. 1. The compositions selected for analysis are written for different ensembles, but regardless of the instrumentation, certain principles of the formation and organization of sound color remain the same. Rather than relying on extended instrumental techniques, Kurtag creates a large variety of sound colors using traditional means such as pitch material, register, density, rhythm, timbral combinations, dynamics, texture, spatial displacement of the instruments, and the overall musical context. Each sound color unit in Kurtag's music is a separate entity, conceived as a complete microcosm. Sound color units can either be juxtaposed as contrasting elements, forming sound color variations, or superimposed, often resulting in a Klangfarbenmelodie effect. Some of the same gestural figures (objets trouves) appear in different compositions, but with significant coloristic modifications. Thus, the principle of sound color variations is not only a strong organizational tool, but also a characteristic stylistic feature of the music of Gyorgy Kurtag. Part II, Leopard's Path (2010), for flute, clarinet, violin, cello, cimbalom, and piano, is an original composition inspired by the painting of Jesse Allen, a San Francisco based artist. The composition is conceived as a cycle of thirteen short movements. Ten of these movements are the musical interpretation of the objects presented in the painting, and are stylistically similar. These movements are scored for the entire ensemble. The other three movements, entitled Interludes, provide a stylistic contrast, and are not directly connected with the painting.
NASA Astrophysics Data System (ADS)
Forrester, Peter J.; Trinh, Allan K.
2018-05-01
The neighbourhood of the largest eigenvalue λmax in the Gaussian unitary ensemble (GUE) and Laguerre unitary ensemble (LUE) is referred to as the soft edge. It is known that there exists a particular centring and scaling such that the distribution of λmax tends to a universal form, with an error term bounded by 1/N^(2/3). We take up the problem of computing the exact functional form of the leading error term in a large N asymptotic expansion for both the GUE and LUE; two versions of the LUE are considered, one with the parameter a fixed and the other with a proportional to N. Both settings in the LUE case allow for an interpretation in terms of the distribution of a particular weighted path length in a model involving exponential variables on a rectangular grid, as the grid size gets large. We give operator theoretic forms of the corrections, which are corollaries of knowledge of the first two terms in the large N expansion of the scaled kernel and are readily computed using a method due to Bornemann. We also give expressions in terms of the solutions of particular systems of coupled differential equations, which provide an alternative method of computation. Both characterisations are well suited to a thinned generalisation of the original ensemble, whereby each eigenvalue is deleted independently with probability (1 - ξ). In Sec. V, we investigate using simulation the question of whether, upon an appropriate centring and scaling, a wider class of complex Hermitian random matrix ensembles have their leading correction to the distribution of λmax proportional to 1/N^(2/3).
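As a quick empirical companion to the soft-edge scaling discussed above, the following numpy sketch samples the centred and scaled largest GUE eigenvalue; the drift of the empirical distribution with N is consistent with a correction of order N^(-2/3). This is an illustration only, not the paper's operator-theoretic computation.

```python
# Sample (lambda_max - 2*sqrt(N)) * N^(1/6) for GUE matrices; in this
# normalization the limit is the beta=2 Tracy-Widom distribution.
import numpy as np

rng = np.random.default_rng(0)

def gue_lmax_scaled(N, n_samples=200):
    out = np.empty(n_samples)
    for k in range(n_samples):
        X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
        H = (X + X.conj().T) / 2.0                   # GUE matrix
        lmax = np.linalg.eigvalsh(H)[-1]
        out[k] = (lmax - 2.0 * np.sqrt(N)) * N ** (1.0 / 6.0)  # soft-edge scaling
    return out

for N in (20, 80, 320):
    s = gue_lmax_scaled(N)
    print(N, s.mean(), s.std())   # statistics drift toward the limit as N grows
```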
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.
Notaro, Marco; Schubach, Max; Robinson, Peter N; Valentini, Giorgio
2017-10-12
The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of predicting gene-disease associations has been widely investigated, the related problem of predicting gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though no HPO term associations are known for most human genes and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO term predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
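The true path rule mentioned above can be enforced by a simple top-down pass over the ontology DAG, capping each term's flat score by its parents' corrected scores. The Python sketch below illustrates that idea; it mirrors the spirit of the two-step scheme (flat learning, then hierarchical combination) but is not the authors' R implementation.

```python
# Top-down hierarchical correction: a term's score may not exceed any ancestor's.
from collections import deque

def topological_order(parents):
    """parents: dict term -> set of parent terms (a DAG; roots have empty sets)."""
    children = {t: set() for t in parents}
    indeg = {t: len(parents[t]) for t in parents}
    for t, ps in parents.items():
        for p in ps:
            children[p].add(t)
    queue = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order

def true_path_correction(flat_scores, parents):
    """Cap each term's flat score by the minimum of its parents' corrected scores."""
    corrected = dict(flat_scores)
    for term in topological_order(parents):
        if parents[term]:
            cap = min(corrected[p] for p in parents[term])
            corrected[term] = min(corrected[term], cap)
    return corrected

# Toy HPO-like fragment: "HP:3" has parents "HP:1" and "HP:2".
parents = {"HP:1": set(), "HP:2": set(), "HP:3": {"HP:1", "HP:2"}}
print(true_path_correction({"HP:1": 0.4, "HP:2": 0.9, "HP:3": 0.7}, parents))
# HP:3 is capped at 0.4, so no child outscores its ancestors.
```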
NASA Astrophysics Data System (ADS)
Yettella, Vineel; Kay, Jennifer E.
2017-09-01
The extratropical precipitation response to global warming is investigated within a 30-member initial condition climate model ensemble. As in observations, modeled cyclonic precipitation contributes a large fraction of extratropical precipitation, especially over the ocean and in the winter hemisphere. When compared to present day, the ensemble projects increased cyclone-associated precipitation under twenty-first century business-as-usual greenhouse gas forcing. While the cyclone-associated precipitation response is weaker in the near-future (2016-2035) than in the far-future (2081-2100), both future periods have similar patterns of response. Though cyclone frequency changes are important regionally, most of the increased cyclone-associated precipitation results from increased within-cyclone precipitation. Consistent with this result, cyclone-centric composites show statistically significant precipitation increases in all cyclone sectors. Decomposition into thermodynamic (mean cyclone water vapor path) and dynamic (mean cyclone wind speed) contributions shows that thermodynamics explains 92 and 95% of the near-future and far-future within-cyclone precipitation increases respectively. Surprisingly, the influence of dynamics on future cyclonic precipitation changes is negligible. In addition, the forced response exceeds internal variability in both future time periods. Overall, this work suggests that future cyclonic precipitation changes will result primarily from increased moisture availability in a warmer world, with secondary contributions from changes in cyclone frequency and cyclone dynamics.
Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.
Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S
2017-01-05
The magnitude and complexity of the structural and functional data available on nanomaterials require data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload needed to simulate experimentally relevant virtual samples.
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
An ensemble of classifiers aims to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of ensembles of classifiers generate self-rated confidence scores which estimate the reliability of each of their predictions, and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. Although ensembles of classifiers have been widely used in computational intelligence, the effect of all factors of unreliability on the confidence of classification has been largely overlooked in existing work. In NDE, classification results are affected by the inherent ambiguity of classification, non-discriminative features, inadequate training samples and measurement noise. In this paper, we extend existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in the classification performance of defect and non-defect indications.
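For orientation, the following Python sketch shows a textbook confidence-rated boosted ensemble of decision stumps, where the magnitude of the voting margin serves as a self-rated confidence score. This is the generic AdaBoost family baseline; the NDE-specific extension described above, which folds further unreliability factors into the confidence, is not reproduced here.

```python
# Generic confidence-rated boosting with decision stumps (labels in {-1, +1}).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=20):
    """Returns (stumps, alphas); the weighted vote margin acts as confidence."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak hypothesis
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict_with_confidence(stumps, alphas, X):
    margin = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(margin), np.abs(margin)      # label and self-rated confidence
```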
Information flow in an atmospheric model and data assimilation
NASA Astrophysics Data System (ADS)
Yoon, Young-noh
2011-12-01
Weather forecasting consists of two processes, model integration and analysis (data assimilation). During the model integration, the state estimate produced by the analysis evolves to the next cycle time according to the atmospheric model to become the background estimate. The analysis then produces a new state estimate by combining the background state estimate with new observations, and the cycle repeats. In an ensemble Kalman filter, the probability distribution of the state estimate is represented by an ensemble of sample states, and the covariance matrix is calculated using the ensemble of sample states. We perform numerical experiments on toy atmospheric models introduced by Lorenz in 2005 to study the information flow in an atmospheric model in conjunction with ensemble Kalman filtering for data assimilation. This dissertation consists of two parts. The first part of this dissertation is about the propagation of information and the use of localization in ensemble Kalman filtering. If we can perform data assimilation locally by considering the observations and the state variables only near each grid point, then we can reduce the number of ensemble members necessary to cover the probability distribution of the state estimate, reducing the computational cost for the data assimilation and the model integration. Several localized versions of the ensemble Kalman filter have been proposed. Although tests applying such schemes have proven them to be extremely promising, a full basic understanding of the rationale and limitations of localization is currently lacking. We address these issues and elucidate the role played by chaotic wave dynamics in the propagation of information and the resulting impact on forecasts. The second part of this dissertation is about ensemble regional data assimilation using joint states. Assuming that we have a global model and a regional model of higher accuracy defined in a subregion inside the global region, we propose a data assimilation scheme that produces the analyses for the global and the regional model simultaneously, considering forecast information from both models. We show that our new data assimilation scheme produces better results both in the subregion and the global region than the data assimilation scheme that produces the analyses for the global and the regional model separately.
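A minimal stochastic (perturbed-observation) EnKF analysis step, of the kind described above, can be written in a few lines of numpy. The notation (X, H, R) and the global, non-localized update are our simplifications; localization would restrict each state variable's update to nearby observations, which is exactly the issue the first part of the dissertation examines.

```python
# Perturbed-observation EnKF analysis step (illustrative, no localization).
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """X: (n_state, n_ens) background ensemble; y: (n_obs,) observations;
    H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) obs covariance."""
    n_state, n_ens = X.shape
    Xm = X.mean(axis=1, keepdims=True)
    A = X - Xm                                   # ensemble anomalies
    HA = H @ A
    Pyy = HA @ HA.T / (n_ens - 1) + R            # innovation covariance
    Pxy = A @ HA.T / (n_ens - 1)                 # state-observation covariance
    K = Pxy @ np.linalg.inv(Pyy)                 # Kalman gain from the ensemble
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)                   # analysis ensemble
```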
NASA Astrophysics Data System (ADS)
Booth, B. B. B.; Bernie, D.; McNeall, D.; Hawkins, E.; Caesar, J.; Boulton, C.; Friedlingstein, P.; Sexton, D. M. H.
2013-04-01
We compare future changes in global mean temperature in response to different future scenarios which, for the first time, arise from emission-driven rather than concentration-driven perturbed parameter ensembles of a global climate model (GCM). These new GCM simulations sample uncertainties in atmospheric feedbacks, land carbon cycle, ocean physics and aerosol sulphur cycle processes. We find broader ranges of projected temperature responses when considering emission-driven rather than concentration-driven simulations (with 10-90th percentile ranges of 1.7 K for the aggressive mitigation scenario, up to 3.9 K for the high-end, business-as-usual scenario). A small minority of simulations resulting from combinations of strong atmospheric feedbacks and carbon cycle responses show temperature increases in excess of 9 K under RCP8.5 and, even under aggressive mitigation (RCP2.6), in excess of 4 K. While the simulations point to much larger temperature ranges for emission-driven experiments, they do not change existing expectations (based on previous concentration-driven experiments) about the timescales over which different sources of uncertainty are important. The new simulations sample a range of future atmospheric concentrations for each emission scenario. For both SRES A1B and the Representative Concentration Pathways (RCPs), the concentration scenarios used to drive GCM ensembles lie towards the lower end of our simulated distribution. This design decision (a legacy of previous assessments) is likely to lead concentration-driven experiments to under-sample strong feedback responses in future projections. Our ensemble of emission-driven simulations spans the global temperature response of the CMIP5 emission-driven simulations, except at the low end: combinations of low climate sensitivity and low carbon cycle feedbacks cause a number of CMIP5 responses to lie below our ensemble range. The ensemble also simulates a number of high-end responses which lie above the CMIP5 carbon cycle range. These high-end simulations can be linked to sampling a number of stronger carbon cycle feedbacks and to sampling climate sensitivities above 4.5 K. This latter aspect highlights the priority of identifying real-world climate-sensitivity constraints which, if achieved, would reduce the upper bound of projected global mean temperature change. The ensembles of simulations presented here provide a framework to explore relationships between present-day observables and future changes, while the large spread of projected future changes highlights the ongoing need for such work.
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but a limited number of samples, so various projective clustering techniques and ensemble techniques have been suggested to address these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality-reduction-based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
Inner Radiation Belt Dynamics and Climatology
NASA Astrophysics Data System (ADS)
Guild, T. B.; O'Brien, P. P.; Looper, M. D.
2012-12-01
We present preliminary results of inner belt proton data assimilation using an augmented version of the Selesnick et al. Inner Zone Model (SIZM). By varying modeled physics parameters and solar particle injection parameters to generate many ensembles of the inner belt, then optimizing the ensemble weights according to inner belt observations from SAMPEX/PET at LEO and HEO/DOS at high altitude, we obtain the best-fit state of the inner belt. We need to fully sample the range of solar proton injection sources among the ensemble members to ensure reasonable agreement between the model ensembles and observations. Once this is accomplished, we find the method is fairly robust. We will demonstrate the data assimilation by presenting an extended interval of solar proton injections and losses, illustrating how these short-term dynamics dominate long-term inner belt climatology.
An Optimization Principle for Deriving Nonequilibrium Statistical Models of Hamiltonian Dynamics
NASA Astrophysics Data System (ADS)
Turkington, Bruce
2013-08-01
A general method for deriving closed reduced models of Hamiltonian dynamical systems is developed using techniques from optimization and statistical estimation. Given a vector of resolved variables, selected to describe the macroscopic state of the system, a family of quasi-equilibrium probability densities on phase space corresponding to the resolved variables is employed as a statistical model, and the evolution of the mean resolved vector is estimated by optimizing over paths of these densities. Specifically, a cost function is constructed to quantify the lack-of-fit to the microscopic dynamics of any feasible path of densities from the statistical model; it is an ensemble-averaged, weighted, squared-norm of the residual that results from submitting the path of densities to the Liouville equation. The path that minimizes the time integral of the cost function determines the best-fit evolution of the mean resolved vector. The closed reduced equations satisfied by the optimal path are derived by Hamilton-Jacobi theory. When expressed in terms of the macroscopic variables, these equations have the generic structure of governing equations for nonequilibrium thermodynamics. In particular, the value function for the optimization principle coincides with the dissipation potential that defines the relation between thermodynamic forces and fluxes. The adjustable closure parameters in the best-fit reduced equations depend explicitly on the arbitrary weights that enter into the lack-of-fit cost function. Two particular model reductions are outlined to illustrate the general method. In each example the set of weights in the optimization principle contracts into a single effective closure parameter.
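In symbols, one schematic rendering of this construction (our notation, not the paper's exact symbols) is:

```latex
% R is the Liouville residual of a feasible path \rho(t) of quasi-equilibrium
% densities, W a weighting operator, and the angle brackets denote the
% ensemble average over \rho(t); the best-fit path minimizes the time
% integral of the lack-of-fit cost.
\sigma[\rho(t)] = \tfrac{1}{2}\,\big\langle\, W R(t),\, R(t) \,\big\rangle_{\rho(t)},
\qquad
R(t) = \partial_t \rho(t) + \{\rho(t), H\},
\qquad
\hat{\rho}(\cdot) = \arg\min_{\rho(\cdot)} \int_0^T \sigma[\rho(t)]\, dt .
```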
Adaptive sampling strategies with high-throughput molecular dynamics
NASA Astrophysics Data System (ADS)
Clementi, Cecilia
Despite recent significant hardware and software developments, the complete thermodynamic and kinetic characterization of large macromolecular complexes by molecular simulations still presents significant challenges. The high dimensionality of these systems and the complexity of the associated potential energy surfaces (creating multiple metastable regions connected by high free energy barriers) do not usually allow adequate sampling of the relevant regions of configurational space by means of a single, long Molecular Dynamics (MD) trajectory. Several different approaches have been proposed to tackle this sampling problem. We focus on the development of ensemble simulation strategies, where data from a large number of weakly coupled simulations are integrated to explore the configurational landscape of a complex system more efficiently. Ensemble methods are of increasing interest as the hardware roadmap is now mostly based on increasing core counts rather than clock speeds. The main challenge in the development of an ensemble approach for efficient sampling is in the design of strategies to adaptively distribute the trajectories over the relevant regions of the system's configurational space, without using any a priori information on the system's global properties. We will discuss the definition of smart adaptive sampling approaches that can redirect computational resources towards unexplored yet relevant regions, as sketched below. Our approaches are based on new developments in dimensionality reduction for high dimensional dynamical systems, and optimal redistribution of resources. NSF CHE-1152344, NSF CHE-1265929, Welch Foundation C-1570.
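As a cartoon of the adaptive-seeding idea, the sketch below runs batches of short Langevin trajectories on a 1D double well, bins the visited configurations, and reseeds each round from the least-visited occupied bins. The actual approaches discussed rely on learned collective variables and optimal resource redistribution, not histogram bins; this toy only conveys the feedback loop.

```python
# Counts-based adaptive sampling on V(x) = (x^2 - 1)^2 (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
grad = lambda x: 4.0 * x * (x * x - 1.0)

def short_trajectory(x0, n_steps=200, dt=1e-3, beta=3.0):
    x, path = x0, np.empty(n_steps)
    for i in range(n_steps):
        x += -grad(x) * dt + np.sqrt(2.0 * dt / beta) * rng.standard_normal()
        path[i] = x
    return path

edges = np.linspace(-2, 2, 41)
counts = np.zeros(len(edges) - 1)
seeds = [-1.0] * 10                                 # start in the left well
for rnd in range(20):                               # adaptive rounds
    paths = np.concatenate([short_trajectory(s) for s in seeds])
    counts += np.histogram(paths, bins=edges)[0]
    occupied = np.where(counts > 0)[0]
    least = occupied[np.argsort(counts[occupied])][:10]  # rarest visited bins
    seeds = list(0.5 * (edges[least] + edges[least + 1]))  # reseed there
print("bins visited:", int((counts > 0).sum()), "of", len(counts))
```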
Nullspace Sampling with Holonomic Constraints Reveals Molecular Mechanisms of Protein Gαs.
Pachov, Dimitar V; van den Bedem, Henry
2015-07-01
Proteins perform their function or interact with partners by exchanging between conformational substates on a wide range of spatiotemporal scales. Structurally characterizing these exchanges is challenging, both experimentally and computationally. Large, diffusional motions are often on timescales that are difficult to access with molecular dynamics simulations, especially for large proteins and their complexes. The low frequency modes of normal mode analysis (NMA) report on molecular fluctuations associated with biological activity. However, NMA is limited to a second order expansion about a minimum of the potential energy function, which limits opportunities to observe diffusional motions. By contrast, kino-geometric conformational sampling (KGS) permits large perturbations while maintaining the exact geometry of explicit conformational constraints, such as hydrogen bonds. Here, we extend KGS and show that a conformational ensemble of the α subunit Gαs of heterotrimeric stimulatory protein Gs exhibits structural features implicated in its activation pathway. Activation of protein Gs by G protein-coupled receptors (GPCRs) is associated with GDP release and large conformational changes of its α-helical domain. Our method reveals a coupled α-helical domain opening motion while, simultaneously, Gαs helix α5 samples an activated conformation. These motions are moderated in the activated state. The motion centers on a dynamic hub near the nucleotide-binding site of Gαs, and radiates to helix α4. We find that comparative NMA-based ensembles underestimate the amplitudes of the motion. Additionally, the ensembles fall short in predicting the accepted direction of the full activation pathway. Taken together, our findings suggest that nullspace sampling with explicit, holonomic constraints yields ensembles that illuminate molecular mechanisms involved in GDP release and protein Gs activation, and further establish conformational coupling between key structural elements of Gαs. PMID:26218073
Ensemble-Biased Metadynamics: A Molecular Simulation Method to Sample Experimental Distributions
Marinelli, Fabrizio; Faraldo-Gómez, José D.
2015-01-01
We introduce an enhanced-sampling method for molecular dynamics (MD) simulations referred to as ensemble-biased metadynamics (EBMetaD). The method biases a conventional MD simulation to sample a molecular ensemble that is consistent with one or more probability distributions known a priori, e.g., experimental intramolecular distance distributions obtained by double electron-electron resonance or other spectroscopic techniques. To this end, EBMetaD adds an adaptive biasing potential throughout the simulation that discourages sampling of configurations inconsistent with the target probability distributions. The bias introduced is the minimum necessary to fulfill the target distributions, i.e., EBMetaD satisfies the maximum-entropy principle. Unlike other methods, EBMetaD does not require multiple simulation replicas or the introduction of Lagrange multipliers, and is therefore computationally efficient and straightforward in practice. We demonstrate the performance and accuracy of the method for a model system as well as for spin-labeled T4 lysozyme in explicit water, and show how EBMetaD reproduces three double electron-electron resonance distance distributions concurrently within a few tens of nanoseconds of simulation time. EBMetaD is integrated in the open-source PLUMED plug-in (www.plumed-code.org), and can be therefore readily used with multiple MD engines. PMID:26083917
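The core idea, hills whose height grows where the target distribution assigns low probability, can be caricatured in one dimension as below. This toy is only in the spirit of EBMetaD; the published hill-height rule, maximum-entropy property and convergence safeguards are not reproduced, and all parameter values are arbitrary.

```python
# 1D toy of target-distribution-steered metadynamics (not the PLUMED EBMetaD).
import numpy as np

rng = np.random.default_rng(2)
grad_V = lambda x: 4.0 * x * (x * x - 1.0)              # double-well potential
target = lambda x: np.exp(-0.5 * (x - 1.0) ** 2 / 0.1)  # toy "experimental" density

xs = np.linspace(-2.5, 2.5, 501)
bias = np.zeros_like(xs)
h0, w = 0.05, 0.15                                      # base hill height and width

def bias_grad(x):
    return np.interp(x, xs, np.gradient(bias, xs))

x, dt, beta = -1.0, 1e-3, 3.0
samples = []
for step in range(100_000):
    x += (-grad_V(x) - bias_grad(x)) * dt + np.sqrt(2 * dt / beta) * rng.standard_normal()
    samples.append(x)
    if step % 500 == 0:                                  # deposit a hill
        height = h0 / max(target(x), 1e-3)               # low target density -> bigger push
        bias += height * np.exp(-0.5 * (xs - x) ** 2 / w ** 2)
print("fraction of samples with x > 0:", np.mean(np.array(samples) > 0.0))
```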
Salmon, Loïc; Giambaşu, George M; Nikolova, Evgenia N; Petzold, Katja; Bhattacharya, Akash; Case, David A; Al-Hashimi, Hashim M
2015-10-14
Approaches that combine experimental data and computational molecular dynamics (MD) to determine atomic resolution ensembles of biomolecules require the measurement of abundant experimental data. NMR residual dipolar couplings (RDCs) carry rich dynamics information; however, difficulties in modulating the overall alignment of nucleic acids have limited the ability to fully extract this information. We present a strategy for modulating RNA alignment that is based on introducing variable dynamic kinks in terminal helices. With this strategy, we measured seven sets of RDCs in a cUUCGg apical loop and used this rich data set to test the accuracy of a 0.8 μs MD simulation computed using the Amber ff10 force field as well as to determine an atomic resolution ensemble. The MD-generated ensemble quantitatively reproduces the measured RDCs, but selection of a sub-ensemble was required to satisfy the RDCs within error. The largest discrepancies between the RDC-selected and MD-generated ensembles are observed for the most flexible loop residues and backbone angles connecting the loop to the helix, with the RDC-selected ensemble resulting in more uniform dynamics. Comparison of the RDC-selected ensemble with NMR spin relaxation data suggests that the dynamics occurs on the ps-ns time scales as verified by measurements of R(1ρ) relaxation-dispersion data. The RDC-satisfying ensemble samples many conformations adopted by the hairpin in crystal structures, indicating that intrinsic plasticity may play important roles in conformational adaptation. The approach presented here can be applied to test nucleic acid force fields and to characterize dynamics in diverse RNA motifs at atomic resolution.
NASA Astrophysics Data System (ADS)
Oh, Seok-Geun; Suh, Myoung-Seok
2017-07-01
The projection skills of five ensemble methods were analyzed according to simulation skills, training period, and ensemble members, using 198 sets of pseudo-simulation data (PSD) produced by random number generation assuming the simulated temperature of regional climate models. The PSD sets were classified into 18 categories according to the relative magnitude of bias, variance ratio, and correlation coefficient, where each category had 11 sets (including 1 truth set) with 50 samples. The ensemble methods used were as follows: equal weighted averaging without bias correction (EWA_NBC), EWA with bias correction (EWA_WBC), weighted ensemble averaging based on root mean square errors and correlation (WEA_RAC), WEA based on the Taylor score (WEA_Tay), and multivariate linear regression (Mul_Reg). The projection skills of the ensemble methods generally improved compared with the best member for each category. However, their projection skills are significantly affected by the simulation skills of the ensemble members. The weighted ensemble methods showed better projection skills than non-weighted methods, in particular for the PSD categories having systematic biases and various correlation coefficients. The EWA_NBC showed considerably lower projection skills than the other methods, in particular for the PSD categories with systematic biases. Although Mul_Reg showed relatively good skills, it showed strong sensitivity to the PSD categories, training periods, and number of members. On the other hand, the WEA_Tay and WEA_RAC showed relatively superior skills in both accuracy and reliability for all the sensitivity experiments. This indicates that WEA_Tay and WEA_RAC are applicable even for simulation data with systematic biases, a short training period, and a small number of ensemble members.
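One plausible skill-based weighting of the kind compared above (reward correlation, penalize RMSE, after a per-member bias correction) might look as follows; the exact WEA_RAC and WEA_Tay formulas used in the study may differ, and all names here are ours.

```python
# Illustrative weighted ensemble averaging from training-period skill.
import numpy as np

def skill_weights(members, truth):
    """members: (n_models, n_times) training simulations; truth: (n_times,)."""
    rmse = np.sqrt(((members - truth) ** 2).mean(axis=1))
    corr = np.array([np.corrcoef(m, truth)[0, 1] for m in members])
    raw = np.clip(corr, 1e-6, None) / rmse    # reward correlation, punish error
    return raw / raw.sum()

def weighted_projection(members_future, weights, bias):
    """Bias-corrected weighted ensemble average of future projections."""
    return weights @ (members_future - bias[:, None])

# Per-member bias estimated on the training period, e.g.:
# bias = members.mean(axis=1) - truth.mean()
```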
2010-01-01
Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805
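Programmatic access of the kind mentioned above can be as simple as the sketch below, assuming the PyMySQL package and that the public server at ensembldb.ensembl.org still accepts the documented anonymous read-only account on the default port (both are assumptions that may change; consult the Ensembl documentation).

```python
# Minimal query against Ensembl's public MySQL server (read-only, anonymous).
import pymysql

conn = pymysql.connect(host="ensembldb.ensembl.org",
                       user="anonymous", port=3306)
with conn.cursor() as cur:
    cur.execute("SHOW DATABASES LIKE '%variation%'")  # list variation databases
    for (db,) in cur.fetchmany(5):
        print(db)
conn.close()
```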
Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis.
Vrbik, Jan; Ospadov, Egor; Rothstein, Stuart M
2016-07-14
Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x'; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.
Manipulating mesoscopic multipartite entanglement with atom-light interfaces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stasinska, J.; Rodo, C.; Paganelli, S.
2009-12-15
Entanglement between two macroscopic atomic ensembles induced by measurement on an ancillary light system has proven to be a powerful method for engineering quantum memories and quantum state transfer. Here we investigate the feasibility of such methods for generation, manipulation, and detection of genuine multipartite entanglement (Greenberger-Horne-Zeilinger and clusterlike states) between mesoscopic atomic ensembles without the need of individual addressing of the samples. Our results extend in a nontrivial way the Einstein-Podolsky-Rosen entanglement between two macroscopic gas samples reported experimentally in [B. Julsgaard, A. Kozhekin, and E. Polzik, Nature (London) 413, 400 (2001)]. We find that under realistic conditions, a second orthogonal light pulse interacting with the atomic samples can modify and even reverse the entangling action of the first one, leaving the samples in a separable state.
Toward canonical ensemble distribution from self-guided Langevin dynamics simulation
NASA Astrophysics Data System (ADS)
Wu, Xiongwu; Brooks, Bernard R.
2011-04-01
This work derives a quantitative description of the conformational distribution in self-guided Langevin dynamics (SGLD) simulations. SGLD simulations employ guiding forces calculated from local average momenta to enhance low-frequency motion. This enhancement in low-frequency motion dramatically accelerates conformational search efficiency, but also induces certain perturbations in the conformational distribution. Through the local averaging, we separate properties of molecular systems into low-frequency and high-frequency portions. The guiding force effect on the conformational distribution is quantitatively described using these low-frequency and high-frequency properties. This quantitative relation provides a way to convert between a canonical ensemble and a self-guided ensemble. Using example systems, we demonstrate how to utilize the relation to obtain canonical ensemble properties and conformational distributions from SGLD simulations. This development makes SGLD not only an efficient approach for conformational searching, but also an accurate means for conformational sampling.
Szathmáry, E
2000-01-01
Replicators of interest in chemistry, biology and culture are briefly surveyed from a conceptual point of view. Systems with limited heredity have only a limited evolutionary potential because the number of available types is too low. Chemical cycles, such as the formose reaction, are holistic replicators since replication is not based on the successive addition of modules. Replicator networks consisting of catalytic molecules (such as reflexively autocatalytic sets of proteins, or reproducing lipid vesicles) are hypothetical ensemble replicators, and their functioning rests on attractors of their dynamics. Ensemble replicators suffer from the paradox of specificity: while their abstract feasibility seems to require a high number of molecular types, the harmful effect of side reactions calls for a small system size. No satisfactory solution to this problem is known. Phenotypic replicators do not pass on their genotypes, only some aspects of the phenotype are transmitted. Phenotypic replicators with limited heredity include genetic membranes, prions and simple memetic systems. Memes in human culture are unlimited hereditary, phenotypic replicators, based on language. The typical path of evolution goes from limited to unlimited heredity, and from attractor-based to modular (digital) replicators. PMID:11127914
NASA Technical Reports Server (NTRS)
Jahshan, S. N.; Singleterry, R. C.
2001-01-01
The effect of random fuel redistribution on the eigenvalue of a one-speed reactor is investigated. An ensemble of such reactors that are identical to a homogeneous reference critical reactor except for the fissile isotope density distribution is constructed such that it meets a set of well-posed redistribution requirements. The average eigenvalue,
Modelling dynamics in protein crystal structures by ensemble refinement
Burnley, B Tom; Afonine, Pavel V; Adams, Paul D; Gros, Piet
2012-01-01
Single-structure models derived from X-ray data do not adequately account for the inherent, functionally important dynamics of protein molecules. We generated ensembles of structures by time-averaged refinement, where local molecular vibrations were sampled by molecular-dynamics (MD) simulation whilst global disorder was partitioned into an underlying overall translation–libration–screw (TLS) model. Modeling of 20 protein datasets at 1.1–3.1 Å resolution reduced cross-validated Rfree values by 0.3–4.9%, indicating that ensemble models fit the X-ray data better than single structures. The ensembles revealed that, while most proteins display a well-ordered core, some proteins exhibit a ‘molten core’ likely supporting functionally important dynamics in ligand binding, enzyme activity and protomer assembly. Order–disorder changes in HIV protease indicate a mechanism of entropy compensation for ordering the catalytic residues upon ligand binding by disordering specific core residues. Thus, ensemble refinement extracts dynamical details from the X-ray data that allow a more comprehensive understanding of structure–dynamics–function relationships. DOI: http://dx.doi.org/10.7554/eLife.00311.001 PMID:23251785
NASA Astrophysics Data System (ADS)
Soltanzadeh, I.; Azadi, M.; Vakili, G. A.
2011-07-01
Using Bayesian Model Averaging (BMA), an attempt was made to obtain calibrated probabilistic numerical forecasts of 2-m temperature over Iran. The ensemble employs three limited area models (WRF, MM5 and HRM), with WRF used with five different configurations. Initial and boundary conditions for MM5 and WRF are obtained from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS), and for HRM the initial and boundary conditions come from the analysis of the Global Model Europe (GME) of the German Weather Service. The resulting ensemble of seven members was run for a period of 6 months (from December 2008 to May 2009) over Iran. The 48-h raw ensemble outputs were calibrated using the BMA technique for 120 days, using a 40-day training sample of forecasts and the corresponding verification data. The calibrated probabilistic forecasts were assessed using rank histograms and attribute diagrams. Results showed that application of BMA improved the reliability of the raw ensemble. Using the weighted ensemble mean forecast as a deterministic forecast, it was found that the deterministic-style BMA forecasts usually performed better than the best member's deterministic forecast.
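Given fitted BMA weights and spread, the predictive distribution is a mixture of Gaussians centred on bias-corrected member forecasts, and uncertainty bounds follow by inverting the mixture CDF. The sketch below uses placeholder parameter values; only the mixture structure follows Raftery et al. (2005), and the bias-correction coefficients (a, b) would in practice come from the training window.

```python
# Evaluate a BMA predictive mixture and extract quantile bounds.
import numpy as np
from scipy.stats import norm

def bma_cdf(y, forecasts, a, b, weights, sigma):
    """P(Y <= y): member k contributes weights[k] * N(a[k] + b[k]*f_k, sigma^2)."""
    mus = a + b * forecasts
    return np.sum(weights * norm.cdf(y, loc=mus, scale=sigma))

def bma_quantile(p, forecasts, a, b, weights, sigma):
    """Invert the mixture CDF by bisection, e.g. for 10/90% bounds."""
    lo, hi = forecasts.min() - 10 * sigma, forecasts.max() + 10 * sigma
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bma_cdf(mid, forecasts, a, b, weights, sigma) < p else (lo, mid)
    return 0.5 * (lo + hi)

f = np.array([2.1, 1.4, 2.8, 1.9, 2.3, 2.6, 1.7])  # 7 member forecasts (placeholders)
w = np.full(7, 1.0 / 7.0)                          # placeholder BMA weights
a, b, sigma = np.zeros(7), np.ones(7), 1.2         # placeholder fit parameters
print(bma_quantile(0.1, f, a, b, w, sigma), bma_quantile(0.9, f, a, b, w, sigma))
```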
Donovan, Rory M.; Tapia, Jose-Juan; Sullivan, Devin P.; Faeder, James R.; Murphy, Robert F.; Dittrich, Markus; Zuckerman, Daniel M.
2016-01-01
The long-term goal of connecting scales in biological simulation can be facilitated by scale-agnostic methods. We demonstrate that the weighted ensemble (WE) strategy, initially developed for molecular simulations, applies effectively to spatially resolved cell-scale simulations. The WE approach runs an ensemble of parallel trajectories with assigned weights and uses a statistical resampling strategy of replicating and pruning trajectories to focus computational effort on difficult-to-sample regions. The method can also generate unbiased estimates of non-equilibrium and equilibrium observables, sometimes with significantly less aggregate computing time than would be possible using standard parallelization. Here, we use WE to orchestrate particle-based kinetic Monte Carlo simulations, which include spatial geometry (e.g., of organelles, plasma membrane) and biochemical interactions among mobile molecular species. We study a series of models exhibiting spatial, temporal and biochemical complexity and show that although WE has important limitations, it can achieve performance significantly exceeding standard parallel simulation—by orders of magnitude for some observables. PMID:26845334
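The replicate-and-prune resampling at the heart of WE can be illustrated with a deliberately simplified scheme: within each bin, resample a fixed number of walkers with probability proportional to weight and give each survivor an equal share of the bin's total weight, which preserves bin weights in expectation. Production WE codes use a more careful split/merge procedure; the function below is a sketch in our own notation.

```python
# Simplified weighted-ensemble resampling step (illustrative).
import numpy as np

def we_resample(positions, weights, bin_ids, walkers_per_bin, rng):
    """positions: (n, ...) walker states; weights: (n,); bin_ids: (n,) ints."""
    new_pos, new_w = [], []
    for b in np.unique(bin_ids):
        idx = np.where(bin_ids == b)[0]
        wsum = weights[idx].sum()
        picks = rng.choice(idx, size=walkers_per_bin, p=weights[idx] / wsum)
        new_pos.extend(positions[picks])                       # replicate/prune
        new_w.extend([wsum / walkers_per_bin] * walkers_per_bin)
    return np.array(new_pos), np.array(new_w)
```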
Probing RNA Native Conformational Ensembles with Structural Constraints.
Fonseca, Rasmus; van den Bedem, Henry; Bernauer, Julie
2016-05-01
Noncoding ribonucleic acids (RNA) play a critical role in a wide variety of cellular processes, ranging from regulating gene expression to post-translational modification and protein synthesis. Their activity is modulated by highly dynamic exchanges between three-dimensional conformational substates, which are difficult to characterize experimentally and computationally. Here, we present an innovative, entirely kinematic computational procedure to efficiently explore the native ensemble of RNA molecules. Our procedure projects degrees of freedom onto a subspace of conformation space defined by distance constraints in the tertiary structure. The dimensionality reduction enables efficient exploration of conformational space. We show that the conformational distributions obtained with our method broadly sample the conformational landscape observed in NMR experiments. Compared to normal mode analysis-based exploration, our procedure diffuses faster through the experimental ensemble while also accessing conformational substates to greater precision. Our results suggest that conformational sampling with a highly reduced but fully atomistic representation of noncoding RNA expresses key features of their dynamic nature.
Scalable and balanced dynamic hybrid data assimilation
NASA Astrophysics Data System (ADS)
Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa
2017-04-01
Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them implemented as parallel model runs themselves. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs which requires a very efficient and low-latency communication network. However, the volume of data communicated is small and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of scalable VEnKF with the 4D lake and shallow sea model COHERENS, assimilating simultaneously continuous in situ measurements in a single point and infrequent satellite images that cover a whole lake, with the fully scalable VEnKF.
Comparison of different deep learning approaches for parotid gland segmentation from CT images
NASA Astrophysics Data System (ADS)
Hänsch, Annika; Schwier, Michael; Gass, Tobias; Morgas, Tomasz; Haas, Benjamin; Klein, Jan; Hahn, Horst K.
2018-02-01
The segmentation of target structures and organs at risk is a crucial and very time-consuming step in radiotherapy planning. Good automatic methods can significantly reduce the time clinicians have to spend on this task. Due to its variability in shape and often low contrast to surrounding structures, segmentation of the parotid gland is especially challenging. Motivated by the recent success of deep learning, we study different deep learning approaches for parotid gland segmentation. Particularly, we compare 2D, 2D ensemble and 3D U-Net approaches and find that the 2D U-Net ensemble yields the best results with a mean Dice score of 0.817 on our test data. The ensemble approach reduces false positives without the need for an automatic region of interest detection. We also apply our trained 2D U-Net ensemble to segment the test data of the 2015 MICCAI head and neck auto-segmentation challenge. With a mean Dice score of 0.861, our classifier exceeds the highest mean score in the challenge. This shows that the method generalizes well onto data from independent sites. Since appropriate reference annotations are essential for training but often difficult and expensive to obtain, it is important to know how many samples are needed to properly train a neural network. We evaluate the classifier performance after training with differently sized training sets (50-450) and find that 250 cases (without using extensive data augmentation) are sufficient to obtain good results with the 2D ensemble. Adding more samples does not significantly improve the Dice score of the segmentations.
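At inference time, the 2D-ensemble idea amounts to averaging the members' probability maps before thresholding, scored here with the Dice coefficient. The snippet below is schematic: predict_prob is a placeholder for a trained model's foreground-probability output, not an API from any particular framework.

```python
# Ensemble averaging of segmentation probability maps plus Dice scoring.
import numpy as np

def dice(a, b, eps=1e-8):
    """Dice coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

def ensemble_segment(models, image, threshold=0.5):
    """Average foreground probabilities over the ensemble, then threshold."""
    prob = np.mean([m.predict_prob(image) for m in models], axis=0)
    return prob >= threshold
```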
H theorem for generalized entropic forms within a master-equation framework
NASA Astrophysics Data System (ADS)
Casas, Gabriela A.; Nobre, Fernando D.; Curado, Evaldo M. F.
2016-03-01
The H theorem is proven for generalized entropic forms, in the case of a discrete set of states. The associated probability distributions evolve in time according to a master equation, for which the corresponding transition rates depend on these entropic forms. An important equation describing the time evolution of the transition rates and probabilities in such a way as to drive the system towards an equilibrium state is found. In the particular case of Boltzmann-Gibbs entropy, it is shown that this equation is satisfied in the microcanonical ensemble only for symmetric probability transition rates, characterizing a single path to the equilibrium state. This equation completes the proof of the H theorem for generalized entropic forms, associated with systems characterized by complex dynamics, e.g., presenting nonsymmetric probability transition rates and more than one path towards the same equilibrium state. Some examples considering generalized entropies of the literature are discussed, showing that they should be applicable to a wide range of natural phenomena, mainly those within the realm of complex systems.
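For the Boltzmann-Gibbs special case cited above, the standard argument can be sketched as follows (our notation):

```latex
% Master equation for discrete states i with transition rates w_{ij}:
\frac{dP_i}{dt} = \sum_j \left[ w_{ji} P_j - w_{ij} P_i \right].
% For symmetric rates w_{ij} = w_{ji} and S = -k \sum_i P_i \ln P_i,
\frac{dS}{dt} = \frac{k}{2} \sum_{i,j} w_{ij}
   \left( P_j - P_i \right) \left( \ln P_j - \ln P_i \right) \;\ge\; 0,
% since (x - y)(\ln x - \ln y) \ge 0 for x, y > 0. Asymmetric rates break
% this argument, which is where the generalized entropic forms enter.
```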
Self-stabilized narrow-bandwidth and high-fidelity entangled photons generated from cold atoms
NASA Astrophysics Data System (ADS)
Yu, Y. C.; Ding, D. S.; Dong, M. X.; Shi, S.; Zhang, W.; Shi, B. S.
2018-04-01
Entangled photon pairs are critically important in fundamental quantum mechanics research as well as in many areas within the field of quantum information, such as quantum communication, quantum computation, and quantum cryptography. Previous demonstrations of entangled photons based on atomic ensembles were achieved by using a reference laser to stabilize the phase of two spontaneous four-wave mixing paths. Here, we demonstrate a convenient and efficient scheme to generate polarization-entangled photons with a narrow bandwidth of 57.2 ± 1.6 MHz and a high fidelity of 96.3 ± 0.8% by using a phase self-stabilized multiplexing system formed by two beam displacers and two half-wave plates, where the relative phase between the different signal paths can be eliminated completely. It is possible to stabilize an entangled photon pair for a long time with this system and produce all four Bell states, making this a vital step forward in the field of quantum information.
Path-integral simulation of solids.
Herrero, C P; Ramírez, R
2014-06-11
The path-integral formulation of the statistical mechanics of quantum many-body systems is described, with the purpose of introducing practical techniques for the simulation of solids. Monte Carlo and molecular dynamics methods for distinguishable quantum particles are presented, with particular attention to the isothermal-isobaric ensemble. Applications of these computational techniques to different types of solids are reviewed, including noble-gas solids (helium and heavier elements), group-IV materials (diamond and elemental semiconductors), and molecular solids (with emphasis on hydrogen and ice). Structural, vibrational, and thermodynamic properties of these materials are discussed. Applications also include point defects in solids (structure and diffusion), as well as nuclear quantum effects in solid surfaces and adsorbates. Different phenomena are discussed, such as solid-to-solid and orientational phase transitions, rates of quantum processes, classical-to-quantum crossover, and various finite-temperature anharmonic effects (thermal expansion, isotopic effects, electron-phonon interactions). Nuclear quantum effects are most remarkable in the presence of light atoms, so special emphasis is placed on solids containing hydrogen as a constituent element or as an impurity.
Park, Wooram; Liu, Yan; Zhou, Yu; Moses, Matthew; Chirikjian, Gregory S
2008-04-11
A nonholonomic system subjected to external noise from the environment, or internal noise in its own actuators, will evolve in a stochastic manner described by an ensemble of trajectories. This ensemble of trajectories is equivalent to the solution of a Fokker-Planck equation that typically evolves on a Lie group. If the most likely state of such a system is to be estimated, and plans for subsequent motions from the current state are to be made so as to move the system to a desired state with high probability, then modeling how the probability density of the system evolves is critical. Methods for solving Fokker-Planck equations that evolve on Lie groups then become important. Such equations can be solved using the operational properties of group Fourier transforms, in which irreducible unitary representation (IUR) matrices play a critical role. Therefore, we develop a simple approach for the numerical approximation of all the IUR matrices for two of the groups of most interest in robotics: the rotation group in three-dimensional space, SO(3), and the Euclidean motion group of the plane, SE(2). This approach uses the exponential mapping from the Lie algebras of these groups, and takes advantage of the sparse nature of the Lie algebra representation matrices. Other techniques for density estimation on groups are also explored. The computed densities are applied in the context of probabilistic path planning for a kinematic cart in the plane and for flexible needle steering in three-dimensional space. In these examples the injection of artificial noise into the computational models (rather than noise in the actual physical systems) serves as a tool to search the configuration spaces and plan paths. Finally, we illustrate how density estimation problems arise in the characterization of physical noise in orientational sensors such as gyroscopes.
Improved graphite furnace atomizer
Siemer, D.D.
1983-05-18
A graphite furnace atomizer for use in graphite furnace atomic absorption spectroscopy is described wherein the heating elements are affixed near the optical path and away from the point of sample deposition, so that when the sample is volatilized the spectroscopic temperature at the optical path is at least that of the volatilization temperature, whereby analyte-concomitant complex formation is advantageously reduced. The atomizer may be elongated along its axis to increase the distance between the optical path and the sample deposition point. Also, the atomizer may be elongated along the axis of the optical path, whereby its analytical sensitivity is greatly increased.
Sørbye, Sveinung Wergeland; Pedersen, Mette Kristin; Ekeberg, Bente; Williams, Merete E Johansen; Sauer, Torill; Chen, Ying
2017-01-01
The Norwegian Cervical Cancer Screening Program recommends screening every 3 years for women between 25 and 69 years of age. There is a large difference in the percentage of unsatisfactory samples between laboratories that use different brands of liquid-based cytology. We wished to examine whether inadequate ThinPrep samples could be made satisfactory by processing them with the SurePath protocol. A total of 187 inadequate ThinPrep specimens from the Department of Clinical Pathology at University Hospital of North Norway were sent to Akershus University Hospital for conversion to SurePath medium. Ninety-one (48.7%) were processed through the automated "gynecologic" application for cervix cytology samples, and 96 (51.3%) were processed with the "nongynecological" automatic program. Out of 187 samples that had been unsatisfactory by ThinPrep, 93 (49.7%) were satisfactory after being converted to SurePath. The rate of satisfactory cytology was 36.6% and 62.5% for samples run through the "gynecology" program and "nongynecology" program, respectively. Of the 93 samples that became satisfactory after conversion from ThinPrep to SurePath, 80 (86.0%) were screened as normal while 13 samples (14.0%) were given an abnormal diagnosis, which included 5 atypical squamous cells of undetermined significance, 5 low-grade squamous intraepithelial lesion, 2 atypical glandular cells not otherwise specified, and 1 atypical squamous cells cannot exclude high-grade squamous intraepithelial lesion. A total of 2.1% (4/187) of the women received a diagnosis of cervical intraepithelial neoplasia 2 or higher at a later follow-up. Converting cytology samples from ThinPrep to SurePath processing can reduce the number of unsatisfactory samples. The samples should be run through the "nongynecology" program to ensure an adequate number of cells.
Rare behavior of growth processes via umbrella sampling of trajectories
NASA Astrophysics Data System (ADS)
Klymko, Katherine; Geissler, Phillip L.; Garrahan, Juan P.; Whitelam, Stephen
2018-03-01
We compute probability distributions of trajectory observables for reversible and irreversible growth processes. These results reveal a correspondence between reversible and irreversible processes, at particular points in parameter space, in terms of their typical and atypical trajectories. Thus key features of growth processes can be insensitive to the precise form of the rate constants used to generate them, recalling the insensitivity to microscopic details of certain equilibrium behavior. We obtained these results using a sampling method, inspired by the "s -ensemble" large-deviation formalism, that amounts to umbrella sampling in trajectory space. The method is a simple variant of existing approaches, and applies to ensembles of trajectories controlled by the total number of events. It can be used to determine large-deviation rate functions for trajectory observables in or out of equilibrium.
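As a toy illustration of umbrella sampling in trajectory space (not the paper's growth dynamics), consider trajectories of independent events biased by exp(-s K), with K the total event count; the Metropolis rule below resamples one step at a time from the unbiased dynamics, so only the bias enters the acceptance:

```python
import numpy as np

rng = np.random.default_rng(0)
T, p, s = 100, 0.3, 0.5        # trajectory length, event probability, bias strength

traj = rng.random(T) < p       # initial trajectory: independent events
K = traj.sum()

samples = []
for sweep in range(20000):
    i = rng.integers(T)
    new = rng.random() < p     # propose resampling step i from the unbiased dynamics
    dK = int(new) - int(traj[i])
    # proposal density cancels the unbiased trajectory weight; only the bias remains
    if rng.random() < np.exp(-s * dK):
        traj[i], K = new, K + dK
    if sweep % 10 == 0:
        samples.append(K)

print("biased mean event count:", np.mean(samples), "unbiased mean:", T * p)
```

Histogramming K under several values of s and unbiasing the histograms is then one route to the large-deviation rate function of the trajectory observable.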
Transition to collective oscillations in finite Kuramoto ensembles
NASA Astrophysics Data System (ADS)
Peter, Franziska; Pikovsky, Arkady
2018-03-01
We present an alternative approach to finite-size effects around the synchronization transition in the standard Kuramoto model. Our main focus lies on the conditions under which a collective oscillatory mode is well defined. For this purpose, the minimal value of the amplitude of the complex Kuramoto order parameter appears as a proper indicator. The dependence of this minimum on coupling strength varies due to sampling variations and correlates with the sample kurtosis of the natural frequency distribution. The skewness of the frequency sample determines the frequency of the resulting collective mode. The effects of kurtosis and skewness hold in the thermodynamic limit of infinite ensembles. We prove this by integrating a self-consistency equation for the complex Kuramoto order parameter for two families of distributions with controlled kurtosis and skewness, respectively.
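A minimal sketch of the indicator used here, the running minimum of the complex Kuramoto order parameter for a finite sample of natural frequencies (parameters are illustrative, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, dt, steps = 200, 1.5, 0.01, 20000
omega = rng.normal(0.0, 1.0, N)          # finite sample of natural frequencies
theta = rng.uniform(0.0, 2.0 * np.pi, N)

r_min = np.inf
for _ in range(steps):
    z = np.mean(np.exp(1j * theta))      # complex Kuramoto order parameter
    r, psi = np.abs(z), np.angle(z)
    theta += dt * (omega + K * r * np.sin(psi - theta))  # mean-field form of the model
    r_min = min(r_min, r)

print("minimal |z| over the run:", r_min)
```

Repeating this over many frequency samples, and correlating r_min with the sample kurtosis and skewness, mirrors the finite-size analysis described in the abstract.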
Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan
2018-05-01
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding of the ensemble results obtained from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Yin, Dong-shan; Gao, Yu-ping; Zhao, Shu-hong
2017-07-01
Millisecond pulsars can generate another type of time scale that is totally independent of the atomic time scale, because the physical mechanisms of the pulsar time scale and the atomic time scale are quite different from each other. Usually the pulsar timing observations are not evenly sampled, and the intervals between two data points range from several hours to more than half a month. Furthermore, these data sets are sparse. All this makes it difficult to generate an ensemble pulsar time scale. Hence, a new algorithm to calculate the ensemble pulsar time scale is proposed. Firstly, a cubic spline interpolation is used to densify the data set and make the intervals between data points uniform. Then, the Vondrak filter is employed to smooth the data set and remove the high-frequency noise, and finally the weighted average method is adopted to generate the ensemble pulsar time scale. The newly released NANOGrav (North American Nanohertz Observatory for Gravitational Waves) 9-year data set is used to generate the ensemble pulsar time scale. This data set includes the 9-year observational data of 37 millisecond pulsars observed by the 100-meter Green Bank telescope and the 305-meter Arecibo telescope. It is found that the algorithm used in this paper can effectively reduce the influence of the noise in the pulsar timing residuals and improve the long-term stability of the ensemble pulsar time scale. Results indicate that the long-term (> 1 yr) stability of the ensemble pulsar time scale is better than 3.4 × 10⁻¹⁵.
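The three-step algorithm might be sketched as follows; a simple moving average stands in for the Vondrak filter, and all array names are illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def ensemble_timescale(times_list, resid_list, weights, grid):
    """Densify each pulsar's timing residuals onto a common grid, smooth
    them, and form a weighted average across pulsars.  The moving average
    here is a stand-in for the Vondrak filter used in the paper."""
    smoothed = []
    for t, r in zip(times_list, resid_list):
        dense = CubicSpline(t, r)(grid)              # step 1: spline densification
        kernel = np.ones(11) / 11.0                  # step 2: simple low-pass smoothing
        smoothed.append(np.convolve(dense, kernel, mode="same"))
    w = np.asarray(weights, dtype=float)             # e.g., inverse timing variances
    return np.average(np.vstack(smoothed), axis=0, weights=w)  # step 3: weighted average
```

The weights would typically reflect each pulsar's timing precision, so that noisy pulsars contribute less to the ensemble time scale.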
Zhang, Zhe; Schindler, Christina E. M.; Lange, Oliver F.; Zacharias, Martin
2015-01-01
The high-resolution refinement of docked protein-protein complexes can provide valuable structural and mechanistic insight into protein complex formation complementing experiment. Monte Carlo (MC) based approaches are frequently applied to sample putative interaction geometries of proteins including also possible conformational changes of the binding partners. In order to explore efficiency improvements of the MC sampling, several enhanced sampling techniques, including temperature or Hamiltonian replica exchange and well-tempered ensemble approaches, have been combined with the MC method and were evaluated on 20 protein complexes using unbound partner structures. The well-tempered ensemble method combined with a 2-dimensional temperature and Hamiltonian replica exchange scheme (WTE-H-REMC) was identified as the most efficient search strategy. Comparison with prolonged MC searches indicates that the WTE-H-REMC approach requires approximately 5 times fewer MC steps to identify near native docking geometries compared to conventional MC searches. PMID:26053419
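For reference, the Metropolis criterion for a temperature replica-exchange swap, the elementary move underlying such REMC schemes, can be sketched generically (this is the textbook form, not the authors' implementation):

```python
import numpy as np

def swap_accepted(E_i, E_j, beta_i, beta_j, rng):
    """Accept or reject exchanging configurations between two replicas at
    inverse temperatures beta_i and beta_j with current energies E_i, E_j."""
    delta = (beta_i - beta_j) * (E_i - E_j)
    return rng.random() < min(1.0, np.exp(delta))

# usage: rng = np.random.default_rng(0); swap_accepted(-120.0, -95.0, 1.0, 0.8, rng)
```

Hamiltonian replica exchange replaces the temperature ladder with a ladder of modified potentials, but the acceptance rule has the same detailed-balance structure.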
Zheng, Jingjing; Truhlar, Donald G
2012-01-01
Complex molecules often have many structures (conformations) of the reactants and the transition states, and these structures may be connected by coupled-mode torsions and pseudorotations; some but not all structures may have hydrogen bonds in the transition state or reagents. A quantitative theory of the reaction rates of complex molecules must take account of these structures, their coupled-mode nature, their qualitatively different character, and the possibility of merging reaction paths at high temperature. We have recently developed a coupled-mode theory called multi-structural variational transition state theory (MS-VTST) and an extension, called multi-path variational transition state theory (MP-VTST), that includes a treatment of the differences in the multi-dimensional tunneling paths and their contributions to the reaction rate. The MP-VTST method was presented for unimolecular reactions in the original paper and has now been extended to bimolecular reactions. The MS-VTST and MP-VTST formulations of variational transition state theory include multi-faceted configuration-space dividing surfaces to define the variational transition state. They occupy an intermediate position between single-conformation variational transition state theory (VTST), which has been used successfully for small molecules, and ensemble-averaged variational transition state theory (EA-VTST), which has been used successfully for enzyme kinetics. The theories are illustrated and compared here by application to three thermal rate constants for reactions of ethanol with hydroxyl radical--reactions with 4, 6, and 14 saddle points.
Core self-evaluations and work engagement: Testing a perception, action, and development path.
Tims, Maria; Akkermans, Jos
2017-01-01
Core self-evaluations (CSE) have predictive value for important work outcomes such as job satisfaction and job performance. However, little is known about the mechanisms that may explain these relationships. The purpose of the present study is to contribute to CSE theory by proposing and subsequently providing a first test of theoretically relevant mediating paths through which CSE may be related to work engagement. Based on approach/avoidance motivation and Job Demands-Resources theory, we examined a perception (via job characteristics), action (via job crafting), and development path (via career competencies). Two independent samples were obtained from employees working in Germany and The Netherlands (N = 303 and N = 404, respectively). When taking all mediators into account, results showed that the perception path represented by autonomy and social support played a minor role in the relationship between CSE and work engagement. Specifically, autonomy did not function as a mediator in either sample, while social support played a marginally significant role in the CSE-work engagement relationship in sample 1 and received full support in sample 2. The action path exemplified by job crafting mediated the relationship between CSE and work engagement in both samples. Finally, the development path operationalized with career competencies mediated the relationship between CSE and work engagement in sample 1. The study presents evidence for an action and development path over and above the often tested perception path to explain how CSE is related to work engagement. This is one of the first studies to propose and show that CSE not only influences perceptions but also triggers employee actions and developmental strategies that relate to work engagement. PMID:28787464
Conformational and functional analysis of molecular dynamics trajectories by Self-Organising Maps
2011-01-01
Background Molecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins, which is often a critical element of their function. Identification of functionally relevant conformations is generally done by clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported to perform more accurately and provide more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering. Results The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cartesian coordinates of the Cα atoms of conformations sampled in the essential space were used as input data vectors for SOM training; complete linkage clustering was then performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions. Conclusions The use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources. PMID:21569575
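A compact sketch of the two-level idea, a small self-organising map whose prototype vectors are then grouped by complete-linkage clustering (toy data stand in for the Cα coordinates; this is not the paper's optimized protocol):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def train_som(X, rows=8, cols=8, epochs=30, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal online SOM; X has shape (n_frames, n_coords)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    n_iter = epochs * len(X)
    for t, x in enumerate(X[rng.integers(len(X), size=n_iter)]):
        frac = t / n_iter
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                # pull the neighbourhood toward x
    return W

# Level 1: SOM prototypes; Level 2: complete-linkage clustering of the prototypes
X = np.random.default_rng(1).normal(size=(500, 30))   # stand-in for Calpha coordinates
W = train_som(X)
labels = fcluster(linkage(W, method="complete"), t=5, criterion="maxclust")
```

Clustering the prototypes rather than the raw frames is what makes the second level cheap, since the SOM has already condensed the trajectory into a small topological map.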
Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method
NASA Astrophysics Data System (ADS)
Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.
2008-06-01
An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near-continuum range. A post-processing procedure called the DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over a small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near-equilibrium flows (DREAM-I) or the instantaneous particle data output by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single-processor code, and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges. The results of these simulations are compared to experimental data and simulations from the literature where these are available. In general, it was found that 10 ensemble runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.
NASA Astrophysics Data System (ADS)
Monroe, Jacob I.; Shirts, Michael R.
2014-04-01
Molecular containers such as cucurbit[7]uril (CB7) and the octa-acid (OA) host are ideal simplified model test systems for optimizing and analyzing methods for computing free energies of binding intended for use with biologically relevant protein-ligand complexes. To this end, we have performed initially blind free energy calculations to determine the free energies of binding for ligands of both the CB7 and OA hosts. A subset of the selected guest molecules were those included in the SAMPL4 prediction challenge. Using expanded ensemble simulations in the dimension of coupling host-guest intermolecular interactions, we are able to show that our estimates in most cases can be demonstrated to fully converge and that the errors in our estimates are due almost entirely to the assigned force field parameters and the choice of environmental conditions used to model experiment. We confirm the convergence through the use of alternative simulation methodologies and thermodynamic pathways, analyzing sampled conformations, and directly observing changes of the free energy with respect to simulation time. Our results demonstrate the benefits of enhanced sampling of multiple local free energy minima made possible by the use of expanded ensemble molecular dynamics and may indicate the presence of significant problems with current transferable force fields for organic molecules when used for calculating binding affinities, especially in non-protein chemistries.
Tunable ion-photon entanglement in an optical cavity.
Stute, A; Casabone, B; Schindler, P; Monz, T; Schmidt, P O; Brandstätter, B; Northup, T E; Blatt, R
2012-05-23
Proposed quantum networks require both a quantum interface between light and matter and the coherent control of quantum states. A quantum interface can be realized by entangling the state of a single photon with the state of an atomic or solid-state quantum memory, as demonstrated in recent experiments with trapped ions, neutral atoms, atomic ensembles and nitrogen-vacancy spins. The entangling interaction couples an initial quantum memory state to two possible light-matter states, and the atomic level structure of the memory determines the available coupling paths. In previous work, the transition parameters of these paths determined the phase and amplitude of the final entangled state, unless the memory was initially prepared in a superposition state (a step that requires coherent control). Here we report fully tunable entanglement between a single ⁴⁰Ca⁺ ion and the polarization state of a single photon within an optical resonator. Our method, based on a bichromatic, cavity-mediated Raman transition, allows us to select two coupling paths and adjust their relative phase and amplitude. The cavity setting enables intrinsically deterministic, high-fidelity generation of any two-qubit entangled state. This approach is applicable to a broad range of candidate systems and thus is a promising method for distributing information within quantum networks.
ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Many neuroimaging applications deal with imbalanced imaging data. For example, in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for the structural magnetic resonance imaging (MRI) modality and six times the control cases for the proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and under-sampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers, including Random Forest and Support Vector Machines, based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids-based undersampling gives the best overall performance among the different data sampling techniques and the no-sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among the various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
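A hedged sketch of cluster-based undersampling to balance the majority class before training; KMeans centroids stand in for the K-Medoids step described in the paper, and the data below are synthetic placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
from sklearn.svm import SVC

def cluster_undersample(X_maj, n_keep, seed=0):
    """Keep one representative per cluster (the point nearest each centroid),
    a KMeans stand-in for K-Medoids undersampling."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(X_maj)
    idx = pairwise_distances_argmin(km.cluster_centers_, X_maj)
    return X_maj[idx]

rng = np.random.default_rng(0)
X_min = rng.normal(1.0, 1.0, size=(40, 5))    # minority class (e.g., AD)
X_maj = rng.normal(0.0, 1.0, size=(240, 5))   # majority class (e.g., MCI)

X_maj_bal = cluster_undersample(X_maj, n_keep=len(X_min))
X = np.vstack([X_min, X_maj_bal])
y = np.array([1] * len(X_min) + [0] * len(X_maj_bal))
clf = SVC(kernel="linear").fit(X, y)          # balanced training set
```

Repeating this over several undersampled subsets and aggregating the classifiers yields the ensemble-of-undersampled-datasets scheme the abstract reports as most stable.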
Fine tuning classical and quantum molecular dynamics using a generalized Langevin equation
NASA Astrophysics Data System (ADS)
Rossi, Mariana; Kapil, Venkat; Ceriotti, Michele
2018-03-01
Generalized Langevin Equation (GLE) thermostats have been used very effectively as a tool to manipulate and optimize the sampling of thermodynamic ensembles and the associated static properties. Here we show that a similar, exquisite level of control can be achieved for the dynamical properties computed from thermostatted trajectories. We develop quantitative measures of the disturbance induced by the GLE on the Hamiltonian dynamics of a harmonic oscillator, and show that these analytical results accurately predict the behavior of strongly anharmonic systems. We also show that it is possible to correct, to a significant extent, the effects of the GLE term on the corresponding microcanonical dynamics, which puts the use of non-equilibrium Langevin dynamics to approximate quantum nuclear effects on more solid ground and could help improve the prediction of dynamical quantities from techniques that use a Langevin term to stabilize dynamics. Finally we address the use of thermostats in the context of approximate path-integral-based models of quantum nuclear dynamics. We demonstrate that a custom-tailored GLE can alleviate some of the artifacts associated with these techniques, improving the quality of results for the modeling of vibrational dynamics of molecules, liquids, and solids.
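The paper concerns colored-noise GLE thermostats, but the white-noise Langevin special case already conveys how a stochastic term both thermostats and perturbs the dynamics; a minimal BAOAB integrator for a harmonic oscillator (illustrative code, not the authors') is:

```python
import numpy as np

def baoab(x, v, force, dt, gamma, kT, mass, steps, seed=0):
    """BAOAB splitting of Langevin dynamics; the O-step is the thermostat,
    a white-noise special case of the colored-noise GLE discussed above."""
    rng = np.random.default_rng(seed)
    c1 = np.exp(-gamma * dt)
    c2 = np.sqrt(kT * (1.0 - c1 ** 2) / mass)
    for _ in range(steps):
        v += 0.5 * dt * force(x) / mass           # B: half kick
        x += 0.5 * dt * v                         # A: half drift
        v = c1 * v + c2 * rng.standard_normal()   # O: Ornstein-Uhlenbeck thermostat
        x += 0.5 * dt * v                         # A: half drift
        v += 0.5 * dt * force(x) / mass           # B: half kick
    return x, v

# harmonic oscillator test: long trajectories sample the canonical ensemble at kT
x, v = baoab(0.0, 0.0, force=lambda x: -x, dt=0.05, gamma=1.0, kT=1.0,
             mass=1.0, steps=100000)
```

The friction gamma is exactly the knob the paper analyzes: larger values thermostat more aggressively but disturb the Hamiltonian dynamics, and hence the computed dynamical properties, more strongly.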
Langevin Dynamics with Spatial Correlations as a Model for Electron-Phonon Coupling
NASA Astrophysics Data System (ADS)
Tamm, A.; Caro, M.; Caro, A.; Samolyuk, G.; Klintenberg, M.; Correa, A. A.
2018-05-01
Stochastic Langevin dynamics has been traditionally used as a tool to describe nonequilibrium processes. When utilized in systems with collective modes, traditional Langevin dynamics relaxes all modes indiscriminately, regardless of their wavelength. We propose a generalization of Langevin dynamics that can capture a differential coupling between collective modes and the bath, by introducing spatial correlations in the random forces. This allows modeling the electronic subsystem in a metal as a generalized Langevin bath endowed with a concept of locality, greatly improving the capabilities of the two-temperature model. The specific form proposed here for the spatial correlations produces a physical wave-vector and polarization dependency of the relaxation produced by the electron-phonon coupling in a solid. We show that the resulting model can be used for describing the path to equilibration of ions and electrons and also as a thermostat to sample the equilibrium canonical ensemble. By extension, the family of models presented here can be applied in general to any dense system, solids, alloys, and dense plasmas. As an example, we apply the model to study the nonequilibrium dynamics of an electron-ion two-temperature Ni crystal.
MERCURY IN CRUDE OIL PROCESSED IN THE UNITED STATES (2004)
The mean and range of concentrations of mercury in crude oil processed in the U.S. were investigated using two analytical methods. The sample ensemble consisted of 329 samples from 170 separate crude oil streams that are processed by U.S. refineries. Samples were retrieved imme...
NASA Astrophysics Data System (ADS)
Hayata, Tomoya; Hidaka, Yoshimasa; Noumi, Toshifumi; Hongo, Masaru
2015-09-01
We derive relativistic hydrodynamics from quantum field theories by assuming that the density operator is given by a local Gibbs distribution at initial time. We decompose the energy-momentum tensor and particle current into nondissipative and dissipative parts, and analyze their time evolution in detail. Performing the path-integral formulation of the local Gibbs distribution, we microscopically derive the generating functional for the nondissipative hydrodynamics. We also construct a basis to study dissipative corrections. In particular, we derive the first-order dissipative hydrodynamic equations without a choice of frame such as the Landau-Lifshitz or Eckart frame.
Coherent transmission of an ultrasonic shock wave through a multiple scattering medium.
Viard, Nicolas; Giammarinaro, Bruno; Derode, Arnaud; Barrière, Christophe
2013-08-01
We report measurements of the transmitted coherent (ensemble-averaged) wave resulting from the interaction of an ultrasonic shock wave with a two-dimensional random medium. Despite multiple scattering, the coherent waveform clearly shows the steepening that is typical of nonlinear harmonic generation. This is taken advantage of to measure the elastic mean free path and group velocity over a broad frequency range (2-15 MHz) in only one experiment. Experimental results are found to be in good agreement with a linear theoretical model taking into account spatial correlations between scatterers. These results show that nonlinearity and multiple scattering are both present, yet uncoupled.
Diffusion of strongly magnetized cosmic ray particles in a turbulent medium
NASA Technical Reports Server (NTRS)
Ptuskin, V. S.
1985-01-01
Cosmic ray (CR) propagation in a turbulent medium is usually considered in the diffusion approximation. Here, the diffusion equation is obtained for strongly magnetized particles in the general form. The influence of a large-scale random magnetic field on CR propagation in the interstellar medium is discussed. Cosmic rays are assumed to propagate in a medium with a regular field H and an ensemble of random MHD waves. The energy density of waves on scales smaller than the free path l of CR particles is small. The collision integral of the general form which describes the interaction between relativistic particles and waves in the quasilinear approximation is used.
Path integrals, the ABL rule and the three-box paradox
NASA Astrophysics Data System (ADS)
Sokolovski, D.; Puerto Giménez, I.; Sala Mayato, R.
2008-10-01
The three-box problem is analysed in terms of virtual pathways, interference between which is destroyed by a number of intermediate measurements. The Aharonov-Bergmann-Lebowitz (ABL) rule is shown to be a particular case of Feynman's recipe for assigning probabilities to exclusive alternatives. The ‘paradoxical’ features of the three-box case arise in an attempt to attribute, in contradiction to the uncertainty principle, properties pertaining to different ensembles produced by different intermediate measurements to the same particle. The effect can be mimicked by a classical system, provided an observation is made to perturb the system in a non-local manner.
Treating Sample Covariances for Use in Strongly Coupled Atmosphere-Ocean Data Assimilation
NASA Astrophysics Data System (ADS)
Smith, Polly J.; Lawless, Amos S.; Nichols, Nancy K.
2018-01-01
Strongly coupled data assimilation requires cross-domain forecast error covariances; information from ensembles can be used, but limited sampling means that ensemble-derived error covariances are routinely rank deficient and/or ill-conditioned and marred by noise. Thus, they require modification before they can be incorporated into a standard assimilation framework. Here we compare methods for improving the rank and conditioning of multivariate sample error covariance matrices for coupled atmosphere-ocean data assimilation. The first method, reconditioning, alters the matrix eigenvalues directly; this preserves the correlation structures but does not remove sampling noise. We show that it is better to recondition the correlation matrix rather than the covariance matrix, as this prevents small but dynamically important modes from being lost. The second method, model state-space localization via the Schur product, effectively removes sample noise but can dampen small cross-correlation signals. A combination that exploits the merits of each is found to offer an effective alternative.
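Both treatments can be sketched in a few lines of linear algebra; the forms below are generic, and the distance-based taper is an illustrative choice rather than the operational one:

```python
import numpy as np

def recondition(C, kappa_max):
    """Raise the smallest eigenvalues of a correlation matrix so that its
    condition number does not exceed kappa_max; the eigenvectors, and hence
    the correlation structure, are preserved."""
    lam, V = np.linalg.eigh(C)
    lam = np.maximum(lam, lam.max() / kappa_max)
    return (V * lam) @ V.T

def localize(P, L):
    """Schur (element-wise) product of a sample covariance P with a tapering
    matrix L; suppresses spurious long-range sample correlations."""
    return P * L

n = 50
idx = np.arange(n)
L = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 10.0)  # illustrative taper
```

Reconditioning the correlation matrix (then rescaling by the variances) rather than the covariance itself is the choice the abstract recommends, since it keeps small but dynamically important modes.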
Hopkins, Carl
2011-05-01
In architectural acoustics, noise control and environmental noise, there are often steady-state signals for which it is necessary to measure the spatially averaged sound pressure level inside rooms. This requires using fixed microphone positions, mechanical scanning devices, or manual scanning. In comparison with mechanical scanning devices, the human body allows manual scanning to trace out complex geometrical paths in three-dimensional space. To determine the efficacy of manual scanning paths in terms of an equivalent number of uncorrelated samples, an analytical approach is solved numerically. The benchmark used to assess these paths is a minimum of five uncorrelated fixed microphone positions at frequencies above 200 Hz. For paths involving an operator walking across the room, potential problems exist with walking noise and non-uniform scanning speeds. Hence, paths are considered based on a fixed standing position or rotation of the body about a fixed point. In empty rooms, it is shown that a circle, helix, or cylindrical-type path satisfies the benchmark requirement, with the latter two paths being highly efficient at generating a large number of uncorrelated samples. In furnished rooms where there is limited space for the operator to move, an efficient path comprises three semicircles with 45°-60° separations.
Sampling the isothermal-isobaric ensemble by Langevin dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Xingyu; Institute of Applied Physics and Computational Mathematics, Fenghao East Road 2, Beijing 100094; CAEP Software Center for High Performance Numerical Simulation, Huayuan Road 6, Beijing 100088
2016-03-28
We present a new method of conducting fully flexible-cell molecular dynamics simulation in the isothermal-isobaric ensemble based on Langevin equations of motion. The stochastic coupling to all particle and cell degrees of freedom is introduced in a correct way, in the sense that the stationary configurational distribution is proved to be consistent with that of the isothermal-isobaric ensemble. In order to apply the proposed method in computer simulations, a second-order symmetric numerical integration scheme is developed by Trotter's splitting of the single-step propagator. Moreover, a practical guide for choosing working parameters is suggested for user-specified thermo- and baro-coupling time scales. The method and software implementation are carefully validated by a numerical example.
Gibbs Ensembles for Nearly Compatible and Incompatible Conditional Models
Chen, Shyh-Huei; Wang, Yuchung J.
2010-01-01
The Gibbs sampler has been used exclusively for compatible conditionals that converge to a unique invariant joint distribution. However, conditional models are not always compatible. In this paper, a Gibbs sampling-based approach, the Gibbs ensemble, is proposed to search for a joint distribution that deviates least from a prescribed set of conditional distributions. The algorithm is easily scalable, so that it can handle large data sets of high dimensionality. Using simulated data, we show that the proposed approach provides joint distributions that are less discrepant from the incompatible conditionals than those obtained by other methods discussed in the literature. The ensemble approach is also applied to a data set regarding geno-polymorphism and response to chemotherapy in patients with metastatic colorectal cancer. PMID:21286232
Lahiri, A; Roy, Abhijit Guha; Sheet, Debdoot; Biswas, Prabir Kumar
2016-08-01
Automated segmentation of retinal blood vessels in label-free fundus images plays a pivotal role in computer-aided diagnosis of ophthalmic pathologies, viz., diabetic retinopathy, hypertensive disorders and cardiovascular diseases. The challenge remains active in medical image analysis research due to the varied distribution of blood vessels, which vary in size and physical appearance against a noisy background. In this paper we formulate the segmentation challenge as a classification task. Specifically, we employ unsupervised hierarchical feature learning using a two-level ensemble of sparsely trained denoising stacked autoencoders. First-level training with bootstrap samples ensures decoupling, and the second-level ensemble, formed from different network architectures, ensures architectural revision. We show that ensemble training of autoencoders fosters diversity in the learned dictionary of visual kernels for vessel segmentation. A SoftMax classifier is used for fine-tuning each member autoencoder, and multiple strategies are explored for two-level fusion of the ensemble members. On the DRIVE dataset, we achieve a maximum average accuracy of 95.33% with an impressively low standard deviation of 0.003 and a Kappa agreement coefficient of 0.708. Comparison with other major algorithms substantiates the high efficacy of our model.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
2009-01-01
Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
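A minimal sketch in the spirit of MULTI-K: accumulate co-memberships over k-means runs with varying k, then cut the consensus with hierarchical clustering (synthetic data below; this is not the authors' C++ implementation):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def ensemble_kmeans(X, k_values, runs_per_k=10, n_final=4, seed=0):
    """Build a co-membership (consensus) matrix from many k-means runs with
    varying k, then extract the most robust clusters hierarchically."""
    n = len(X)
    co = np.zeros((n, n))
    rng = np.random.default_rng(seed)
    for k in k_values:
        for _ in range(runs_per_k):
            labels = KMeans(n_clusters=k, n_init=1,
                            random_state=int(rng.integers(1 << 31))).fit_predict(X)
            co += labels[:, None] == labels[None, :]
    co /= co.max()                                   # fraction of runs co-clustered
    dist = squareform(1.0 - co, checks=False)        # robust co-membership -> distance
    return fcluster(linkage(dist, method="average"), t=n_final, criterion="maxclust")

X = np.random.default_rng(2).normal(size=(120, 50))  # stand-in for expression profiles
labels = ensemble_kmeans(X, k_values=range(2, 8))
```

Pairs of samples that fall together across most runs and most values of k end up in the same final cluster, which is what makes the consensus robust to the geometry of any single k-means partition.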
NASA Astrophysics Data System (ADS)
Li, Ning; McLaughlin, Dennis; Kinzelbach, Wolfgang; Li, WenPeng; Dong, XinGuang
2015-10-01
Model uncertainty needs to be quantified to provide objective assessments of the reliability of model predictions and of the risk associated with management decisions that rely on these predictions. This is particularly true in water resource studies that depend on model-based assessments of alternative management strategies. In recent decades, Bayesian data assimilation methods have been widely used in hydrology to assess uncertain model parameters and predictions. In this case study, a particular data assimilation algorithm, the Ensemble Smoother with Multiple Data Assimilation (ESMDA) (Emerick and Reynolds, 2012), is used to derive posterior samples of uncertain model parameters and forecasts for a distributed hydrological model of Yanqi basin, China. This model is constructed using MIKE SHE/MIKE 11 software, which provides for coupling between surface and subsurface processes (DHI, 2011a-d). The random samples in the posterior parameter ensemble are obtained by using measurements to update 50 prior parameter samples generated with a Latin Hypercube Sampling (LHS) procedure. The posterior forecast samples are obtained from model runs that use the corresponding posterior parameter samples. Two iterative sample update methods are considered: one based on a perturbed-observation Kalman filter update and one based on a square-root Kalman filter update. These alternatives give nearly the same results and converge in only two iterations. The uncertain parameters considered include hydraulic conductivities, drainage and river leakage factors, van Genuchten soil property parameters, and dispersion coefficients. The results show that the uncertainty in many of the parameters is reduced during the smoother updating process, reflecting information obtained from the observations. Some of the parameters are insensitive and do not benefit from measurement information. The correlation coefficients among certain parameters increase in each iteration, although they generally stay below 0.50.
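For orientation, a single ES-MDA iteration in the perturbed-observation form can be sketched as follows; the array names are illustrative, and the inflation coefficients alpha must satisfy sum(1/alpha) = 1 over all iterations:

```python
import numpy as np

def esmda_update(M, D_obs, sim, C_d, alpha, rng):
    """One ES-MDA iteration.  M: (n_par, n_ens) parameter ensemble;
    sim: (n_obs, n_ens) simulated observations for each member;
    C_d: observation-error covariance; alpha: inflation factor."""
    n_ens = M.shape[1]
    dM = M - M.mean(axis=1, keepdims=True)
    dS = sim - sim.mean(axis=1, keepdims=True)
    C_md = dM @ dS.T / (n_ens - 1)                   # cross-covariance
    C_dd = dS @ dS.T / (n_ens - 1)                   # predicted-data covariance
    K = C_md @ np.linalg.inv(C_dd + alpha * C_d)     # Kalman-type gain
    noise = rng.multivariate_normal(np.zeros(len(C_d)), alpha * C_d, n_ens).T
    return M + K @ (D_obs[:, None] + noise - sim)    # perturbed-observation update

# usage: rng = np.random.default_rng(0); M = esmda_update(M, d, g(M), C_d, 4.0, rng)
```

Each iteration requires rerunning the forward model on the updated ensemble, which is why convergence in only two iterations, as reported here, matters for an expensive coupled hydrological model.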
The role of model dynamics in ensemble Kalman filter performance for chaotic systems
Ng, G.-H.C.; McLaughlin, D.; Entekhabi, D.; Ahanin, A.
2011-01-01
The ensemble Kalman filter (EnKF) is susceptible to losing track of observations, or 'diverging', when applied to large chaotic systems such as atmospheric and ocean models. Past studies have demonstrated the adverse impact of sampling error during the filter's update step. We examine how system dynamics affect EnKF performance, and whether the absence of certain dynamic features in the ensemble may lead to divergence. The EnKF is applied to a simple chaotic model, and ensembles are checked against singular vectors of the tangent linear model, corresponding to short-term growth, and Lyapunov vectors, corresponding to long-term growth. Results show that the ensemble strongly aligns itself with the subspace spanned by unstable Lyapunov vectors. Furthermore, the filter avoids divergence only if the full linearized long-term unstable subspace is spanned. However, short-term dynamics also become important as non-linearity in the system increases. Non-linear movement prevents errors in the long-term stable subspace from decaying indefinitely. If these errors then undergo linear intermittent growth, a small ensemble may fail to properly represent all important modes, causing filter divergence. A combination of long- and short-term growth dynamics is thus critical to EnKF performance. These findings can help in developing practical robust filters based on model dynamics. © 2011 The Authors. Tellus A © 2011 John Wiley & Sons A/S.
NASA Astrophysics Data System (ADS)
Wang, S.; Huang, G. H.; Baetz, B. W.; Huang, W.
2015-11-01
This paper presents a polynomial chaos ensemble hydrologic prediction system (PCEHPS) for an efficient and robust uncertainty assessment of model parameters and predictions, in which possibilistic reasoning is infused into probabilistic parameter inference with simultaneous consideration of randomness and fuzziness. The PCEHPS is developed through a two-stage factorial polynomial chaos expansion (PCE) framework, which consists of an ensemble of PCEs to approximate the behavior of the hydrologic model, significantly speeding up the exhaustive sampling of the parameter space. Multiple hypothesis testing is then conducted to construct an ensemble of reduced-dimensionality PCEs with only the most influential terms, which is meaningful for achieving uncertainty reduction and further acceleration of parameter inference. The PCEHPS is applied to the Xiangxi River watershed in China to demonstrate its validity and applicability. A detailed comparison between the HYMOD hydrologic model, the ensemble of PCEs, and the ensemble of reduced PCEs is performed in terms of accuracy and efficiency. Results reveal temporal and spatial variations in parameter sensitivities due to the dynamic behavior of hydrologic systems, and the effects (magnitude and direction) of parametric interactions depending on different hydrological metrics. The case study demonstrates that the PCEHPS is capable not only of capturing both expert knowledge and probabilistic information in the calibration process, but also of running more than 10 times faster than the hydrologic model without compromising predictive accuracy.
Selecting climate simulations for impact studies based on multivariate patterns of climate change.
Mendlik, Thomas; Gobiet, Andreas
In climate change impact research it is crucial to carefully select the meteorological input for impact models. We present a method for model selection that enables the user to shrink the ensemble to a few representative members, conserving the model spread and accounting for model similarity. This is done in three steps: First, using principal component analysis for a multitude of meteorological parameters, to find common patterns of climate change within the multi-model ensemble. Second, detecting model similarities with regard to these multivariate patterns using cluster analysis. And third, sampling models from each cluster, to generate a subset of representative simulations. We present an application based on the ENSEMBLES regional multi-model ensemble with the aim to provide input for a variety of climate impact studies. We find that the two most dominant patterns of climate change relate to temperature and humidity patterns. The ensemble can be reduced from 25 to 5 simulations while still maintaining its essential characteristics. Having such a representative subset of simulations reduces computational costs for climate impact modeling and enhances the quality of the ensemble at the same time, as it prevents double-counting of dependent simulations that would lead to biased statistics. The online version of this article (doi:10.1007/s10584-015-1582-0) contains supplementary material, which is available to authorized users.
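The three steps translate almost directly into a generic PCA-cluster-sample pipeline; the sketch below uses placeholder inputs and is not the authors' code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def select_representatives(changes, n_components=2, n_select=5, seed=0):
    """changes: (n_models, n_features) matrix of multivariate climate-change
    signals.  Returns the index of the model closest to each cluster centre."""
    scores = PCA(n_components=n_components).fit_transform(changes)  # 1: common patterns
    km = KMeans(n_clusters=n_select, n_init=10,
                random_state=seed).fit(scores)                      # 2: model similarity
    reps = []
    for c in range(n_select):                                       # 3: one per cluster
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(scores[members] - km.cluster_centers_[c], axis=1)
        reps.append(members[np.argmin(d)])
    return sorted(reps)

ensemble = np.random.default_rng(3).normal(size=(25, 40))  # 25 simulations, stand-in signals
print(select_representatives(ensemble))
```

Picking the member nearest each cluster centre keeps the subset spread across the dominant change patterns while avoiding double-counting of near-duplicate simulations.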
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low prediction accuracy for positive samples and the poor overall classification caused by unbalanced MicroRNA (miRNA) target sample data, we propose in this paper a support vector machine (SVM) integration of under-sampling and weight (SVM-IUSW) algorithm, an under-sampling based ensemble learning algorithm. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of imbalance between positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSW algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the miRNA target integrated classifier is achieved by combining multiple weak classifiers through a voting mechanism. Experiments revealed that SVM-IUSW, compared with other algorithms on unbalanced data set collections, could not only improve the accuracy on positive targets and the overall classification effect, but also enhance the generalization ability of the miRNA target classifier.
NASA Astrophysics Data System (ADS)
Reisner, J. M.; Dubey, M. K.
2010-12-01
To both quantify and reduce uncertainty in ice activation parameterizations for stratus clouds occurring in the temperature range between -5 and -10 °C, ensemble simulations of an ISDAC golden case have been conducted. To formulate the ensemble, three parameters found within an ice activation model have been sampled using a Latin hypercube technique over a parameter range that induces large variability in both number and mass of ice. The ice activation model is contained within a Lagrangian cloud model that simulates particle number as a function of radius for cloud ice, snow, graupel, cloud, and rain particles. A unique aspect of this model is that it produces very low levels of numerical diffusion that enable the model to accurately resolve the sharp cloud edges associated with the ISDAC stratus deck. Another important aspect of the model is that near the cloud edges the number of particles can be significantly increased to reduce sampling errors and accurately resolve physical processes such as collision-coalescence that occur in this region. Thus, given these relatively low numerical errors, as compared to traditional bin models, the sensitivity of a stratus deck to changes in parameters found within the activation model can be examined without fear of numerical contamination. Likewise, once the ensemble has been completed, ISDAC observations can be incorporated into a Kalman filter to optimally estimate the ice activation parameters and reduce overall model uncertainty. Hence, this work will highlight the ability of an ensemble Kalman filter system coupled to a highly accurate numerical model to estimate important parameters found within microphysical parameterizations containing high uncertainty.
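A minimal sketch of drawing such a parameter ensemble by Latin hypercube sampling; the three bounds below are hypothetical placeholders, not the study's values:

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)   # three uncertain activation parameters
unit = sampler.random(n=20)                 # 20 ensemble members in [0, 1)^3
lower = [1e-4, 0.1, 250.0]                  # hypothetical parameter bounds
upper = [1e-2, 2.0, 270.0]
members = qmc.scale(unit, lower, upper)     # one row of parameters per simulation
```

Latin hypercube stratification guarantees each parameter's range is covered evenly with far fewer members than plain Monte Carlo would need, which matters when each member is a full cloud simulation.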
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models from randomly drawn bootstrap samples of the training set produces different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over the random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters within the selected data. PMID:27304923
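A hedged sketch of the ESVM-RFE idea, aggregating RFE rankings from linear SVMs trained on nearly balanced bootstrap samples (illustrative only; for large gene sets one would use a larger RFE elimination step):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

def esvm_rfe_ranking(X, y, n_models=20, seed=0):
    """Aggregate RFE rankings from linear SVMs trained on class-stratified
    bootstrap samples, so each bootstrap is nearly balanced."""
    rng = np.random.default_rng(seed)
    ranks = np.zeros(X.shape[1])
    classes = [np.where(y == c)[0] for c in np.unique(y)]
    n_per = min(len(c) for c in classes)          # nearly balanced bootstrap size
    for _ in range(n_models):
        idx = np.concatenate([rng.choice(c, size=n_per, replace=True)
                              for c in classes])
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=1).fit(X[idx], y[idx])
        ranks += rfe.ranking_                     # 1 = last feature standing
    return np.argsort(ranks)                      # best (lowest mean rank) first
```

Basing elimination on the pooled ranking of many bootstrap models, rather than a single SVM-RFE pass, is what gives the ensemble its stability on small, imbalanced microarray sets.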
Equilibrium Ensembles for Insulin Folding from Bias-Exchange Metadynamics.
Singh, Richa; Bansal, Rohit; Rathore, Anurag Singh; Goel, Gaurav
2017-04-25
The earliest events in the aggregation process, such as single-molecule reconfiguration, are extremely important and the most difficult to characterize in experiments. To this end, we have used well-tempered bias-exchange metadynamics simulations to determine the equilibrium ensembles of an insulin molecule under amyloidogenic conditions of low pH and high temperature. A bin-based clustering method that uses statistics accumulated in bias-exchange metadynamics trajectories was employed to construct a detailed thermodynamic and kinetic model of insulin folding. The longest-lived, lowest free-energy ensemble identified consisted of native conformations adopted by a folded insulin monomer in solution, namely, the R-, Rf-, and T-states of insulin. The lowest free-energy structure had a root mean square deviation of only 0.15 nm from the native X-ray structure. The second longest-lived metastable state was an unfolded, compact monomer with little similarity to the native structure. We have identified three additional long-lived, metastable states from the bin-based model. We then carried out an exhaustive structural characterization of the metastable states on the basis of tertiary contact maps and per-residue accessible surface areas. We have also determined the lowest free-energy path between the two longest-lived metastable states and confirm earlier findings of non-two-state folding for insulin through a folding intermediate. The ensemble containing the monomeric intermediate retained 58% of native hydrophobic contacts, accompanied, however, by a complete loss of native secondary structure. We have discussed the relative importance of nativelike versus nonnative tertiary contacts for the folding transition. We also provide a simple measure to determine the importance of an individual residue for the folding transition. Finally, we have compared and contrasted this intermediate with experimental data obtained in spectroscopic, crystallographic, and calorimetric measurements during early stages of insulin aggregation. We have also determined the stability of monomeric insulin by incubation at a very low concentration to isolate protein-protein interaction effects. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
The Ensemble Space Weather Modeling System (eSWMS): Status, Capabilities and Challenges
NASA Astrophysics Data System (ADS)
Fry, C. D.; Eccles, J. V.; Reich, J. P.
2010-12-01
Marking a milestone in space weather forecasting, the Space Weather Modeling System (SWMS) successfully completed validation testing in advance of operational testing at Air Force Weather Agency’s primary space weather production center. This is the first coupling of stand-alone, physics-based space weather models that are currently in operations at AFWA supporting the warfighter. Significant development effort went into ensuring the component models were portable and scalable while maintaining consistent results across diverse high performance computing platforms. Coupling was accomplished under the Earth System Modeling Framework (ESMF). The coupled space weather models are the Hakamada-Akasofu-Fry version 2 (HAFv2) solar wind model and GAIM1, the ionospheric forecast component of the Global Assimilation of Ionospheric Measurements (GAIM) model. The SWMS was developed by team members from AFWA, Explorations Physics International, Inc. (EXPI) and Space Environment Corporation (SEC). The successful development of the SWMS provides new capabilities beyond enabling extended lead-time, data-driven ionospheric forecasts. These include ingesting diverse data sets at higher resolution, incorporating denser computational grids at finer time steps, and performing probability-based ensemble forecasts. Work of the SWMS development team now focuses on implementing the ensemble-based probability forecast capability by feeding multiple scenarios of 5 days of solar wind forecasts to the GAIM1 model based on the variation of the input fields to the HAFv2 model. The ensemble SWMS (eSWMS) will provide the most-likely space weather scenario with uncertainty estimates for important forecast fields. The eSWMS will allow DoD mission planners to consider the effects of space weather on their systems with more advance warning than is currently possible. The payoff is enhanced, tailored support to the warfighter with improved capabilities, such as point-to-point HF propagation forecasts, single-frequency GPS error corrections, and high cadence, high-resolution Space Situational Awareness (SSA) products. We present the current status of eSWMS, its capabilities, limitations and path of transition to operational use.
NASA Astrophysics Data System (ADS)
Resseguier, V.; Memin, E.; Chapron, B.; Fox-Kemper, B.
2017-12-01
In order to better observe and predict geophysical flows, ensemble-based data assimilation methods are of high importance. In such methods, an ensemble of random realizations represents the variety of the simulated flow's likely behaviors. For this purpose, randomness needs to be introduced in a suitable way, and physically-based stochastic subgrid parametrizations are promising paths. This talk will propose a new kind of such a parametrization, referred to as modeling under location uncertainty. The fluid velocity is decomposed into a resolved large-scale component and an aliased small-scale one. The first component is possibly random but time-correlated, whereas the second is white-in-time but spatially-correlated and possibly inhomogeneous and anisotropic. With such a velocity, the material derivative of any, possibly active, tracer is modified. Three new terms appear: a correction of the large-scale advection, a multiplicative noise and a possibly heterogeneous and anisotropic diffusion. This parameterization naturally ensures attractive properties such as energy conservation for each realization. Additionally, this stochastic material derivative and the associated Reynolds' transport theorem offer a systematic method to derive stochastic models. In particular, we will discuss the consequences of the Quasi-Geostrophic assumptions in our framework. Depending on the turbulence amount, different models with different physical behaviors are obtained. Under strong turbulence assumptions, a simplified diagnosis of frontolysis and frontogenesis at the surface of the ocean is possible in this framework. A Surface Quasi-Geostrophic (SQG) model with a weaker noise influence has also been simulated. A single realization better represents small scales than a deterministic SQG model at the same resolution. Moreover, an ensemble accurately predicts extreme events, bifurcations, as well as the amplitudes and the positions of the simulation errors. Figure 1 highlights this last result and compares it to the strong error underestimation of an ensemble simulated from the deterministic dynamics with random initial conditions.
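For readers unfamiliar with this framework, the stochastic material derivative of a tracer q takes, in Mémin-type location-uncertainty formulations, roughly the form below; this is a hedged sketch of the generic expression, not the talk's exact equations, with w the resolved velocity, σdB_t the white-in-time small-scale component, and a = σσᵀ the variance tensor:

```latex
\mathbb{D}_t q \;=\; \mathrm{d}_t q
  \;+\; \left[\left(w - \tfrac{1}{2}\,\nabla\!\cdot a\right)\cdot\nabla q\right]\mathrm{d}t
  \;+\; \left(\sigma\,\mathrm{d}B_t\right)\cdot\nabla q
  \;-\; \tfrac{1}{2}\,\nabla\!\cdot\left(a\,\nabla q\right)\mathrm{d}t
```

The three extra terms correspond, in order, to the advection correction, the multiplicative noise, and the possibly heterogeneous, anisotropic diffusion listed in the abstract.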
A Flexible Approach for the Statistical Visualization of Ensemble Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potter, K.; Wilson, A.; Bremer, P.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
Toussaint, Renaud; Pride, Steven R
2002-09-01
This is the first of a series of three articles that treats fracture localization as a critical phenomenon. This first article establishes a statistical mechanics based on ensemble averages when fluctuations through time play no role in defining the ensemble. Ensembles are obtained by dividing a huge rock sample into many mesoscopic volumes. Because rocks are a disordered collection of grains in cohesive contact, we expect that once shear strain is applied and cracks begin to arrive in the system, the mesoscopic volumes will have a wide distribution of different crack states. These mesoscopic volumes are the members of our ensembles. We determine the probability of observing a mesoscopic volume to be in a given crack state by maximizing Shannon's measure of the emergent-crack disorder subject to constraints coming from the energy balance of brittle fracture. The laws of thermodynamics, the partition function, and the quantification of temperature are obtained for such cracking systems.
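For readers less familiar with the construction, maximizing Shannon's entropy over crack-state probabilities subject to normalization and a mean-energy constraint produces the Gibbs form directly; a minimal sketch of the standard derivation the article builds on:

```latex
\max_{\{p_j\}}\; S = -\sum_j p_j \ln p_j
\quad \text{s.t.} \quad \sum_j p_j = 1,\;\; \sum_j p_j E_j = \bar{E}
\;\;\Longrightarrow\;\;
p_j = \frac{e^{-\beta E_j}}{Z}, \qquad Z = \sum_j e^{-\beta E_j},
```

where E_j is the energy associated with crack state j, Z is the partition function, and the Lagrange multiplier β plays the role of an inverse temperature for the cracking system.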
Measurement-induced entanglement for excitation stored in remote atomic ensembles.
Chou, C W; de Riedmatten, H; Felinto, D; Polyakov, S V; van Enk, S J; Kimble, H J
2005-12-08
A critical requirement for diverse applications in quantum information science is the capability to disseminate quantum resources over complex quantum networks. For example, the coherent distribution of entangled quantum states together with quantum memory (for storing the states) can enable scalable architectures for quantum computation, communication and metrology. Here we report observations of entanglement between two atomic ensembles located in distinct, spatially separated set-ups. Quantum interference in the detection of a photon emitted by one of the samples projects the otherwise independent ensembles into an entangled state with one joint excitation stored remotely in 10^5 atoms at each site. After a programmable delay, we confirm entanglement by mapping the state of the atoms to optical fields and measuring mutual coherences and photon statistics for these fields. We thereby determine a quantitative lower bound for the entanglement of the joint state of the ensembles. Our observations represent significant progress in the ability to distribute and store entangled quantum states.
Ensembles vs. information theory: supporting science under uncertainty
NASA Astrophysics Data System (ADS)
Nearing, Grey S.; Gupta, Hoshin V.
2018-05-01
Multi-model ensembles are one of the most common ways to deal with epistemic uncertainty in hydrology. This is a problem because there is no known way to sample models such that the resulting ensemble admits a measure that has any systematic (i.e., asymptotic, bounded, or consistent) relationship with uncertainty. Multi-model ensembles are effectively sensitivity analyses and cannot - even partially - quantify uncertainty. One consequence of this is that multi-model approaches cannot support a consistent scientific method - in particular, multi-model approaches yield unbounded errors in inference. In contrast, information theory supports a coherent hypothesis test that is robust to (i.e., bounded under) arbitrary epistemic uncertainty. This paper may be understood as advocating a procedure for hypothesis testing that does not require quantifying uncertainty, but is coherent and reliable (i.e., bounded) in the presence of arbitrary (unknown and unknowable) uncertainty. We conclude by offering some suggestions about how this proposed philosophy of science suggests new ways to conceptualize and construct simulation models of complex, dynamical systems.
Systems and methods for analyzing liquids under vacuum
Yu, Xiao-Ying; Yang, Li; Cowin, James P.; Iedema, Martin J.; Zhu, Zihua
2013-10-15
Systems and methods for supporting a liquid against a vacuum pressure in a chamber can enable analysis of the liquid surface using vacuum-based chemical analysis instruments. No electrical or fluid connections are required to pass through the chamber walls. The systems can include a reservoir, a pump, and a liquid flow path. The reservoir contains a liquid-phase sample. The pump drives flow of the sample from the reservoir, through the liquid flow path, and back to the reservoir. The flow of the sample is not substantially driven by a differential between pressures inside and outside of the liquid flow path. An aperture in the liquid flow path exposes a stable portion of the liquid-phase sample to the vacuum pressure within the chamber. The radius, or size, of the aperture is less than or equal to a critical value required to support a meniscus of the liquid-phase sample by surface tension.
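The critical aperture size mentioned follows from a Young-Laplace balance between the pressure differential and the surface tension of the meniscus; a back-of-the-envelope sketch (the fluid properties below are illustrative assumptions, not values from the patent):

```python
# Back-of-the-envelope estimate of the critical aperture radius below which
# surface tension can support a meniscus against the pressure differential.
# Young-Laplace for a hemispherical meniscus: delta_p = 2 * gamma / r.
gamma = 0.072       # surface tension, N/m (water near room temperature; assumed)
delta_p = 101325.0  # vacuum-to-liquid pressure differential, Pa (~1 atm; assumed)

r_critical = 2.0 * gamma / delta_p
print(f"critical aperture radius ~ {r_critical * 1e6:.1f} micrometers")
# -> about 1.4 um: apertures at or below this scale can hold the liquid
```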
Miklós, István; Darling, Aaron E
2009-06-22
Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optimal sorting path) is often not unique, as many such optimal sorting paths exist. If we assume that all optimal sorting paths are equally likely, then statistical inference on genome arrangement history must account for all such sorting paths and not just a single estimate. No deterministic polynomial-time algorithm is known either to count the number of optimal sorting paths or to sample from the uniform distribution over them. Here, we propose a stochastic method that uniformly samples the set of all optimal sorting paths. Our method uses a novel formulation of parallel Markov chain Monte Carlo. In practice, our method can quickly estimate the total number of optimal sorting paths. We introduce a variant of our approach in which short inversions are modeled to be more likely, and we show how the method can be used to estimate the distribution of inversion lengths and breakpoint usage in pathogenic Yersinia pestis. The proposed method has been implemented in a program called "MC4Inversion." We compare MC4Inversion to the sampler implemented in BADGER and to a previously described importance sampling (IS) technique. We find that on high-divergence data sets, MC4Inversion finds more optimal sorting paths per second than BADGER and the IS technique and simultaneously avoids the bias inherent in the IS technique.
Multiensemble Markov models of molecular thermodynamics and kinetics.
Wu, Hao; Paul, Fabian; Wehmeyer, Christoph; Noé, Frank
2016-06-07
We introduce the general transition-based reweighting analysis method (TRAM), a statistically optimal approach to integrate both unbiased and biased molecular dynamics simulations, such as umbrella sampling or replica exchange. TRAM estimates a multiensemble Markov model (MEMM) with full thermodynamic and kinetic information at all ensembles. The approach combines the benefits of Markov state models-clustering of high-dimensional spaces and modeling of complex many-state systems-with those of the multistate Bennett acceptance ratio of exploiting biased or high-temperature ensembles to accelerate rare-event sampling. TRAM does not depend on any rate model in addition to the widely used Markov state model approximation, but uses only fundamental relations such as detailed balance and binless reweighting of configurations between ensembles. Previous methods, including the multistate Bennett acceptance ratio, discrete TRAM, and Markov state models are special cases and can be derived from the TRAM equations. TRAM is demonstrated by efficiently computing MEMMs in cases where other estimators break down, including the full thermodynamics and rare-event kinetics from high-dimensional simulation data of an all-atom protein-ligand binding model.
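As a concrete illustration of the "binless reweighting of configurations between ensembles" invoked above, the multistate Bennett acceptance ratio (noted in the abstract as a special case of TRAM) reduces to a self-consistent iteration over dimensionless free energies. A minimal numpy sketch, assuming the reduced energies of all pooled samples are available in every ensemble:

```python
import numpy as np
from scipy.special import logsumexp

def mbar_free_energies(u_kn, N_k, n_iter=1000):
    """Self-consistent MBAR iteration (binless reweighting between ensembles).

    u_kn : (K, N) reduced energies; u_kn[k, n] evaluates pooled sample n
           in thermodynamic ensemble k.
    N_k  : (K,) number of samples originally drawn from each ensemble.
    Returns dimensionless free energies f_k (f_0 = 0 by convention).
    """
    K, N = u_kn.shape
    f_k = np.zeros(K)
    log_N_k = np.log(N_k)
    for _ in range(n_iter):
        # log of the mixture denominator for every pooled sample
        log_denom_n = logsumexp(log_N_k[:, None] + f_k[:, None] - u_kn, axis=0)
        f_new = -logsumexp(-u_kn - log_denom_n[None, :], axis=1)
        f_k = f_new - f_new[0]   # fix the gauge f_0 = 0
    return f_k
```

With the converged f_k, the weight of pooled sample n in ensemble k is proportional to exp(f_k − u_kn[k, n]) divided by the same mixture denominator, which is precisely the binless reweighting the abstract refers to.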
Ensembles of radial basis function networks for spectroscopic detection of cervical precancer
NASA Technical Reports Server (NTRS)
Tumer, K.; Ramanujam, N.; Ghosh, J.; Richards-Kortum, R.
1998-01-01
The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. A multivariate statistical algorithm was used to extract clinically useful information from tissue spectra acquired from 361 cervical sites from 95 patients at 337-, 380-, and 460-nm excitation wavelengths. The multivariate statistical analysis was also employed to reduce the number of fluorescence excitation-emission wavelength pairs required to discriminate healthy tissue samples from precancerous tissue samples. The use of connectionist methods such as multilayered perceptrons, radial basis function (RBF) networks, and ensembles of such networks was investigated. RBF ensemble algorithms based on fluorescence spectra potentially provide automated and near real-time implementation of precancer detection in the hands of nonexperts. The results are more reliable, direct, and accurate than those achieved by either human experts or multivariate statistical algorithms.
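A toy sketch of an RBF-network ensemble for binary tissue classification, with k-means-placed Gaussian centers, a ridge-regularized linear readout, and bootstrap averaging; this is a generic reconstruction for illustration, not the authors' trained pipeline or data:

```python
import numpy as np
from sklearn.cluster import KMeans

class RBFNet:
    """Gaussian RBF layer (k-means centers) with a ridge-regularized readout."""
    def __init__(self, n_centers=15, ridge=1e-3):
        self.n_centers, self.ridge = n_centers, ridge

    def _phi(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, y):
        km = KMeans(n_clusters=self.n_centers, n_init=5).fit(X)
        self.centers = km.cluster_centers_
        # heuristic kernel width: mean pairwise distance between centers
        dc = np.sqrt(((self.centers[:, None] - self.centers[None, :]) ** 2).sum(-1))
        self.width = dc[dc > 0].mean()
        P = self._phi(X)
        A = P.T @ P + self.ridge * np.eye(self.n_centers)
        self.w = np.linalg.solve(A, P.T @ y)   # ridge-regularized least squares
        return self

    def predict_proba(self, X):
        return np.clip(self._phi(X) @ self.w, 0.0, 1.0)

def rbf_ensemble_proba(X_train, y_train, X_test, n_members=10, seed=0):
    """Average the outputs of RBF nets trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X_train), len(X_train))
        preds.append(RBFNet().fit(X_train[idx], y_train[idx]).predict_proba(X_test))
    return np.mean(preds, axis=0)   # ensemble-averaged precancer probability
```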
Wright, James T.
1986-01-01
A bilateral circuit is operable for transmitting signals in two directions without generation of ringing due to feedback caused by the insertion of the circuit. The circuit may include gain for each of the signals to provide a bidirectional amplifier. The signals are passed through two separate paths, with a unidirectional amplifier in each path. A controlled sampling device is provided in each path for sampling the two signals. Any feedback loop between the two signals is disrupted by providing a phase displacement between the control signals for the two sampling devices.
Sørbye, Sveinung Wergeland; Pedersen, Mette Kristin; Ekeberg, Bente; Williams, Merete E. Johansen; Sauer, Torill; Chen, Ying
2017-01-01
Background: The Norwegian Cervical Cancer Screening Program recommends screening every 3 years for women between 25 and 69 years of age. There is a large difference in the percentage of unsatisfactory samples between laboratories that use different brands of liquid-based cytology. We wished to examine whether inadequate ThinPrep samples could be made satisfactory by reprocessing them with the SurePath protocol. Materials and Methods: A total of 187 inadequate ThinPrep specimens from the Department of Clinical Pathology at University Hospital of North Norway were sent to Akershus University Hospital for conversion to SurePath medium. Ninety-one (48.7%) were processed through the automated "gynecologic" application for cervix cytology samples, and 96 (51.3%) were processed with the "nongynecological" automatic program. Results: Out of 187 samples that had been unsatisfactory by ThinPrep, 93 (49.7%) were satisfactory after being converted to SurePath. The rate of satisfactory cytology was 36.6% and 62.5% for samples run through the "gynecology" program and "nongynecology" program, respectively. Of the 93 samples that became satisfactory after conversion from ThinPrep to SurePath, 80 (86.0%) were screened as normal while 13 samples (14.0%) were given an abnormal diagnosis, which included 5 atypical squamous cells of undetermined significance, 5 low-grade squamous intraepithelial lesion, 2 atypical glandular cells not otherwise specified, and 1 atypical squamous cells cannot exclude high-grade squamous intraepithelial lesion. A total of 2.1% (4/187) of the women received a diagnosis of cervical intraepithelial neoplasia 2 or higher at later follow-up. Conclusions: Converting cytology samples from ThinPrep to SurePath processing can reduce the number of unsatisfactory samples. The samples should be run through the "nongynecology" program to ensure an adequate number of cells. PMID:28900466
Multilevel ensemble Kalman filtering
Hoel, Hakon; Law, Kody J. H.; Tempone, Raul
2016-06-14
This study embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE. The resulting multilevel EnKF is proved to asymptotically outperform the standard EnKF in terms of computational cost versus approximation accuracy. The theoretical results are illustrated numerically.
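The multilevel ingredient is the telescoping decomposition of an expectation over a hierarchy of time grids, with coupled coarse/fine SDE paths at each level; a minimal sketch for the mean of an SDE endpoint (geometric Brownian motion is an arbitrary illustrative choice, not the paper's test problem):

```python
import numpy as np

rng = np.random.default_rng(1)

def euler_pair(n_paths, level, T=1.0, x0=1.0, mu=0.05, sigma=0.2):
    """Coupled Euler-Maruyama endpoints on a fine grid (2^level steps) and the
    coarse grid below it, driven by the same Brownian increments."""
    n_fine = 2 ** level
    dt = T / n_fine
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_fine))
    xf = np.full(n_paths, x0)
    for k in range(n_fine):                  # fine-grid evolution
        xf = xf + mu * xf * dt + sigma * xf * dW[:, k]
    if level == 0:
        return xf, np.zeros(n_paths)
    xc = np.full(n_paths, x0)
    dWc = dW[:, 0::2] + dW[:, 1::2]          # summed increments couple the grids
    for k in range(n_fine // 2):             # coarse-grid evolution
        xc = xc + mu * xc * (2 * dt) + sigma * xc * dWc[:, k]
    return xf, xc

def mlmc_mean(L=6, n_paths=20000):
    """Telescoping estimator E[X_L] = E[X_0] + sum_l E[X_l - X_{l-1}]."""
    est = 0.0
    for level in range(L + 1):
        xf, xc = euler_pair(n_paths // (2 ** level) + 100, level)
        est += np.mean(xf - xc) if level > 0 else np.mean(xf)
    return est

print(mlmc_mean())   # should be close to exp(mu*T) ~ 1.0513
```

The multilevel EnKF replaces this plain mean with the filter's sample moments, but the coupling and telescoping structure are the same.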
A GLM Post-processor to Adjust Ensemble Forecast Traces
NASA Astrophysics Data System (ADS)
Thiemann, M.; Day, G. N.; Schaake, J. C.; Draijer, S.; Wang, L.
2011-12-01
The skill of hydrologic ensemble forecasts has improved in recent years through a better understanding of climate variability, better climate forecasts and new data assimilation techniques. Such forecasts have been used extensively for probabilistic water supply forecasting, and interest is developing in utilizing them in operational decision making. Hydrologic ensemble forecast members typically have inherent biases in flow timing and volume caused by (1) structural errors in the models used, (2) systematic errors in the data used to calibrate those models, (3) uncertain initial hydrologic conditions, and (4) uncertainties in the forcing datasets. Furthermore, hydrologic models have often not been developed for operational decision points, and ensemble forecasts are thus not always available where needed. A statistical post-processor can be used to address these issues. The post-processor should (1) correct for systematic biases in flow timing and volume, (2) preserve the skill of the available raw forecasts, (3) preserve spatial and temporal correlation as well as the uncertainty in the forecasted flow data, (4) produce adjusted forecast ensembles that represent the variability of the observed hydrograph to be predicted, and (5) preserve individual forecast traces as equally likely. The post-processor should also allow for the translation of available ensemble forecasts to hydrologically similar locations where forecasts are not available. This paper introduces an ensemble post-processor (EPP) developed in support of New York City water supply operations. The EPP employs a general linear model (GLM) to (1) adjust available ensemble forecast traces and (2) create new ensembles for (nearby) locations where only historic flow observations are available. The EPP is calibrated by developing daily and aggregated statistical relationships from historical flow observations and model simulations. These are then used in operation to obtain the conditional probability density function (PDF) of the observations to be predicted, thus jointly adjusting individual ensemble members. These steps are executed in a normalized transformed space ('z'-space) to account for the strong non-linearity in the flow observations involved. A data window centered on each calibration date is used to minimize impacts from sampling errors and data noise. Testing on datasets from California and New York suggests that the EPP can successfully minimize biases in ensemble forecasts, while preserving the raw forecast skill in a 'days to weeks' forecast horizon and reproducing the variability of climatology for 'weeks to years' forecast horizons.
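A minimal sketch of the z-space adjustment idea: normal quantile transform of flows via the empirical climatological CDF, a linear conditional model fit on historical simulation-observation pairs, and back-transformation. The operational EPP's GLM, data windowing and spatial components are more elaborate than this fragment:

```python
import numpy as np
from scipy.stats import norm

def to_z(x, climatology):
    """Map flows to standard-normal space via the empirical climatological CDF."""
    clim = np.sort(climatology)
    p = (np.searchsorted(clim, x, side="right") + 0.5) / (len(clim) + 1.0)
    return norm.ppf(np.clip(p, 1e-4, 1 - 1e-4))

def from_z(z, climatology):
    """Inverse transform: standard-normal quantiles back to flow units."""
    return np.quantile(np.sort(climatology), norm.cdf(z))

def adjust_members(fcst_members, hist_sim, hist_obs, rng):
    """Adjust each ensemble trace member in z-space via a linear conditional model."""
    zs, zo = to_z(hist_sim, hist_sim), to_z(hist_obs, hist_obs)
    a, b = np.polyfit(zs, zo, 1)               # z_obs ~ a * z_sim + b
    resid_sd = np.std(zo - (a * zs + b))
    z_m = to_z(fcst_members, hist_sim)
    z_adj = a * z_m + b + rng.normal(0.0, resid_sd, size=z_m.shape)
    return from_z(z_adj, hist_obs)             # back to flow space
```

Adding the residual noise term keeps the adjusted members equally likely and prevents the conditional-mean correction from collapsing the ensemble spread.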
NASA Astrophysics Data System (ADS)
Kadoura, Ahmad; Sun, Shuyu; Salama, Amgad
2014-08-01
Accurate determination of thermodynamic properties of petroleum reservoir fluids is of great interest to many applications, especially in petroleum engineering and chemical engineering. Molecular simulation has many appealing features, in particular that it requires fewer tuned parameters yet offers better predictive capability; however, it is well known that molecular simulation is very CPU-expensive compared with equation-of-state approaches. We have recently introduced an efficient, thermodynamically consistent technique to rapidly regenerate Monte Carlo Markov Chains (MCMCs) at different thermodynamic conditions from existing data points that have been pre-computed with expensive classical simulation. This technique can speed up the simulation more than a million times, making the regenerated molecular simulation almost as fast as equation-of-state approaches. In this paper, this technique is first briefly reviewed and then numerically investigated for its capability of predicting ensemble averages of primary quantities at thermodynamic conditions neighboring the originally simulated MCMCs. Moreover, this extrapolation technique is extended to predict second-derivative properties (e.g. heat capacity and fluid compressibility). The method works by reweighting and reconstructing generated MCMCs in the canonical ensemble for Lennard-Jones particles. The system's potential energy, pressure, isochoric heat capacity and isothermal compressibility were extrapolated from the original simulated points along isochors, isotherms and paths of changing temperature and density. Finally, an optimized set of Lennard-Jones parameters (ε, σ) for single-site models is proposed for methane, nitrogen and carbon monoxide.
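The reweighting step at the heart of such an extrapolation is simple in the canonical ensemble; a schematic numpy fragment (reduced units with k_B = 1; a sketch of the general idea, not the authors' optimized implementation):

```python
import numpy as np
from scipy.special import logsumexp

def reweighted_averages(U, beta_sim, beta_new):
    """Reweight configurational energies U sampled at beta_sim to the
    neighboring inverse temperature beta_new (canonical/NVT ensemble)."""
    log_w = -(beta_new - beta_sim) * U
    log_w -= logsumexp(log_w)                 # normalized log-weights
    w = np.exp(log_w)
    U_mean = np.sum(w * U)
    U_var = np.sum(w * (U - U_mean) ** 2)
    Cv = beta_new ** 2 * U_var                # heat capacity from fluctuations
    return U_mean, Cv
```

The fluctuation formula supplies the second-derivative property directly. Reliability degrades as beta_new moves away from beta_sim because the effective sample size of the reweighted chain collapses, which is why the abstract restricts extrapolation to neighboring conditions.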
Feature selection for the classification of traced neurons.
López-Cabrera, José D; Lorenzo-Ginori, Juan V
2018-06-01
The wide availability of computational tools to calculate the properties of traced neurons has produced many descriptors that allow the automated classification of neurons from these reconstructions. This makes it necessary to eliminate irrelevant features and to select the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts into 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most widely used computational tools in neuroinformatics for quantifying traced neurons. We review current feature selection techniques including filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained by applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of the filter, embedded, wrapper and ensemble types were compared, and the subsets returned were tested in classification tasks with different classification algorithms. The L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets, which provides evidence of their importance in the classification of these neurons.
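The ensemble flavor of feature selection can be illustrated by aggregating rankings from heterogeneous selectors and measuring subset stability; the selectors below are generic stand-ins, not the paper's exact configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif

def aggregated_ranking(X, y):
    """Mean-rank aggregation of three feature scorings (higher score = better)."""
    scores = [
        mutual_info_classif(X, y),
        f_classif(X, y)[0],
        RandomForestClassifier(n_estimators=200, random_state=0)
            .fit(X, y).feature_importances_,
    ]
    # convert each scoring to ranks (0 = best), then average the ranks
    ranks = np.mean([np.argsort(np.argsort(-np.asarray(s))) for s in scores], axis=0)
    return np.argsort(ranks)            # feature indices, best first

def jaccard_stability(subset_a, subset_b):
    """Stability of two selected subsets measured as Jaccard overlap."""
    a, b = set(subset_a), set(subset_b)
    return len(a & b) / len(a | b)
```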
Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-02-15
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results.
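A minimal sketch of the ensemble-of-undersampled-datasets idea; plain random undersampling stands in here for the K-Medoids-based undersampling the study found best:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def undersampled_ensemble_proba(X, y, X_test, n_members=11, seed=0):
    """Train one classifier per balanced undersample of the majority class and
    average predicted probabilities over the ensemble (label 1 assumed minority)."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    probs = []
    for _ in range(n_members):
        maj_sub = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, maj_sub])   # balanced training subset
        clf = RandomForestClassifier(n_estimators=200).fit(X[idx], y[idx])
        probs.append(clf.predict_proba(X_test)[:, 1])
    return np.mean(probs, axis=0)
```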
Gradient Echo Quantum Memory in Warm Atomic Vapor
Pinel, Olivier; Hosseini, Mahdi; Sparkes, Ben M.; Everett, Jesse L.; Higginbottom, Daniel; Campbell, Geoff T.; Lam, Ping Koy; Buchler, Ben C.
2013-01-01
Gradient echo memory (GEM) is a protocol for storing optical quantum states of light in atomic ensembles. The primary motivation for such a technology is that quantum key distribution (QKD), which uses Heisenberg uncertainty to guarantee security of cryptographic keys, is limited in transmission distance. The development of a quantum repeater is a possible path to extend QKD range, but a repeater will need a quantum memory. In our experiments we use a gas of rubidium 87 vapor that is contained in a warm gas cell. This makes the scheme particularly simple. It is also a highly versatile scheme that enables in-memory refinement of the stored state, such as frequency shifting and bandwidth manipulation. The basis of the GEM protocol is to absorb the light into an ensemble of atoms that has been prepared in a magnetic field gradient. The reversal of this gradient leads to rephasing of the atomic polarization and thus recall of the stored optical state. We will outline how we prepare the atoms and this gradient and also describe some of the pitfalls that need to be avoided, in particular four-wave mixing, which can give rise to optical gain. PMID:24300586
Exploring the propagation of relativistic quantum wavepackets in the trajectory-based formulation
NASA Astrophysics Data System (ADS)
Tsai, Hung-Ming; Poirier, Bill
2016-03-01
In the context of nonrelativistic quantum mechanics, Gaussian wavepacket solutions of the time-dependent Schrödinger equation provide useful physical insight. This is not the case for relativistic quantum mechanics, however, for which both the Klein-Gordon and Dirac wave equations result in strange and counterintuitive wavepacket behaviors, even for free-particle Gaussians. These behaviors include zitterbewegung and other interference effects. As a potential remedy, this paper explores a new trajectory-based formulation of quantum mechanics, in which the wavefunction plays no role [Phys. Rev. X, 4, 040002 (2014)]. Quantum states are represented as ensembles of trajectories, whose mutual interaction is the source of all quantum effects observed in nature—suggesting a “many interacting worlds” interpretation. It is shown that the relativistic generalization of the trajectory-based formulation results in well-behaved free-particle Gaussian wavepacket solutions. In particular, probability density is positive and well-localized everywhere, and its spatial integral is conserved over time—in any inertial frame. Finally, the ensemble-averaged wavepacket motion is along a straight line path through spacetime. In this manner, the pathologies of the wave-based relativistic quantum theory, as applied to wavepacket propagation, are avoided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crossno, Patricia J.; Gittinger, Jaxon; Hunt, Warren L.
Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user's desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.
Diffusing-wave spectroscopy in a standard dynamic light scattering setup
NASA Astrophysics Data System (ADS)
Fahimi, Zahra; Aangenendt, Frank J.; Voudouris, Panayiotis; Mattsson, Johan; Wyss, Hans M.
2017-12-01
Diffusing-wave spectroscopy (DWS) extends dynamic light scattering measurements to samples with strong multiple scattering. DWS treats the transport of photons through turbid samples as a diffusion process, thereby making it possible to extract the dynamics of scatterers from measured correlation functions. The analysis of DWS data requires knowledge of the path length distribution of photons traveling through the sample. While for flat sample cells this path length distribution can be readily calculated and expressed in analytical form, no such expression is available for cylindrical sample cells. DWS measurements have therefore typically relied on dedicated setups that use flat sample cells. Here we show how DWS measurements, in particular DWS-based microrheology measurements, can be performed in standard dynamic light scattering setups that use cylindrical sample cells. To do so we perform simple random-walk simulations that yield numerical predictions of the path length distribution as a function of both the transport mean free path and the detection angle. This information is used in experiments to extract the mean-square displacement of tracer particles in the material, as well as the corresponding frequency-dependent viscoelastic response. An important advantage of our approach is that by performing measurements at different detection angles, the average path length through the sample can be varied. For measurements performed on a single sample cell, this gives access to a wider range of length and time scales than obtained in a conventional DWS setup. Such angle-dependent measurements also offer an important consistency check, as for all detection angles the DWS analysis should yield the same tracer dynamics, even though the respective path length distributions are very different. We validate our approach by performing measurements both on aqueous suspensions of tracer particles and on solidlike gelatin samples, for which we find our DWS-based microrheology data to be in good agreement with rheological measurements performed on the same samples.
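A simplified version of such a random-walk simulation, reduced to a 2-D disk cross-section for brevity (the paper's cylindrical-geometry simulations are the authoritative treatment; all parameter values here are arbitrary):

```python
import numpy as np

def photon_paths(n_photons=5000, R=1.0, l_star=0.05, seed=0):
    """Isotropic random walk with step l_star inside a disk of radius R
    (a 2-D stand-in for the cylindrical cell cross-section). Photons enter
    near (-R, 0); returns total path lengths and exit angles in degrees."""
    rng = np.random.default_rng(seed)
    lengths, angles = [], []
    for _ in range(n_photons):
        pos = np.array([-(R - l_star), 0.0])   # start one step inside the wall
        s = l_star
        while np.hypot(pos[0], pos[1]) < R:
            theta = rng.uniform(0.0, 2.0 * np.pi)
            pos += l_star * np.array([np.cos(theta), np.sin(theta)])
            s += l_star
        lengths.append(s)
        angles.append(np.degrees(np.arctan2(pos[1], pos[0])))
    return np.array(lengths), np.array(angles)

# Binning path lengths by exit angle yields the angle-dependent P(s) needed
# to analyze DWS correlation functions measured at that detection angle.
lengths, angles = photon_paths()
backscatter = lengths[np.abs(angles) > 170.0]    # exit near the entry side
transmission = lengths[np.abs(angles) < 10.0]    # exit on the far side
print(backscatter.mean(), transmission.mean())
```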
A Maximum Entropy Method for Particle Filtering
NASA Astrophysics Data System (ADS)
Eyink, Gregory L.; Kim, Sangil
2006-06-01
Standard ensemble or particle filtering schemes do not properly represent states of low prior probability when the number of available samples is too small, as is often the case in practical applications. We introduce here a set of parametric resampling methods to solve this problem. Motivated by a general H-theorem for relative entropy, we construct parametric models for the filter distributions as maximum-entropy/minimum-information models consistent with moments of the particle ensemble. When the prior distributions are modeled as mixtures of Gaussians, our method naturally generalizes the ensemble Kalman filter to systems with highly non-Gaussian statistics. We apply the new particle filters presented here to two simple test cases: a one-dimensional diffusion process in a double-well potential and the three-dimensional chaotic dynamical system of Lorenz.
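In the Gaussian case the maximum-entropy density consistent with the ensemble's first two moments is the Gaussian itself, so the parametric resampling step reduces to moment matching; a minimal sketch (the mixture-of-Gaussians extension follows the same pattern per component):

```python
import numpy as np

def maxent_gaussian_resample(particles, weights, n_out, rng):
    """Replace a weighted particle ensemble by fresh samples from the
    maximum-entropy (Gaussian) density matching its mean and covariance."""
    w = weights / weights.sum()
    mean = w @ particles                               # weighted mean
    diff = particles - mean
    cov = (w[:, None] * diff).T @ diff                 # weighted covariance
    cov += 1e-10 * np.eye(cov.shape[0])                # numerical safeguard
    return rng.multivariate_normal(mean, cov, size=n_out)

# Typical use: call after the Bayes weight update whenever the effective
# sample size of the particle ensemble collapses.
```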
Unbiased, scalable sampling of protein loop conformations from probabilistic priors.
Zhang, Yajia; Hauser, Kris
2013-01-01
Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences. Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints. Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion.
NASA Astrophysics Data System (ADS)
Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen
2018-01-01
Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by selectively fusing valued information from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct a data-driven model of industrial process parameters from mechanical vibration and acoustic frequency spectra, based on selective fusion of multi-condition samples and multi-source features. A multi-layer SEN (MLSEN) strategy is used to simulate the domain expert's cognitive process. A genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine the outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.
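The selective-ensemble step can be illustrated with a greedy forward selection over trained sub-models; this is a generic stand-in for the paper's genetic-algorithm and branch-and-bound machinery:

```python
import numpy as np

def greedy_selective_ensemble(preds, y_val, max_members=10):
    """preds: (n_models, n_samples) validation predictions of candidate sub-models.
    Greedily add the sub-model that most reduces ensemble RMSE on validation data;
    stop when no candidate improves the current ensemble."""
    chosen, best_rmse = [], np.inf
    for _ in range(max_members):
        best_j = None
        for j in range(preds.shape[0]):
            if j in chosen:
                continue
            ens = preds[chosen + [j]].mean(axis=0)   # candidate ensemble output
            rmse = np.sqrt(np.mean((ens - y_val) ** 2))
            if rmse < best_rmse:
                best_rmse, best_j = rmse, j
        if best_j is None:
            break
        chosen.append(best_j)
    return chosen, best_rmse
```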
Statistical Symbolic Execution with Informed Sampling
NASA Technical Reports Server (NTRS)
Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco
2014-01-01
Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
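The Bayesian layer is straightforward to sketch: sampled paths that hit or miss the target update a Beta posterior, while exactly analyzed (pruned) paths contribute their probability mass directly. A schematic fragment (the interface is hypothetical, not Symbolic PathFinder's API):

```python
from scipy.stats import beta

def reach_probability_posterior(hits, misses, pruned_target_mass=0.0,
                                pruned_total_mass=0.0, cred=0.95):
    """Combine exact analysis of pruned paths with Bayesian estimation on the rest.

    hits/misses         : Monte Carlo path samples in the unpruned remainder
    pruned_target_mass  : target-event probability computed exactly on pruned paths
    pruned_total_mass   : total probability mass removed by pruning
    Returns (posterior mean, credible lower, credible upper) for the reach probability.
    """
    post = beta(1.0 + hits, 1.0 + misses)        # uniform Beta(1,1) prior
    lo, hi = post.interval(cred)
    rest = 1.0 - pruned_total_mass               # mass left to the sampler
    return (pruned_target_mass + rest * post.mean(),
            pruned_target_mass + rest * lo,
            pruned_target_mass + rest * hi)
```

As more paths are pruned and analyzed exactly, `rest` shrinks and the credible interval tightens, which is the convergence improvement the abstract describes.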
Compressed sensing of hyperspectral images based on scrambled block Hadamard ensemble
NASA Astrophysics Data System (ADS)
Wang, Li; Feng, Yan
2016-11-01
A fast measurement matrix based on scrambled block Hadamard ensemble for compressed sensing (CS) of hyperspectral images (HSI) is investigated. The proposed measurement matrix offers several attractive features. First, the proposed measurement matrix possesses Gaussian behavior, which illustrates that the matrix is universal and requires a near-optimal number of samples for exact reconstruction. In addition, it could be easily implemented in the optical domain due to its integer-valued elements. More importantly, the measurement matrix only needs small memory for storage in the sampling process. Experimental results on HSIs reveal that the reconstruction performance of the proposed measurement matrix is comparable or better than Gaussian matrix and Bernoulli matrix using different reconstruction algorithms while consuming less computational time. The proposed matrix could be used in CS of HSI, which would save the storage memory on board, improve the sampling efficiency, and ameliorate the reconstruction quality.
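A sketch of a scrambled block Hadamard measurement operator following the general SBHE recipe (randomly permute the signal entries, apply a small block Hadamard transform, keep a random subset of rows); the block size and dimensions are arbitrary illustrative choices:

```python
import numpy as np
from scipy.linalg import hadamard

def sbhe_measure(x, m, block=32, seed=0):
    """Compressive measurements y = (random rows of) block-Hadamard @ permuted x."""
    rng = np.random.default_rng(seed)
    n = x.size
    assert n % block == 0, "signal length must be a multiple of the block size"
    perm = rng.permutation(n)                   # column scrambling
    xs = x[perm].reshape(-1, block)
    H = hadamard(block) / np.sqrt(block)        # orthonormal +/-1 Hadamard block
    z = (xs @ H.T).ravel()                      # block-diagonal Hadamard transform
    rows = rng.choice(n, size=m, replace=False) # random row (measurement) selection
    return z[rows], (perm, rows)                # indices define the operator Phi

# The +/-1 structure (before normalization) is what makes the operator
# attractive for optical implementation, and only perm/rows need storing.
```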
Optimized nested Markov chain Monte Carlo sampling: theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coe, Joshua D; Shaw, M Sam; Sewell, Thomas D
2009-01-01
Metropolis Monte Carlo sampling of a reference potential is used to build a Markov chain in the isothermal-isobaric ensemble. At the endpoints of the chain, the energy is reevaluated at a different level of approximation (the 'full' energy) and a composite move encompassing all of the intervening steps is accepted on the basis of a modified Metropolis criterion. By manipulating the thermodynamic variables characterizing the reference system we maximize the average acceptance probability of composite moves, significantly lengthening the random walk made between consecutive evaluations of the full energy at a fixed acceptance probability. This provides maximally decorrelated samples of the full potential, thereby lowering the total number required to build ensemble averages of a given variance. The efficiency of the method is illustrated using model potentials appropriate to molecular fluids at high pressure. Implications for ab initio or density functional theory (DFT) treatment are discussed.
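The composite-move construction follows a delayed-acceptance pattern; a minimal one-dimensional canonical-ensemble sketch (the paper works in the isothermal-isobaric ensemble and additionally optimizes the reference state, both omitted here for simplicity):

```python
import numpy as np

def nested_mcmc(x0, u_ref, u_full, beta, n_outer=2000, n_inner=20,
                step=0.5, seed=0):
    """Composite-move (delayed-acceptance) sampling: an inner Metropolis chain
    on the cheap reference potential proposes long composite moves; the full
    potential is evaluated only at the endpoints of each inner chain."""
    rng = np.random.default_rng(seed)
    x, samples = float(x0), []
    for _ in range(n_outer):
        y = x
        for _ in range(n_inner):              # inner chain on u_ref only
            y_try = y + rng.normal(0.0, step)
            if np.log(rng.random()) < -beta * (u_ref(y_try) - u_ref(y)):
                y = y_try
        # modified Metropolis criterion for the composite move
        log_acc = -beta * ((u_full(y) - u_full(x)) - (u_ref(y) - u_ref(x)))
        if np.log(rng.random()) < log_acc:
            x = y
        samples.append(x)
    return np.array(samples)

# e.g. harmonic reference vs. anharmonic full potential:
# chain = nested_mcmc(0.0, lambda z: 0.5*z*z, lambda z: 0.5*z*z + 0.1*z**4, beta=1.0)
```

The closer u_ref tracks u_full, the higher the composite acceptance rate and the longer the decorrelating random walk between full-energy evaluations, which is exactly the quantity the abstract optimizes.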
Hernández, Griselda; Anderson, Janet S.; LeMaster, David M.
2012-01-01
The acute sensitivity to conformation exhibited by amide hydrogen exchange reactivity provides a valuable test for the physical accuracy of model ensembles developed to represent the Boltzmann distribution of the protein native state. A number of molecular dynamics studies of ubiquitin have predicted a well-populated transition in the tight turn immediately preceding the primary site of proteasome-directed polyubiquitylation, Lys 48. Amide exchange reactivity analysis demonstrates that this transition is 10^3-fold rarer than these predictions. More strikingly, for the most populated novel conformational basin predicted from a recent 1 ms MD simulation of bovine pancreatic trypsin inhibitor (at 13% of total), experimental hydrogen exchange data indicate a population below 10^-6. The most sophisticated efforts to directly incorporate experimental constraints into the derivation of model protein ensembles have been applied to ubiquitin, as illustrated by three recently deposited studies (PDB codes 2NR2, 2K39 and 2KOX). Utilizing the extensive set of experimental NOE constraints, each of these three ensembles yields a modestly more accurate prediction of the exchange rates for the highly exposed amides than does a standard unconstrained molecular simulation. However, for the less frequently exposed amide hydrogens, the 2NR2 ensemble offers no improvement in rate predictions as compared to the unconstrained MD ensemble. The other two NMR-constrained ensembles performed markedly worse, either underestimating (2KOX) or overestimating (2K39) the extent of conformational diversity. PMID:22425325
Ensemble perception in autism spectrum disorder: Member-identification versus mean-discrimination.
Van der Hallen, Ruth; Lemmens, Lisa; Steyaert, Jean; Noens, Ilse; Wagemans, Johan
2017-07-01
To efficiently represent the outside world our brain compresses sets of similar items into a summarized representation, a phenomenon known as ensemble perception. While most studies on ensemble perception investigate this perceptual mechanism in typically developing (TD) adults, more recently, researchers studying perceptual organization in individuals with autism spectrum disorder (ASD) have turned their attention toward ensemble perception. The current study is the first to investigate the use of ensemble perception for size in children with and without ASD (N = 42, 8-16 years). We administered a pair of tasks pioneered by Ariely [2001] evaluating both member-identification and mean-discrimination. In addition, we varied the distribution types of our sets to allow a more detailed evaluation of task performance. Results show that, overall, both groups performed similarly in the member-identification task, a test of "local perception," and similarly in the mean-identification task, a test of "gist perception." However, in both tasks performance of the TD group was affected more strongly by the degree of stimulus variability in the set than performance of the ASD group. These findings indicate that both TD children and children with ASD use ensemble statistics to represent a set of similar items, illustrating the fundamental nature of ensemble coding in visual perception. Differences in sensitivity to stimulus variability between both groups are discussed in relation to recent theories of information processing in ASD (e.g., increased sampling, decreased priors, increased precision). Autism Res 2017, 10: 1291-1299. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Improving precision of glomerular filtration rate estimating model by ensemble learning.
Liu, Xun; Li, Ningshan; Lv, Linsheng; Fu, Yongmei; Cheng, Cailian; Wang, Caixia; Ye, Yuqiu; Li, Shaomin; Lou, Tanqi
2017-11-09
Accurate assessment of kidney function is clinically important, but estimates of glomerular filtration rate (GFR) by regression are imprecise. We hypothesized that ensemble learning could improve precision. A total of 1419 participants were enrolled, with 1002 in the development dataset and 417 in the external validation dataset. GFR was independently estimated from age, sex and serum creatinine using an artificial neural network (ANN), support vector machine (SVM), regression, and ensemble learning. GFR was measured by 99mTc-DTPA renal dynamic imaging calibrated with dual plasma sample 99mTc-DTPA GFR. Mean measured GFRs were 70.0 ml/min/1.73 m² in the development and 53.4 ml/min/1.73 m² in the external validation cohorts. In the external validation cohort, precision was better in the ensemble model of the ANN, SVM and regression equation (IQR = 13.5 ml/min/1.73 m²) than in the new regression model (IQR = 14.0 ml/min/1.73 m², P < 0.001). The precision of ensemble learning was the best of the three models, but the models had similar bias and accuracy. The median difference ranged from 2.3 to 3.7 ml/min/1.73 m², 30% accuracy ranged from 73.1 to 76.0%, and P was > 0.05 for all comparisons of the new regression equation and the other new models. An ensemble learning model including three variables, the average ANN, SVM, and regression equation values, was more precise than the new regression model. A more complex ensemble learning strategy may further improve GFR estimates.
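The winning ensemble is a simple average of three heterogeneous regressors; a generic scikit-learn sketch of that kind of combination (hyperparameters and architectures are placeholders, not those of the paper):

```python
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X: columns [age, sex, serum_creatinine]; y: measured GFR
ensemble = VotingRegressor([
    ("ann", make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000))),
    ("svm", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ("reg", LinearRegression()),
])
# ensemble.fit(X_train, y_train)
# gfr_hat = ensemble.predict(X_new)   # average of the three model outputs
```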
Sun, Jian; Yang, Xiurong
2015-12-15
Based on the specific binding of Cu²⁺ ions to 11-mercaptoundecanoic acid (11-MUA)-protected AuNCs with intense orange-red emission, we have proposed and constructed a novel fluorescent nanomaterial-metal ion ensemble with its fluorescence in an off-state. Subsequently, an AuNCs@11-MUA-Cu²⁺ ensemble-based fluorescent chemosensor, which is amenable to convenient, sensitive, selective, turn-on and real-time assay of acetylcholinesterase (AChE), could be developed by using acetylthiocholine (ATCh) as the substrate. The sensing ensemble solution exhibits a marked fluorescence enhancement in the presence of AChE and ATCh: AChE hydrolyzes its substrate ATCh into thiocholine (TCh), and TCh then captures Cu²⁺ from the ensemble, switching the AuNCs from the fluorescence off-state to the on-state. AChE activity could be detected down to 0.05 mU/mL, with a good linear range from 0.05 to 2.5 mU/mL. Our proposed fluorescence assay can be utilized to evaluate AChE activity quantitatively in real biological samples, and furthermore to screen inhibitors of AChE. As far as we know, the present study reports the first analytical proposal for sensing AChE activity in real time using a fluorescent nanomaterial-Cu²⁺ ensemble, or one focusing on Cu²⁺-triggered fluorescence quenching/recovery. This strategy paves a new avenue for exploring the biosensing applications of fluorescent AuNCs, and presents the prospect of the AuNCs@11-MUA-Cu²⁺ ensemble as a versatile enzyme activity assay platform by means of other appropriate substrates/analytes.
An operational mesoscale ensemble data assimilation and prediction system: E-RTFDDA
NASA Astrophysics Data System (ADS)
Liu, Y.; Hopson, T.; Roux, G.; Hacker, J.; Xu, M.; Warner, T.; Swerdlin, S.
2009-04-01
Mesoscale (2-2000 km) meteorological processes differ from synoptic circulations in that mesoscale weather changes rapidly in space and time, and physics processes that are parameterized in NWP models play a great role. Complex interactions of synoptic circulations, regional and local terrain, land-surface heterogeneity and its associated physical properties, and the physical processes of radiative transfer, cloud and precipitation and boundary layer mixing are crucial in shaping regional weather and climate. Mesoscale ensemble analysis and prediction should sample the uncertainties of mesoscale modeling systems in representing these factors. An innovative mesoscale Ensemble Real-Time Four Dimensional Data Assimilation (E-RTFDDA) and forecasting system has been developed at NCAR. E-RTFDDA contains diverse ensemble perturbation approaches that consider uncertainties in all major system components to produce multi-scale continuously-cycling probabilistic data assimilation and forecasting. A 30-member E-RTFDDA system with three nested domains with grid sizes of 30, 10 and 3.33 km has been running on a Department of Defense high-performance computing platform since September 2007. It has been applied at two very different US geographical locations: one in the western inter-mountain area and the other in the northeastern states, producing 6-hour analyses and 48-hour forecasts, with 4 forecast cycles a day. The operational model outputs are analyzed to a) assess overall ensemble performance and properties, b) study terrain effects on mesoscale predictability, c) quantify the contribution of different ensemble perturbation approaches to the overall forecast skill, and d) assess the additional skill contributed by an ensemble calibration process based on a quantile-regression algorithm. The system and the results will be reported at the meeting.
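The quantile-regression calibration mentioned in item d) can be sketched as one regression per probability level, mapping raw ensemble statistics to observed quantiles; a generic fragment assuming statsmodels (not the NCAR implementation):

```python
import numpy as np
import statsmodels.api as sm

def calibrated_quantiles(ens_hist, obs_hist, ens_new,
                         taus=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Fit one quantile regression per level tau on historical
    (ensemble mean, ensemble spread) -> observation pairs, then evaluate on
    a new forecast to produce calibrated forecast quantiles.

    ens_hist : (n_hist, n_members) past raw ensemble forecasts
    obs_hist : (n_hist,) matching verifying observations
    ens_new  : (n_new, n_members) raw forecasts to calibrate
    """
    X_hist = sm.add_constant(np.column_stack([ens_hist.mean(1), ens_hist.std(1)]))
    X_new = sm.add_constant(np.column_stack([ens_new.mean(1), ens_new.std(1)]),
                            has_constant="add")
    out = []
    for tau in taus:
        fit = sm.QuantReg(obs_hist, X_hist).fit(q=tau)
        out.append(fit.predict(X_new))
    return np.column_stack(out)   # rows: forecasts, cols: calibrated quantiles
```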
NASA Astrophysics Data System (ADS)
Shulman, Igor; Gould, Richard W.; Frolov, Sergey; McCarthy, Sean; Penta, Brad; Anderson, Stephanie; Sakalaukus, Peter
2018-03-01
An ensemble-based approach to specifying the observational error covariance in the data assimilation of satellite bio-optical properties is proposed. The observational error covariance is derived from statistical properties of a generated ensemble of satellite MODIS-Aqua chlorophyll (Chl) images. The proposed observational error covariance is used in an Optimal Interpolation scheme for the assimilation of MODIS-Aqua Chl observations. The forecast error covariance is specified in the subspace of the multivariate (bio-optical, physical) empirical orthogonal functions (EOFs) estimated from a month-long model run. The assimilation of surface MODIS-Aqua Chl improved surface and subsurface model Chl predictions. Comparisons with surface and subsurface water samples demonstrate that the data assimilation run with the proposed observational error covariance has a higher RMSE than the data assimilation run with an "optimistic" assumption about observational errors (10% of the ensemble mean), but a smaller or comparable RMSE than the data assimilation run that assumes observational errors equal to 35% of the ensemble mean (the target error for the satellite chlorophyll data product). Also, with the assimilation of the MODIS-Aqua Chl data, the RMSE between observed and model-predicted fractions of diatoms to the total phytoplankton is reduced by a factor of two in comparison to the nonassimilative run.
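The analysis step described is a standard Optimal Interpolation update, with R estimated from the satellite-image ensemble and B confined to a low-dimensional EOF subspace; a schematic numpy version of those ingredients (a sketch of the textbook equations, not the study's code):

```python
import numpy as np

def oi_update(xb, y, H, B, R):
    """Optimal Interpolation analysis:
    xa = xb + K (y - H xb),  K = B H^T (H B H^T + R)^(-1)."""
    S = H @ B @ H.T + R
    K = B @ H.T @ np.linalg.inv(S)
    return xb + K @ (y - H @ xb)

def ensemble_obs_error_cov(chl_ensemble):
    """R from an ensemble of satellite chlorophyll images
    (chl_ensemble: (n_members, n_obs); a diagonal form is shown for brevity)."""
    return np.diag(chl_ensemble.var(axis=0, ddof=1))

def eof_background_cov(eofs, eof_variances):
    """B restricted to leading multivariate EOFs: B = E diag(var) E^T
    (eofs: (n_state, n_modes), eof_variances: (n_modes,))."""
    return eofs @ np.diag(eof_variances) @ eofs.T
```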
Study on high-resolution representation of terraces in Shanxi Loess Plateau area
NASA Astrophysics Data System (ADS)
Zhao, Weidong; Tang, Guo'an; Ma, Lei
2008-10-01
A new elevation-point sampling method, the TIN-based Sampling Method (TSM), and a new visualization method, the Elevation Addition Method (EAM), are put forth for representing the typical terraces in the Shanxi loess plateau area. The DEM Feature Points and Lines Classification (DEPLC), put forth by the authors in 2007, is refined for depicting the main path in the study area. The EAM is used to visualize the terraces and the path in the study area. 406 key elevation points and 15 feature-constrained lines sampled by this method are used to construct CD-TINs which can depict the terraces and path correctly and effectively. Our case study shows that the new sampling method, TSM, is reasonable and feasible. Complicated micro-terrains like terraces and paths can be represented with high resolution and high efficiency by use of the refined DEPLC, TSM and CD-TINs. Both the terraces and the main path are visualized very well by use of EAM, even when the terrace height is no more than 1 m.
Hayabusa Re-Entry: Trajectory Analysis and Observation Mission Design
NASA Technical Reports Server (NTRS)
Cassell, Alan M.; Winter, Michael W.; Allen, Gary A.; Grinstead, Jay H.; Antimisiaris, Manny E.; Albers, James; Jenniskens, Peter
2011-01-01
On June 13th, 2010, the Hayabusa sample return capsule successfully re-entered Earth's atmosphere over the Woomera Prohibited Area in southern Australia in its quest to return fragments from the asteroid Itokawa (1998 SF36). The sample return capsule entered at a super-orbital velocity of 12.04 km/sec (inertial), making it the second fastest human-made object to traverse the atmosphere. The NASA DC-8 airborne observatory was utilized as an instrument platform to record the luminous portion of the sample return capsule re-entry (60 sec) with a variety of on-board spectroscopic imaging instruments. The predicted sample return capsule's entry state information at 200 km altitude was propagated through the atmosphere to generate aerothermodynamic and trajectory data used for initial observation flight path design and planning. The DC-8 flight path was designed by considering safety, optimal sample return capsule viewing geometry, and aircraft capabilities in concert with key aerothermodynamic events along the predicted trajectory. Subsequent entry state vector updates provided by the Deep Space Network team at NASA's Jet Propulsion Laboratory were analyzed after the planned trajectory correction maneuvers to further refine the DC-8 observation flight path. Primary and alternate observation flight paths were generated during the mission planning phase, which required coordination with Australian authorities for pre-mission approval. The final observation flight path was chosen based upon trade-offs between optimal viewing requirements, ground-based observer locations (to facilitate post-flight trajectory reconstruction), predicted weather in the Woomera Prohibited Area, and constraints imposed by flight path filing deadlines. To facilitate sample return capsule tracking by the instrument operators, a series of two racetrack flight path patterns was performed prior to the observation leg so the instruments could be pointed towards the region in the star background where the sample return capsule was expected to become visible. An overview of the design methodologies and trade-offs used in the Hayabusa re-entry observation campaign is presented.
NASA Astrophysics Data System (ADS)
Otto, F. E. L.; Mitchell, D.; Sippel, S.; Black, M. T.; Dittus, A. J.; Harrington, L. J.; Mohd Saleh, N. H.
2014-12-01
A shift in the distribution of socially relevant climate variables, such as daily minimum winter temperatures and daily precipitation extremes, has been attributed to anthropogenic climate change for various mid-latitude regions. However, while there are many process-based arguments suggesting also a change in the shape of these distributions, attribution studies demonstrating this have not yet been undertaken. Here we use a very large initial-condition ensemble of ~40,000 members simulating the European winter 2013/2014 using the distributed computing infrastructure of the weather@home project. Two separate scenarios are used: (1) current climate conditions, and (2) a counterfactual scenario of a "world that might have been" without anthropogenic forcing. Focusing specifically on extreme events, we assess how the estimated parameters of the Generalized Extreme Value (GEV) distribution vary depending on variable type, sampling frequency (daily, monthly, ...) and geographical region. We find that the location parameter changes for most variables but, depending on the region and variable, we also find significant changes in the scale and shape parameters. The very large ensemble furthermore allows us to assess whether such findings in the fitted GEV distributions are consistent with an empirical analysis of the model data, and whether the most extreme data still follow a known underlying distribution that in a small sample might otherwise be dismissed as an outlier. The ~40,000-member ensemble is simulated using 12 different SST patterns (1 observed, and 11 best guesses of SSTs with no anthropogenic warming). The range in SSTs, along with the corresponding changes in the NAO and high-latitude blocking, informs on the dynamics governing some of these extreme events. While strong teleconnection patterns are not found in this particular experiment, the high number of simulated extreme events allows for a more thorough analysis of the dynamics than has been performed before. Combining extreme value theory with very large ensemble simulations therefore allows us to understand the dynamics of changes in extreme events, which is not possible with extreme value theory alone, and also shows in which cases statistics from smaller ensembles give results as valid as those from very large initial-condition ensembles.
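A minimal sketch of the GEV-fitting step described above, assuming synthetic block maxima for a factual and a counterfactual ensemble (all data invented for illustration):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(2)

# Synthetic winter-maximum daily precipitation for two scenarios (mm).
factual = genextreme.rvs(c=-0.1, loc=30.0, scale=8.0, size=5000, random_state=rng)
counterfactual = genextreme.rvs(c=-0.1, loc=27.0, scale=7.0, size=5000, random_state=rng)

# Fit GEV parameters to each very large ensemble of extremes.
for name, data in (("factual", factual), ("counterfactual", counterfactual)):
    c, loc, scale = genextreme.fit(data)
    # scipy's shape c is the negative of the usual shape parameter xi.
    print(f"{name}: shape xi={-c:.3f}, location={loc:.2f}, scale={scale:.2f}")
```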
Structure of marginally jammed polydisperse packings of frictionless spheres
NASA Astrophysics Data System (ADS)
Zhang, Chi; O'Donovan, Cathal B.; Corwin, Eric I.; Cardinaux, Frédéric; Mason, Thomas G.; Möbius, Matthias E.; Scheffold, Frank
2015-03-01
We model the packing structure of a marginally jammed bulk ensemble of polydisperse spheres. To this end we expand on the granocentric model [Clusel et al., Nature (London) 460, 611 (2009), 10.1038/nature08158], explicitly taking into account rattlers. This leads to a relationship between the characteristic parameters of the packing, such as the mean number of neighbors and the fraction of rattlers, and the radial distribution function g(r). We find excellent agreement between the model predictions for g(r) and packing simulations, as well as experiments on jammed emulsion droplets. The observed quantitative agreement opens the path towards a full structural characterization of jammed particle systems for imaging and scattering experiments.
Molecular traffic jams on DNA.
Finkelstein, Ilya J; Greene, Eric C
2013-01-01
All aspects of DNA metabolism, including transcription, replication, and repair, involve motor enzymes that move along genomic DNA. These processes must all take place on chromosomes that are occupied by a large number of other proteins. However, very little is known regarding how nucleic acid motor proteins move along the crowded DNA substrates that are likely to exist in physiological settings. This review summarizes recent progress in understanding how DNA-binding motor proteins respond to the presence of other proteins that lie in their paths. We highlight recent single-molecule biophysical experiments aimed at addressing this question, with an emphasis placed on analyzing the single-molecule, ensemble biochemical, and in vivo data from a mechanistic perspective.
Simulating Energy Relaxation in Pump-Probe Vibrational Spectroscopy of Hydrogen-Bonded Liquids.
Dettori, Riccardo; Ceriotti, Michele; Hunger, Johannes; Melis, Claudio; Colombo, Luciano; Donadio, Davide
2017-03-14
We introduce a nonequilibrium molecular dynamics simulation approach, based on the generalized Langevin equation, to study vibrational energy relaxation in pump-probe spectroscopy. A colored noise thermostat is used to selectively excite a set of vibrational modes, leaving the other modes nearly unperturbed, to mimic the effect of a monochromatic laser pump. Energy relaxation is probed by analyzing the evolution of the system after excitation in the microcanonical ensemble, thus providing direct information about the energy redistribution paths at the molecular level and their time scale. The method is applied to hydrogen-bonded molecular liquids, specifically deuterated methanol and water, providing a robust picture of energy relaxation at the molecular scale.
Wikipedias: Collaborative web-based encyclopedias as complex networks
NASA Astrophysics Data System (ADS)
Zlatić, V.; Božičević, M.; Štefančić, H.; Domazet, M.
2006-07-01
Wikipedia is a popular web-based encyclopedia edited freely and collaboratively by its users. In this paper we present an analysis of Wikipedias in several languages as complex networks. The hyperlinks pointing from one Wikipedia article to another are treated as directed links while the articles represent the nodes of the network. We show that many network characteristics are common to different language versions of Wikipedia, such as their degree distributions, growth, topology, reciprocity, clustering, assortativity, path lengths, and triad significance profiles. These regularities, found in the ensemble of Wikipedias in different languages and of different sizes, point to the existence of a unique growth process. We also compare Wikipedias to other previously studied networks.
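The network statistics listed above are straightforward to compute with standard graph tooling. Below is a small, non-authoritative sketch using networkx on a toy random directed graph standing in for a hyperlink network (a real Wikipedia dump would be parsed into the same structure):

```python
import networkx as nx

# Toy directed hyperlink network standing in for article-to-article links.
G = nx.gnp_random_graph(200, 0.05, seed=3, directed=True)

degrees = [d for _, d in G.degree()]
print("mean degree:", sum(degrees) / len(degrees))
print("reciprocity:", nx.reciprocity(G))
print("clustering (undirected view):", nx.average_clustering(G.to_undirected()))

# Average shortest-path length on the largest strongly connected component.
largest_scc = max(nx.strongly_connected_components(G), key=len)
H = G.subgraph(largest_scc)
print("avg shortest path (largest SCC):", nx.average_shortest_path_length(H))
```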
Rodriguez-Diaz, Eladio; Castanon, David A; Singh, Satish K; Bigio, Irving J
2011-06-01
Optical spectroscopy has shown potential as a real-time, in vivo diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These "rejected" samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets results in a baseline performance of sensitivity = 0.83, specificity = 0.79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20-33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk. PMID:21721830
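The ensemble-plus-rejection logic described above can be sketched compactly. The example below (synthetic spectra, hypothetical rejection threshold) trains one classifier per spectral region, averages their probabilistic outputs, and abstains when the ensemble confidence is too close to chance; it is an illustrative sketch, not the authors' tuned pipeline.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)

# Synthetic ESS-like spectra: 300 samples x 90 wavelength bins, binary labels.
X = rng.normal(size=(300, 90))
y = (X[:, :30].mean(axis=1) + 0.5 * rng.normal(size=300) > 0).astype(int)

# Train one member per spectral region (three 30-bin windows).
regions = [slice(0, 30), slice(30, 60), slice(60, 90)]
members = [SVC(probability=True).fit(X[:200, r], y[:200]) for r in regions]

# Ensemble probability = mean of member probabilities on held-out samples.
probs = np.mean([m.predict_proba(X[200:, r])[:, 1]
                 for m, r in zip(members, regions)], axis=0)

# Misclassification-rejection: abstain when confidence is near chance.
threshold = 0.25                      # hypothetical rejection band half-width
decided = np.abs(probs - 0.5) >= threshold
pred = (probs >= 0.5).astype(int)

accuracy = (pred[decided] == y[200:][decided]).mean()
print(f"rejected {(~decided).mean():.0%}, accuracy on decided {accuracy:.2f}")
```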
Are atmospheric surface layer flows ergodic?
NASA Astrophysics Data System (ADS)
Higgins, Chad W.; Katul, Gabriel G.; Froidevaux, Martin; Simeonov, Valentin; Parlange, Marc B.
2013-06-01
The transposition of atmospheric turbulence statistics from the time domain, as conventionally sampled in field experiments, to the ensemble domain is justified by the so-called ergodic hypothesis. In micrometeorology, this hypothesis assumes that the time average of a measured flow variable represents an ensemble of independent realizations from similar meteorological states and boundary conditions. That is, the averaging duration must be sufficiently long to include a large number of independent realizations of the sampled flow variable so as to represent the ensemble. While the validity of the ergodic hypothesis for turbulence has been confirmed in laboratory experiments and in numerical simulations for idealized conditions, evidence for its validity in the atmospheric surface layer (ASL), especially for nonideal conditions, continues to defy experimental efforts. There is some urgency to make progress on this problem given the proliferation of tall-tower scalar concentration networks that aim to constrain climate models yet are impacted by nonideal conditions at the land surface. Recent advances in water vapor concentration lidar measurements that simultaneously sample spatial and temporal series in the ASL are used to investigate the validity of the ergodic hypothesis for the first time. It is shown that ergodicity is valid in a strict sense above uniform surfaces away from abrupt surface transitions. Surprisingly, ergodicity may be used to infer the ensemble concentration statistics of a composite grass-lake system using only water vapor concentration measurements collected above the sharp transition delineating the lake from the grass surface.
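A toy version of the ergodicity test is easy to state in code: compare the long-time average at one point with the instantaneous spatial (ensemble) average along a transect. The sketch below uses a synthetic space-time concentration field (all parameters invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n_t, n_x = 5000, 200

# Synthetic water-vapor concentration field: shared mean + uncorrelated noise.
base = 10.0 + 0.5 * np.sin(np.linspace(0, 20 * np.pi, n_t))[:, None]
field = base + rng.normal(0.0, 1.0, size=(n_t, n_x))

time_avg_at_point = field[:, n_x // 2].mean()   # Eulerian time average
space_avg_snapshot = field[-1, :].mean()        # "ensemble" (spatial) average
print("time average:", time_avg_at_point)
print("spatial average (one snapshot):", space_avg_snapshot)
# Under ergodicity the two converge as averaging time / transect length grow.
```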
Decimated Input Ensembles for Improved Generalization
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)
1999-01-01
Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with a finite number of patterns, this causes a reduction in the training patterns each classifier sees, often resulting in considerably worsened generalization performance (particularly for high-dimensional data domains) for each individual classifier. Generally, this drop in the accuracy of the individual classifier performance more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.
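A minimal sketch of the input-decimation idea (hypothetical data and feature splits): each member is trained on a different reduced feature subset, which decorrelates the members while lowering input dimensionality, and the ensemble averages their outputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           random_state=6)
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

# Each member sees a different decimated (random) subset of input features.
n_members, n_keep = 5, 15
subsets = [rng.choice(40, size=n_keep, replace=False) for _ in range(n_members)]
members = [LogisticRegression(max_iter=1000).fit(X_tr[:, s], y_tr)
           for s in subsets]

# Combine by averaging class probabilities across members.
probs = np.mean([m.predict_proba(X_te[:, s])[:, 1]
                 for m, s in zip(members, subsets)], axis=0)
print("ensemble accuracy:", ((probs >= 0.5) == y_te).mean())
```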
Ligand-biased ensemble receptor docking (LigBEnD): a hybrid ligand/receptor structure-based approach
NASA Astrophysics Data System (ADS)
Lam, Polo C.-H.; Abagyan, Ruben; Totrov, Maxim
2018-01-01
Ligand docking to flexible protein molecules can be efficiently carried out through ensemble docking to multiple protein conformations, either from experimental X-ray structures or from in silico simulations. The success of ensemble docking often requires the careful selection of complementary protein conformations, through docking and scoring of known co-crystallized ligands. False positives, in which a ligand in a wrong pose achieves a better docking score than that of the native pose, arise as additional protein conformations are added. In the current study, we developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the use of the ligand-based atomic property field (APF) method with receptor structure-based docking. This method helped us to correctly dock 30 out of 36 ligands presented by the D3R docking challenge. For the six mis-docked ligands, the cognate receptor structures proved to be too different from the 40 available experimental Pocketome conformations used for docking and could be identified only by receptor sampling beyond the experimentally explored conformational subspace.
Geometric integrator for simulations in the canonical ensemble
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tapias, Diego, E-mail: diego.tapias@nucleares.unam.mx; Sanders, David P., E-mail: dpsanders@ciencias.unam.mx; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139
2016-08-28
We introduce a geometric integrator for molecular dynamics simulations of physical systems in the canonical ensemble that preserves the invariant distribution in equations arising from the density dynamics algorithm, with any possible type of thermostat. Our integrator thus constitutes a unified framework that allows the study and comparison of different thermostats and of their influence on the equilibrium and non-equilibrium (thermo-)dynamic properties of a system. To show the validity and the generality of the integrator, we implement it with a second-order, time-reversible method and apply it to the simulation of a Lennard-Jones system with three different thermostats, obtaining good conservation of the geometrical properties and recovering the expected thermodynamic results. Moreover, to show the advantage of our geometric integrator over a non-geometric one, we compare the results with those obtained by using the non-geometric Gear integrator, which is frequently used to perform simulations in the canonical ensemble. The non-geometric integrator induces a drift in the invariant quantity, while our integrator has no such drift, thus ensuring that the system is effectively sampling the correct ensemble.
Ensemble representations: effects of set size and item heterogeneity on average size perception.
Marchant, Alexander P; Simons, Daniel J; de Fockert, Jan W
2013-02-01
Observers can accurately perceive and evaluate the statistical properties of a set of objects, forming what is now known as an ensemble representation. The accuracy and speed with which people can judge the mean size of a set of objects have led to the proposal that ensemble representations of average size can be computed in parallel when attention is distributed across the display. Consistent with this idea, judgments of mean size show little or no decrement in accuracy when the number of objects in the set increases. However, the lack of a set size effect might result from the regularity of the item sizes used in previous studies. Here, we replicate these previous findings, but show that judgments of mean set size become less accurate when set size increases and the heterogeneity of the item sizes increases. This pattern can be explained by assuming that average size judgments are computed using a limited-capacity sampling strategy, and it does not necessitate an ensemble representation computed in parallel across all items in a display.
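The limited-capacity sampling account has a direct numerical analogue: estimate the mean of a display by averaging a small random subsample of items, and watch the estimate degrade as set size and heterogeneity grow. A toy sketch (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_size_error(set_size, heterogeneity, k_sampled=4, trials=20000):
    """Error of a mean-size judgment based on sampling k items per display."""
    sizes = rng.normal(1.0, heterogeneity, size=(trials, set_size))
    sampled = rng.permuted(sizes, axis=1)[:, :k_sampled]  # random k-item subsample
    return np.abs(sampled.mean(axis=1) - sizes.mean(axis=1)).mean()

for n in (4, 8, 16):
    for het in (0.05, 0.25):
        print(f"set size {n:2d}, heterogeneity {het:.2f}: "
              f"mean error {mean_size_error(n, het):.4f}")
```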
NASA Astrophysics Data System (ADS)
Khodabakhshi, M.; Jafarpour, B.
2013-12-01
Characterization of complex geologic patterns that create preferential flow paths in certain reservoir systems requires higher-order geostatistical modeling techniques. Multipoint statistics (MPS) provides a flexible grid-based approach for simulating such complex geologic patterns from a conceptual prior model known as a training image (TI). In this approach, a stationary TI that encodes the higher-order spatial statistics of the expected geologic patterns is used to represent the shape and connectivity of the underlying lithofacies. While MPS is quite powerful for describing complex geologic facies connectivity, the nonlinear and complex relation between the flow data and facies distribution makes flow data conditioning quite challenging. We propose an adaptive technique for conditioning facies simulation from a prior TI to nonlinear flow data. Non-adaptive strategies for conditioning facies simulation to flow data can involve many forward flow model solutions that can be computationally very demanding. To improve the conditioning efficiency, we develop an adaptive sampling approach through a data feedback mechanism based on the sampling history. In this approach, after a short burn-in period in which unconditional samples are generated and passed through an acceptance/rejection test, an ensemble of accepted samples is identified and used to generate a facies probability map. This facies probability map contains the common features of the accepted samples and provides conditioning information about facies occurrence in each grid block, which is used to guide the conditional facies simulation process. As the sampling progresses, the initial probability map is updated according to the collective information about the facies distribution in the chain of accepted samples to increase the acceptance rate and efficiency of the conditioning. This conditioning process can be viewed as an optimization approach where each new sample is proposed based on the sampling history to improve the data mismatch objective function. We extend the application of this adaptive conditioning approach to the case where multiple training images are proposed to describe the geologic scenario in a given formation. We discuss the advantages and limitations of the proposed adaptive conditioning scheme and use numerical experiments from fluvial channel formations to demonstrate its applicability and performance compared to non-adaptive conditioning techniques.
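In a toy setting, the feedback mechanism at the heart of this adaptive scheme reduces to maintaining a running facies probability map over accepted samples. A non-authoritative numpy sketch with an invented acceptance test standing in for the flow-data mismatch check:

```python
import numpy as np

rng = np.random.default_rng(8)
nx, ny = 20, 20
target = rng.random((nx, ny)) < 0.3        # stand-in "true" channel facies

def simulate_facies(prob_map):
    """Draw a binary facies realization guided by the current probability map."""
    return rng.random((nx, ny)) < prob_map

def accept(realization):
    """Invented acceptance test standing in for the flow-data mismatch check."""
    return np.mean(realization != target) < 0.40

prob_map = np.full((nx, ny), 0.3)          # prior facies proportion
prob_sum, n_acc = np.zeros((nx, ny)), 0
for _ in range(2000):
    r = simulate_facies(prob_map)
    if accept(r):
        n_acc += 1
        prob_sum += r
        # Update the probability map from the chain of accepted samples.
        prob_map = (prob_sum / n_acc).clip(0.05, 0.95)

print("accepted:", n_acc, "final map mean:", prob_map.mean().round(3))
```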
Trends in the predictive performance of raw ensemble weather forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Scheuerer, Michael; Pappenberger, Florian; Bogner, Konrad; Haiden, Thomas
2015-04-01
Over the last two decades the paradigm in weather forecasting has shifted from being deterministic to probabilistic. Accordingly, numerical weather prediction (NWP) models have been run increasingly as ensemble forecasting systems. The goal of such ensemble forecasts is to approximate the forecast probability distribution by a finite sample of scenarios. Global ensemble forecast systems, like the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble, are prone to probabilistic biases, and are therefore not reliable. They particularly tend to be underdispersive for surface weather parameters. Hence, statistical post-processing is required in order to obtain reliable and sharp forecasts. In this study we apply statistical post-processing to ensemble forecasts of near-surface temperature, 24-hour precipitation totals, and near-surface wind speed from the global ECMWF model. Our main objective is to evaluate the evolution of the difference in skill between the raw ensemble and the post-processed forecasts. The ECMWF ensemble is under continuous development, and hence its forecast skill improves over time. Parts of these improvements may be due to a reduction of probabilistic bias. Thus, we first hypothesize that the gain by post-processing decreases over time. Based on ECMWF forecasts from January 2002 to March 2014 and corresponding observations from globally distributed stations we generate post-processed forecasts by ensemble model output statistics (EMOS) for each station and variable. Parameter estimates are obtained by minimizing the Continuous Ranked Probability Score (CRPS) over rolling training periods that consist of the n days preceding the initialization dates. Given the higher average skill in terms of CRPS of the post-processed forecasts for all three variables, we analyze the evolution of the difference in skill between raw ensemble and EMOS forecasts. The fact that the gap in skill remains almost constant over time, especially for near-surface wind speed, suggests that improvements to the atmospheric model have an effect quite different from what calibration by statistical post-processing is doing. That is, they are increasing potential skill. Thus this study indicates that (a) further model development is important even if one is just interested in point forecasts, and (b) statistical post-processing is important because it will keep adding skill in the foreseeable future.
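EMOS in its simplest Gaussian form fits a predictive distribution N(a + b*(ensemble mean), c + d*(ensemble variance)) by minimizing the closed-form CRPS of a normal distribution over a training window. A minimal sketch with synthetic data (all names and numbers hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(9)
n = 400
ens_mean = rng.normal(15.0, 4.0, n)
ens_var = rng.uniform(0.5, 2.0, n)
obs = 1.0 + 0.9 * ens_mean + rng.normal(0.0, 1.5, n)   # synthetic verifying obs

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def mean_crps(params):
    a, b, c, d = params
    mu = a + b * ens_mean
    sigma = np.sqrt(np.abs(c) + np.abs(d) * ens_var)   # keep variance positive
    return crps_normal(mu, sigma, obs).mean()

res = minimize(mean_crps, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead")
print("EMOS coefficients a, b, c, d:", np.round(res.x, 3))
```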
Hong-Ou-Mandel Interference between Two Deterministic Collective Excitations in an Atomic Ensemble
NASA Astrophysics Data System (ADS)
Li, Jun; Zhou, Ming-Ti; Jing, Bo; Wang, Xu-Jie; Yang, Sheng-Jun; Jiang, Xiao; Mølmer, Klaus; Bao, Xiao-Hui; Pan, Jian-Wei
2016-10-01
We demonstrate deterministic generation of two distinct collective excitations in one atomic ensemble, and we realize the Hong-Ou-Mandel interference between them. Using Rydberg blockade we create single collective excitations in two different Zeeman levels, and we use stimulated Raman transitions to perform a beam-splitter operation between the excited atomic modes. By converting the atomic excitations into photons, the two-excitation interference is measured by photon coincidence detection with a visibility of 0.89(6). The Hong-Ou-Mandel interference witnesses an entangled NOON state of the collective atomic excitations, and we demonstrate its twofold enhanced sensitivity to a magnetic field compared with a single excitation. Our work implements a minimal instance of boson sampling and paves the way for further multimode and multiexcitation studies with collective excitations of atomic ensembles.
Malolepsza, Edyta; Secor, Maxim; Keyes, Tom
2015-09-23
A prescription for sampling isobaric generalized ensembles with molecular dynamics is presented and applied to the generalized replica exchange method (gREM), which was designed for simulating first-order phase transitions. The properties of the isobaric gREM ensemble are discussed and a study is presented of the liquid-vapor equilibrium of the guest molecules given for gas hydrate formation with the mW water model. As a result, phase diagrams, critical parameters, and a law of corresponding states are obtained.
Near-Unity Internal Quantum Efficiency of Luminescent Silicon Nanocrystals with Ligand Passivation.
Sangghaleh, Fatemeh; Sychugov, Ilya; Yang, Zhenyu; Veinot, Jonathan G C; Linnros, Jan
2015-07-28
Spectrally resolved photoluminescence (PL) decays were measured for samples of colloidal, ligand-passivated silicon nanocrystals. These samples have PL emission energies with peak positions in the range ∼1.4-1.8 eV and quantum yields of ∼30-70%. Their ensemble PL decays are characterized by a stretched-exponential decay with a dispersion factor of ∼0.8, which changes to an almost monoexponential character at fixed detection energies. The dispersion factors and decay rates for various detection energies were extracted from spectrally resolved curves using a mathematical approach that excluded the effect of homogeneous line-width broadening. Since nonradiative recombination would introduce a random lifetime variation, leading to a stretched-exponential decay for an ensemble, we conclude that the observed monoexponential decay in size-selected ensembles signifies negligible nonradiative transitions of strength similar to the radiative one. This conjecture is further supported as the extracted decay rates agree with radiative rates reported in the literature, suggesting 100% internal quantum efficiency over a broad range of emission wavelengths. The apparent differences in the quantum yields can then be explained by a varying fraction of "dark" or blinking nanocrystals.
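The stretched-exponential analysis mentioned above can be reproduced on synthetic data in a few lines; the sketch below (invented lifetimes and dispersion) fits I(t) = A*exp[-(t/tau)^beta] and recovers the dispersion factor beta:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(10)

def stretched_exp(t, amplitude, tau, beta):
    """Stretched-exponential PL decay I(t) = A * exp(-(t/tau)**beta)."""
    return amplitude * np.exp(-(t / tau) ** beta)

t = np.linspace(0.01, 500.0, 300)                    # time axis (microseconds)
truth = stretched_exp(t, 1.0, 80.0, 0.8)             # ensemble-like decay
signal = truth + rng.normal(0.0, 0.01, t.size)       # add measurement noise

popt, _ = curve_fit(stretched_exp, t, signal, p0=[1.0, 50.0, 1.0])
print("A, tau, beta =", np.round(popt, 3))           # beta near 0.8
```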
Clark, Allan K.; Journey, Celeste A.
2006-01-01
The U.S. Geological Survey, in cooperation with the San Antonio Water System, conducted a 4-year study during 2001–04 to identify major ground-water flow paths in the Edwards aquifer in northern Medina and northeastern Uvalde Counties, Texas. The study involved use of geologic structure, surface-water and ground-water data, and geochemistry to identify ground-water flow paths. Relay ramps and associated faulting in northern Medina County appear to channel ground-water flow along four distinct flow paths that move water toward the southwest. The northwestern Medina flow path is bounded on the north by the Woodard Cave fault and on the south by the Parkers Creek fault. Water moves downdip toward the southwest until the flow encounters a cross fault along Seco Creek. This barrier to flow might force part or most of the flow to the south. Departure hydrographs for two wells and discharge departure for a streamflow-gaging station provide evidence for flow in the northwestern Medina flow path. The north-central Medina flow path (northern part) is bounded by the Parkers Creek fault on the north and the Medina Lake fault on the south. The adjacent north-central Medina flow path (southern part) is bounded on the north by the Medina Lake fault and on the south by the Diversion Lake fault. The north-central Medina flow path is separated into a northern and southern part because of water-level differences. Ground water in both parts of the north-central Medina flow path moves downgradient (and down relay ramp) from eastern Medina County toward the southwest. The north-central Medina flow path is hypothesized to turn south in the vicinity of Seco Creek as it begins to be influenced by structural features. Departure hydrographs for four wells and Medina Lake and discharge departure for a streamflow-gaging station provide evidence for flow in the north-central Medina flow path. The south-central Medina flow path is bounded on the north by the Seco Creek and Diversion Lake faults and on the south by the Haby Crossing fault. Because of bounding faults oriented northeast-southwest and adjacent flow paths directed south by other geologic structures, the south-central Medina flow path follows the configuration of the adjacent flow paths: oriented initially southwest and then south. Immediately after turning south, the south-central Medina flow path turns sharply east. Departure hydrographs for four wells and discharge departure for a streamflow-gaging station provide evidence for flow in the south-central Medina flow path. Statistical correlations between water-level departures for 11 continuously monitored wells provide additional evidence for the hypothesized flow paths. Of the 55 combinations of departure dataset pairs, the stronger correlations (those greater than 0.6) are all among wells in the same flow path, with one exception. Simulations of compositional differences in water chemistry along a hypothesized flow path in the Edwards aquifer and between ground-water and surface-water systems near Medina Lake were developed using the geochemical model PHREEQC. Ground-water chemistry for samples from five wells in the Edwards aquifer in the northwestern Medina flow path were used to evaluate the evolution of ground-water chemistry in the northwestern Medina flow path. Seven simulations were done for samples from pairs of these wells collected during 2001–03; three of the seven yielded plausible models.
Ground-water samples from 13 wells were used to evaluate the evolution of ground-water chemistry in the north-central Medina flow path (northern and southern parts). Five of the wells in the most upgradient part of the flow path were completed in the Trinity aquifer; the remaining eight were completed in the Edwards aquifer. Nineteen simulations were done for samples from well pairs collected during 1995–2003; eight of the 19 yielded plausible models. Ground-water samples from seven wells were used to evaluate the evolution of ground-water chemistry in the south-central Medina flow path. One well was the Trinity aquifer end-member well upgradient from all flow paths, and another was a Trinity aquifer well in the most upgradient part of the flow path; all other wells were completed in the Edwards aquifer. Nine simulations were done for samples from well pairs collected during 1996–2003; seven of the nine yielded plausible models. The plausible models demonstrate that the four hypothesized flow paths can be partially supported geochemically.
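The correlation analysis described in this study (11 wells give the 55 = C(11,2) pairs mentioned) amounts to a pairwise correlation matrix of the departure series. A toy sketch with synthetic departures and a hypothetical flow-path grouping:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(11)
n_obs, n_wells = 365, 11

# Synthetic water-level departures: wells in the same flow path share a signal.
flow_path = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3])   # hypothetical grouping
signals = rng.normal(size=(4, n_obs)).cumsum(axis=1)       # one random walk per path
departures = signals[flow_path] + 2.0 * rng.normal(size=(n_wells, n_obs))

corr = np.corrcoef(departures)
strong = [(i, j) for i, j in combinations(range(n_wells), 2) if corr[i, j] > 0.6]
same_path = sum(flow_path[i] == flow_path[j] for i, j in strong)
print(f"{len(strong)} strong pairs, {same_path} within the same flow path")
```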
Accelerated sampling by infinite swapping of path integral molecular dynamics with surface hopping
NASA Astrophysics Data System (ADS)
Lu, Jianfeng; Zhou, Zhennan
2018-02-01
To accelerate the thermal equilibrium sampling of multi-level quantum systems, the infinite swapping limit of a recently proposed multi-level ring polymer representation is investigated. In the infinite swapping limit, the ring polymer evolves according to an averaged Hamiltonian with respect to all possible surface index configurations of the ring polymer and thus connects the surface hopping approach to the mean-field path-integral molecular dynamics. A multiscale integrator for the infinite swapping limit is also proposed to enable efficient sampling based on the limiting dynamics. Numerical results demonstrate the huge improvement of sampling efficiency of the infinite swapping compared with the direct simulation of path-integral molecular dynamics with surface hopping.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Y; Southern Medical University, Guangzhou; Tian, Z
Purpose: Monte Carlo (MC) simulation is an important tool for solving radiotherapy and medical imaging problems. Low computational efficiency hinders its wide application. Conventionally, MC is performed in a particle-by-particle fashion. The lack of control over particle trajectories is a main cause of low efficiency in some applications. Taking cone beam CT (CBCT) projection simulation as an example, a significant amount of computation is wasted transporting photons that never reach the detector. To solve this problem, we propose an innovative MC simulation scheme with a path-by-path sampling method. Methods: Consider a photon path starting at the x-ray source. After going through a set of interactions, it ends at the detector. In the proposed scheme, we sampled an entire photon path each time. The Metropolis-Hastings algorithm was employed to accept/reject a sampled path based on a calculated acceptance probability, in order to maintain the correct relative probabilities among different paths, which are governed by photon transport physics. We developed a package, gMMC, on GPU with this new scheme implemented. The performance of gMMC was tested in a sample problem of CBCT projection simulation for a homogeneous object. The results were compared to those obtained using gMCDRR, a GPU-based MC tool with the conventional particle-by-particle simulation scheme. Results: Calculated scattered photon signals in gMMC agreed with those from gMCDRR with a relative difference of 3%. It took 3.1 hr for gMCDRR to simulate 7.8e11 photons and 246.5 sec for gMMC to simulate 1.4e10 paths. Under this setting, both results attained the same ∼2% statistical uncertainty. Hence, a speed-up factor of ∼45.3 was achieved by this new path-by-path simulation scheme, where all the computations were spent on photons contributing to the detector signal. Conclusion: We proposed a novel path-by-path simulation scheme that enables a significant efficiency enhancement for MC particle transport simulations.
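A stripped-down illustration of path-by-path Metropolis-Hastings sampling (not the gMMC implementation): toy 1D "paths" with an invented unnormalized path weight standing in for the photon-transport physics.

```python
import numpy as np

rng = np.random.default_rng(12)

def path_weight(path):
    """Invented unnormalized weight standing in for photon-transport physics:
    favors short, smooth paths that end on the 'detector' at x = 1."""
    length_penalty = np.sum(np.diff(path) ** 2)
    detector_term = (path[-1] - 1.0) ** 2
    return np.exp(-5.0 * length_penalty - 50.0 * detector_term)

n_nodes, n_steps = 8, 20000
path = np.linspace(0.0, 1.0, n_nodes)        # initial path, source to detector
samples, accepted = [], 0
for _ in range(n_steps):
    proposal = path + rng.normal(0.0, 0.05, n_nodes)
    proposal[0] = 0.0                         # path always starts at the source
    # Metropolis-Hastings accept/reject on the whole path at once.
    if rng.random() < path_weight(proposal) / path_weight(path):
        path, accepted = proposal, accepted + 1
    samples.append(path[-1])

print("acceptance rate:", accepted / n_steps)
print("mean endpoint:", np.mean(samples[5000:]))   # after burn-in
```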
On the use of transition matrix methods with extended ensembles.
Escobedo, Fernando A; Abreu, Charlles R A
2006-03-14
Different extended ensemble schemes for non-Boltzmann sampling (NBS) of a selected reaction coordinate lambda were formulated so that they employ (i) "variable" sampling window schemes (including the "successive umbrella sampling" method) to comprehensively explore the lambda domain and (ii) transition matrix methods to iteratively obtain the underlying free-energy eta landscape (or "importance" weights) associated with lambda. The connection between "acceptance ratio" and transition matrix methods was first established to form the basis of the approach for estimating eta(lambda). The validity and performance of the different NBS schemes were then assessed using as the lambda coordinate the configurational energy of the Lennard-Jones fluid. For the cases studied, it was found that the convergence rate in the estimation of eta is little affected by the use of data from high-order transitions, while it is noticeably improved by the use of a broader sampling window in the variable window methods. Finally, it is shown how an "elastic" sampling window can be used to effectively enact (nonuniform) preferential sampling over the lambda domain, and how to stitch the weights from separate one-dimensional NBS runs to produce an eta surface over a two-dimensional domain.
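The transition-matrix estimate of the weights can be illustrated on a toy discrete lambda ladder: acceptance probabilities of attempted neighbor moves are accumulated into a collection matrix, and detailed balance turns the resulting transition-probability ratios into free-energy differences, which are chained into eta(lambda). A sketch under those simplifying assumptions (invented landscape):

```python
import numpy as np

rng = np.random.default_rng(13)

# Toy free-energy landscape eta over a discrete lambda coordinate.
eta_true = np.array([0.0, 1.2, 2.5, 2.0, 0.8])
n = eta_true.size

# Collection matrix: for every *attempted* neighbor move record the
# Metropolis acceptance probability, whether or not the move is taken.
C = np.zeros((n, n))
state = 0
for _ in range(200000):
    nxt = state + rng.choice([-1, 1])
    if 0 <= nxt < n:
        p = min(1.0, np.exp(-(eta_true[nxt] - eta_true[state])))
        C[state, nxt] += p
        C[state, state] += 1.0 - p
        if rng.random() < p:          # actually move the walker
            state = nxt
    else:
        C[state, state] += 1.0

T = C / C.sum(axis=1, keepdims=True)  # estimated one-step transition matrix
eta_est = np.zeros(n)
for i in range(n - 1):
    # Detailed balance gives eta[i+1] - eta[i] = ln(T[i+1 -> i] / T[i -> i+1]).
    eta_est[i + 1] = eta_est[i] + np.log(T[i + 1, i] / T[i, i + 1])
print("true :", eta_true)
print("est. :", np.round(eta_est, 2))
```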
NASA Astrophysics Data System (ADS)
Hashemian, Behrooz; Millán, Daniel; Arroyo, Marino
2013-12-01
Collective variables (CVs) are low-dimensional representations of the state of a complex system, which help us rationalize molecular conformations and sample free energy landscapes with molecular dynamics simulations. Given their importance, there is need for systematic methods that effectively identify CVs for complex systems. In recent years, nonlinear manifold learning has shown its ability to automatically characterize molecular collective behavior. Unfortunately, these methods fail to provide a differentiable function mapping high-dimensional configurations to their low-dimensional representation, as required in enhanced sampling methods. We introduce a methodology that, starting from an ensemble representative of molecular flexibility, builds smooth and nonlinear data-driven collective variables (SandCV) from the output of nonlinear manifold learning algorithms. We demonstrate the method with a standard benchmark molecule, alanine dipeptide, and show how it can be non-intrusively combined with off-the-shelf enhanced sampling methods, here the adaptive biasing force method. We illustrate how enhanced sampling simulations with SandCV can explore regions that were poorly sampled in the original molecular ensemble. We further explore the transferability of SandCV from a simpler system, alanine dipeptide in vacuum, to a more complex system, alanine dipeptide in explicit water.
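One way to read the SandCV construction is: run nonlinear manifold learning to get low-dimensional coordinates for an ensemble of configurations, then fit a smooth, differentiable surrogate that maps new configurations onto those coordinates. A non-authoritative sketch of that two-step pattern on toy S-curve data (SandCV itself uses different machinery):

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap
from sklearn.kernel_ridge import KernelRidge

# Toy "ensemble" of high-dimensional configurations on a curved manifold.
X, _ = make_s_curve(n_samples=1000, random_state=14)

# Step 1: nonlinear manifold learning gives low-dimensional coordinates,
# but no differentiable map for configurations outside the training set.
embedding = Isomap(n_neighbors=12, n_components=2).fit_transform(X)

# Step 2: fit a smooth surrogate X -> embedding; kernel ridge regression is
# differentiable in closed form, so it could drive enhanced-sampling biases.
cv_map = KernelRidge(kernel="rbf", gamma=0.5, alpha=1e-3).fit(X, embedding)

x_new = X[:5] + 0.01 * np.random.default_rng(14).normal(size=(5, 3))
print("smooth CV values for unseen configurations:\n", cv_map.predict(x_new))
```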
Optical path switching based differential absorption radiometry for substance detection
NASA Technical Reports Server (NTRS)
Sachse, Glen W. (Inventor)
2005-01-01
An optical path switch divides sample path radiation into a time series of alternating first polarized components and second polarized components. The first polarized components are transmitted along a first optical path and the second polarized components along a second optical path. A first gasless optical filter train filters the first polarized components to isolate at least a first wavelength band thereby generating first filtered radiation. A second gasless optical filter train filters the second polarized components to isolate at least a second wavelength band thereby generating second filtered radiation. A beam combiner combines the first and second filtered radiation to form a combined beam of radiation. A detector is disposed to monitor magnitude of at least a portion of the combined beam alternately at the first wavelength band and the second wavelength band as an indication of the concentration of the substance in the sample path.
Different realizations of Cooper-Frye sampling with conservation laws
NASA Astrophysics Data System (ADS)
Schwarz, C.; Oliinychenko, D.; Pang, L.-G.; Ryu, S.; Petersen, H.
2018-01-01
Approaches based on viscous hydrodynamics for the hot and dense stage and hadronic transport for the final dilute rescattering stage are successfully applied to the dynamic description of heavy ion reactions at high beam energies. One crucial step in such hybrid approaches is the so-called particlization, which is the transition between the hydrodynamic description and the microscopic degrees of freedom. For this purpose, individual particles are sampled on the Cooper-Frye hypersurface. In this work, four different realizations of the sampling algorithms are compared, with three of them incorporating the global conservation laws of quantum numbers in each event. The algorithms are compared within two types of scenarios: a simple ‘box’ hypersurface consisting of only one static cell and a typical particlization hypersurface for Au+Au collisions at √s_NN = 200 GeV. For all algorithms the mean multiplicities (or particle spectra) remain unaffected by global conservation laws in the case of large volumes. In contrast, the fluctuations of the particle numbers are affected considerably. The fluctuations of the newly developed SPREW algorithm based on the exponential weight, and the recently suggested SER algorithm based on ensemble rejection, are smaller than those without conservation laws and agree with the expectation from the canonical ensemble. The previously applied mode sampling algorithm produces dramatically larger fluctuations than expected in the corresponding microcanonical ensemble, and therefore should be avoided in fluctuation studies. This study might be of interest for the investigation of particle fluctuations and correlations, e.g. the suggested signatures for a phase transition or a critical endpoint, in hybrid approaches that are affected by global conservation laws.
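The effect of enforcing global conservation laws on multiplicity fluctuations can be demonstrated with a toy rejection sampler in the spirit of the ensemble-rejection (SER) idea: draw Poisson multiplicities for charged species and keep only events whose net charge matches the target. A sketch with invented yields:

```python
import numpy as np

rng = np.random.default_rng(15)

mean_pos, mean_neg = 20.0, 18.0     # invented mean yields of +/- hadrons
net_charge_target = 2               # conserved quantum number per event

def sample_event(enforce_conservation):
    """Draw one event; optionally reject until net charge is conserved."""
    while True:
        n_pos = rng.poisson(mean_pos)
        n_neg = rng.poisson(mean_neg)
        if not enforce_conservation or n_pos - n_neg == net_charge_target:
            return n_pos + n_neg

for enforce in (False, True):
    totals = np.array([sample_event(enforce) for _ in range(20000)])
    print(f"conservation={enforce}: mean={totals.mean():.2f}, "
          f"variance={totals.var():.2f}")   # variance shrinks when enforced
```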
Mori, Takaharu; Jung, Jaewoon; Sugita, Yuji
2013-12-10
Conformational sampling is fundamentally important for simulating complex biomolecular systems. The generalized-ensemble algorithm, especially the temperature replica-exchange molecular dynamics method (T-REMD), is one of the most powerful methods to explore structures of biomolecules such as proteins, nucleic acids, carbohydrates, and also of lipid membranes. T-REMD simulations have focused on soluble proteins rather than membrane proteins or lipid bilayers, because explicit membranes do not keep their structural integrity at high temperature. Here, we propose a new generalized-ensemble algorithm for membrane systems, which we call the surface-tension REMD method. Each replica is simulated in the NPγT ensemble, and surface tensions in a pair of replicas are exchanged at certain intervals to enhance conformational sampling of the target membrane system. We test the method on two biological membrane systems: a fully hydrated DPPC (1,2-dipalmitoyl-sn-glycero-3-phosphatidylcholine) lipid bilayer and a WALP23-POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) membrane system. During these simulations, a random walk in surface tension space is realized. Large-scale lateral deformation (shrinking and stretching) of the membranes takes place in all of the replicas without collapse of the lipid bilayer structure. There is accelerated lateral diffusion of DPPC lipid molecules compared with conventional MD simulation, and a much wider range of tilt angle of the WALP23 peptide is sampled due to large deformation of the POPC lipid bilayer and through peptide-lipid interactions. Our method could be applicable to a wide variety of biological membrane systems.
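The replica-exchange step can be sketched from the NPγT weight factor exp[-β(U + PV - γA)]: swapping configurations between replicas i and j that differ only in surface tension gives the Metropolis acceptance min(1, exp[β(γi - γj)(Aj - Ai)]), where A is the instantaneous membrane area. This acceptance rule is my reading of the stated ensemble, not code from the paper; units and numbers below are toy values.

```python
import numpy as np

rng = np.random.default_rng(16)
beta = 1.0                                     # 1/kT in toy units

gammas = np.array([0.0, 5.0, 10.0, 15.0])      # surface tension per replica
areas = np.array([60.0, 62.0, 65.0, 69.0])     # instantaneous membrane areas

def try_exchange(i, j):
    """Metropolis swap of configurations between NPgammaT replicas i and j."""
    delta = beta * (gammas[i] - gammas[j]) * (areas[j] - areas[i])
    if rng.random() < np.exp(min(0.0, delta)):  # min(1, exp(delta))
        areas[i], areas[j] = areas[j], areas[i]  # exchange configurations
        return True
    return False

accepted = sum(try_exchange(k, k + 1) for k in range(3))
print("accepted swaps:", accepted, "areas now:", areas)
```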
Smith, Morgan E; Singh, Brajendra K; Irvine, Michael A; Stolk, Wilma A; Subramanian, Swaminathan; Hollingsworth, T Déirdre; Michael, Edwin
2017-03-01
Mathematical models of parasite transmission provide powerful tools for assessing the impacts of interventions. Owing to complexity and uncertainty, no single model may capture all features of transmission and elimination dynamics. Multi-model ensemble modelling offers a framework to help overcome biases of single models. We report on the development of a first multi-model ensemble of three lymphatic filariasis (LF) models (EPIFIL, LYMFASIM, and TRANSFIL), and evaluate its predictive performance in comparison with that of the constituents using calibration and validation data from three case study sites, one each from the three major LF endemic regions: Africa, Southeast Asia and Papua New Guinea (PNG). We assessed the performance of the respective models for predicting the outcomes of annual MDA strategies for various baseline scenarios thought to exemplify the current endemic conditions in the three regions. The results show that the constructed multi-model ensemble outperformed the single models when evaluated across all sites. Single models that best fitted calibration data tended to do less well in simulating the out-of-sample, or validation, intervention data. Scenario modelling results demonstrate that the multi-model ensemble is able to compensate for variance between single models in order to produce more plausible predictions of intervention impacts. Our results highlight the value of an ensemble approach to modelling parasite control dynamics. However, its optimal use will require further methodological improvements as well as consideration of the organizational mechanisms required to ensure that modelling results and data are shared effectively between all stakeholders.
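A minimal sketch of multi-model ensembling in this spirit: weight each model's prediction by its fit to calibration data, then combine. All numbers are invented and the published ensemble construction is more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(17)

# Invented predictions of post-MDA prevalence (%) from three LF-type models
# over 10 calibration settings, plus the observed values.
preds = rng.uniform(0.5, 6.0, size=(3, 10))        # one row per model
obs = preds.mean(axis=0) + rng.normal(0.0, 0.3, 10)

# Weight models by inverse mean-squared calibration error.
mse = ((preds - obs) ** 2).mean(axis=1)
weights = (1.0 / mse) / (1.0 / mse).sum()

ensemble = weights @ preds                          # weighted ensemble prediction
print("weights:", np.round(weights, 2))
print("ensemble RMSE:", np.sqrt(((ensemble - obs) ** 2).mean()).round(3))
print("best single RMSE:", np.sqrt(mse.min()).round(3))
```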
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginn, Timothy R.; Weathers, Tess
Biogeochemical modeling using PHREEQC2 and a streamtube ensemble approach is utilized to understand a well-to-well subsurface treatment system at the Vadose Zone Research Park (VZRP) near Idaho Falls, Idaho. Treatment involves in situ microbially-mediated ureolysis to induce calcite precipitation for the immobilization of strontium-90. PHREEQC2 is utilized to model the kinetically-controlled ureolysis and consequent calcite precipitation. Reaction kinetics, equilibrium phases, and cation exchange are used within PHREEQC2 to track pH and levels of calcium, ammonium, urea, and calcite precipitation over time, within a series of one-dimensional advective-dispersive transport paths creating a streamtube ensemble representation of the well-to-well transport. An understanding of the impact of physical heterogeneities within this radial flowfield is critical for remediation design; we address this via the streamtube approach: instead of depicting spatial extents of solutes in the subsurface we focus on their arrival distribution at the control well(s). Traditionally, each streamtube maintains uniform velocity; however in radial flow in homogeneous media, the velocity within any given streamtube is spatially-variable in a common way, being highest at the input and output wells and approaching a minimum at the midpoint between the wells. This idealized velocity variability is of significance in the case of ureolytically driven calcite precipitation. Streamtube velocity patterns for any particular configuration of injection and withdrawal wells are available as explicit calculations from potential theory, and also from particle tracking programs. To approximate the actual spatial distribution of velocity along streamtubes, we assume idealized radial non-uniform velocity associated with homogeneous media. This is implemented in PHREEQC2 via a non-uniform spatial discretization within each streamtube that honors both the streamtube's travel time and the idealized "fast-slow-fast" pattern of non-uniform velocity along the streamline. Breakthrough curves produced by each simulation are weighted by the path-respective flux fractions (obtained by deconvolution of tracer tests conducted at the VZRP) to obtain the flux-average of flow contributions to the observation well.
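The flux-weighted averaging at the end of this approach is simple to illustrate: each streamtube contributes its breakthrough curve scaled by its flux fraction. A toy sketch with invented travel times and weights (the real weights come from tracer-test deconvolution):

```python
import numpy as np

t = np.linspace(0.0, 50.0, 500)                  # time (days)

# Invented streamtube ensemble: travel times and flux fractions.
travel_times = np.array([5.0, 8.0, 12.0, 20.0, 35.0])
flux_fractions = np.array([0.35, 0.25, 0.2, 0.15, 0.05])

def breakthrough(t, t_travel, spread=0.15):
    """Toy single-streamtube breakthrough curve (lognormal-shaped arrival)."""
    tt = np.where(t > 0, t, np.nan)               # avoid log(0) at t = 0
    c = np.nan_to_num(np.exp(-0.5 * ((np.log(tt) - np.log(t_travel)) / spread) ** 2))
    return c / c.max()

# Flux-weighted average of per-streamtube breakthrough curves.
ensemble_btc = sum(w * breakthrough(t, tt)
                   for w, tt in zip(flux_fractions, travel_times))
print("peak arrival time:", t[np.argmax(ensemble_btc)], "days")
```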
Park, Wooram; Liu, Yan; Zhou, Yu; Moses, Matthew; Chirikjian, Gregory S.
2010-01-01
A nonholonomic system subjected to external noise from the environment, or internal noise in its own actuators, will evolve in a stochastic manner described by an ensemble of trajectories. This ensemble of trajectories is equivalent to the solution of a Fokker-Planck equation that typically evolves on a Lie group. If the most likely state of such a system is to be estimated, and plans for subsequent motions from the current state are to be made so as to move the system to a desired state with high probability, then modeling how the probability density of the system evolves is critical. Methods for solving Fokker-Planck equations that evolve on Lie groups then become important. Such equations can be solved using the operational properties of group Fourier transforms, in which irreducible unitary representation (IUR) matrices play a critical role. Therefore, we develop a simple approach for the numerical approximation of all the IUR matrices for two of the groups of most interest in robotics: the rotation group in three-dimensional space, SO(3), and the Euclidean motion group of the plane, SE(2). This approach uses the exponential mapping from the Lie algebras of these groups, and takes advantage of the sparse nature of the Lie algebra representation matrices. Other techniques for density estimation on groups are also explored. The computed densities are applied in the context of probabilistic path planning for a kinematic cart in the plane and flexible needle steering in three-dimensional space. In these examples the injection of artificial noise into the computational models (rather than noise in the actual physical systems) serves as a tool to search the configuration spaces and plan paths. Finally, we illustrate how density estimation problems arise in the characterization of physical noise in orientational sensors such as gyroscopes. PMID:20454468
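The exponential map from the Lie algebra, which this approach leans on, is easy to illustrate at the level of rotation matrices. A small sketch (standard Rodrigues-style construction, not the paper's IUR code):

```python
import numpy as np
from scipy.linalg import expm

def hat(omega):
    """so(3) element: skew-symmetric matrix of the axis-angle vector omega."""
    wx, wy, wz = omega
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

omega = np.array([0.1, 0.4, 0.2])      # element of the Lie algebra so(3)
R = expm(hat(omega))                    # exponential map onto the group SO(3)

print("R orthogonal:", np.allclose(R @ R.T, np.eye(3)))
print("det(R) = 1:", np.isclose(np.linalg.det(R), 1.0))
```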
Representation of photon limited data in emission tomography using origin ensembles
NASA Astrophysics Data System (ADS)
Sitek, A.
2008-06-01
Representation and reconstruction of data obtained by emission tomography scanners are challenging due to high noise levels in the data. Typically, images obtained using tomographic measurements are represented using grids. In this work, we define images as sets of origins of events detected during tomographic measurements; we call these origin ensembles (OEs). A state in the ensemble is characterized by a vector of 3N parameters Y, where the parameters are the coordinates of origins of detected events in a three-dimensional space and N is the number of detected events. The 3N-dimensional probability density function (PDF) for that ensemble is derived, and we present an algorithm for OE image estimation from tomographic measurements. A displayable image (e.g. a grid-based image) is derived from the OE formulation by calculating ensemble expectations based on the PDF using the Markov chain Monte Carlo method. The approach was applied to computer-simulated 3D list-mode positron emission tomography data. The reconstruction errors for a simulated 10,000,000-event acquisition ranged from 0.1 to 34.8%, depending on object size and sampling density. The method was also applied to experimental data, and the results of the OE method were consistent with those obtained by a standard maximum-likelihood approach. The method is a new approach to representation and reconstruction of data obtained by photon-limited emission tomography measurements.
Leong, Max K.; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng
2017-01-01
The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928–0.988, q2 = 0.894–0.954, RMSE = 0.002–0.412, s = 0.001–0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, q2 = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery. PMID:28059133
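The SVM-Score component, as described, is a regression from docking-derived features to binding affinity. A generic, non-authoritative sketch with scikit-learn (synthetic features, not the authors' descriptors or hyperparameters):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(19)

# Synthetic docking features (e.g. per-term docking energies) and pKi labels.
X = rng.normal(size=(37, 6))
pKi = 6.0 + X @ np.array([0.8, -0.5, 0.3, 0.0, 0.2, -0.1]) \
      + 0.2 * rng.normal(size=37)

X_train, y_train = X[:24], pKi[:24]      # 24 training samples, as in the paper
X_test, y_test = X[24:], pKi[24:]        # 13 test samples

svm_score = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
svm_score.fit(X_train, y_train)

pred = svm_score.predict(X_test)
rmse = np.sqrt(((pred - y_test) ** 2).mean())
print("test RMSE:", rmse.round(3))
```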
Uncertainty Quantification in Alchemical Free Energy Methods.
Bhati, Agastya P; Wan, Shunzhou; Hu, Yuan; Sherborne, Brad; Coveney, Peter V
2018-06-12
Alchemical free energy methods have gained much importance recently from several reports of improved ligand-protein binding affinity predictions based on their implementation using molecular dynamics simulations. A large number of variants of such methods implementing different accelerated sampling techniques and free energy estimators are available, each claimed to be better than the others in its own way. However, the key features of reproducibility and quantification of associated uncertainties in such methods have barely been discussed. Here, we apply a systematic protocol for uncertainty quantification to a number of popular alchemical free energy methods, covering both absolute and relative free energy predictions. We show that a reliable measure of error estimation is provided by ensemble simulation (an ensemble of independent MD simulations), which applies irrespective of the free energy method. The need to use ensemble methods is fundamental and holds regardless of the duration of the molecular dynamics simulations performed.
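The ensemble error measure advocated here amounts to treating the spread across independent replicas as the uncertainty. A minimal sketch, assuming hypothetical per-replica free energy values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-replica free energy estimates (kcal/mol) from an ensemble
# of independent MD simulations of the same alchemical transformation.
dG = np.array([-7.9, -8.4, -8.1, -7.6, -8.8, -8.2, -7.8, -8.5])

# Bootstrap over replicas: the spread across independent runs, not the noise
# within any single run, sets the reliable uncertainty estimate.
boot = np.array([rng.choice(dG, size=dG.size, replace=True).mean()
                 for _ in range(10000)])
print(f"dG = {dG.mean():.2f} +/- {boot.std():.2f} kcal/mol "
      f"(95% CI {np.percentile(boot, 2.5):.2f} .. {np.percentile(boot, 97.5):.2f})")
```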
Fast adaptive flat-histogram ensemble to enhance the sampling in large systems
NASA Astrophysics Data System (ADS)
Xu, Shun; Zhou, Xin; Jiang, Yi; Wang, YanTing
2015-09-01
An efficient novel algorithm was developed to estimate the density of states (DOS) for large systems by calculating the ensemble means of an extensive physical variable, such as the potential energy U, in generalized canonical ensembles to interpolate the inverse temperature curve β(U) = dS(U)/dU, where S(U) is the logarithm of the DOS. This curve is computed with different accuracies in different energy regions to capture the dependence of the inverse temperature on U without setting a prior grid in the U space. By combining with a U-compression transformation, we decrease the computational complexity from O(N^3/2) in the normal Wang-Landau type method to O(N^1/2) in the current algorithm, where N is the number of degrees of freedom of the system. The efficiency of the algorithm is demonstrated by applying it to Lennard-Jones fluids with various N, along with its ability to find different macroscopic states, including metastable states.
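As a rough illustration of the interpolation idea (not the authors' algorithm), S(U) can be recovered up to an additive constant by integrating an interpolated β(U) built from sampled ensemble means; all numbers below are synthetic.

```python
import numpy as np

# Sketch: reconstruct S(U) = log DOS by integrating an interpolated inverse-
# temperature curve beta(U) = dS/dU, assuming the ensemble means <U>_beta
# have already been measured at a set of sampling temperatures.
betas = np.linspace(0.2, 2.0, 10)            # sampled inverse temperatures
mean_U = 100.0 / betas                       # hypothetical <U> at each beta

order = np.argsort(mean_U)
U, beta_of_U = mean_U[order], betas[order]   # beta as a function of U

U_grid = np.linspace(U.min(), U.max(), 200)
beta_grid = np.interp(U_grid, U, beta_of_U)  # interpolated beta(U)

# S(U) follows by trapezoidal integration of beta(U); the constant offset of
# the entropy is irrelevant when reweighting between ensembles.
S = np.concatenate(([0.0], np.cumsum(0.5 * (beta_grid[1:] + beta_grid[:-1])
                                     * np.diff(U_grid))))
```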
NASA Astrophysics Data System (ADS)
Vagnetti, F.; Middei, R.; Antonucci, M.; Paolillo, M.; Serafinelli, R.
2016-09-01
Context. Most investigations of the X-ray variability of active galactic nuclei (AGN) have concentrated on detailed analyses of individual, nearby sources. A relatively small number of studies have treated the ensemble behaviour of the more general AGN population in wider regions of the luminosity-redshift plane. Aims: We want to determine the ensemble variability properties of a rich AGN sample, called the Multi-Epoch XMM Serendipitous AGN Sample (MEXSAS), extracted from the fifth release of the XMM-Newton Serendipitous Source Catalogue (XMMSSC-DR5), with redshifts between ~0.1 and ~5, and X-ray luminosities in the 0.5-4.5 keV band between ~10^42 erg/s and ~10^47 erg/s. Methods: We urge caution on the use of the normalised excess variance (NXS), noting that it may lead to an underestimation of variability if used improperly. We use the structure function (SF), updating our previous analysis for a smaller sample. We propose a correction to the NXS variability estimator, taking account of the light curve duration in the rest frame on the basis of the knowledge of the variability behaviour gained by SF studies. Results: We find an ensemble increase of the X-ray variability with the rest-frame time lag τ, given by SF ∝ τ^0.12. We confirm an inverse dependence on the X-ray luminosity, approximately as SF ∝ L_X^-0.19. We analyse the SF in different X-ray bands, finding a dependence of the variability on the frequency as SF ∝ ν^-0.15, corresponding to a so-called softer when brighter trend. In turn, this dependence allows us to parametrically correct the variability estimated in observer-frame bands to that in the rest frame, resulting in a moderate (≲15%) shift upwards (V-correction). Conclusions: Ensemble X-ray variability of AGNs is best described by the structure function. An improper use of the normalised excess variance may lead to an underestimate of the intrinsic variability, so that appropriate corrections to the data or the models must be applied to prevent these effects. Full Table 1 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/593/A55
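The first-order structure function can be computed in a few lines. This sketch follows the common definition (RMS magnitude difference binned by time lag) and omits the photometric-noise subtraction applied in practice; inputs are assumed to be NumPy arrays.

```python
import numpy as np

def structure_function(times, mags, lag_bins):
    # Ensemble first-order structure function: RMS magnitude difference as a
    # function of time lag, pooled over all unique epoch pairs.
    dt = np.abs(times[:, None] - times[None, :])
    dm2 = (mags[:, None] - mags[None, :]) ** 2
    iu = np.triu_indices(len(times), k=1)       # unique epoch pairs only
    dt, dm2 = dt[iu], dm2[iu]
    sf = np.empty(len(lag_bins) - 1)
    for i in range(len(lag_bins) - 1):
        sel = (dt >= lag_bins[i]) & (dt < lag_bins[i + 1])
        sf[i] = np.sqrt(dm2[sel].mean()) if sel.any() else np.nan
    return sf
```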
2013-01-01
Background Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long-standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and to determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retention. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions Results show that both algorithms are effective in terms of sampling conformations in proximity to the known native structure. The additional minimization is shown to be key to enhancing sampling capability and obtaining a diverse ensemble of decoy conformations, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with proximity that is comparable to state-of-the-art decoy sampling methods. The results are shown to be robust and valid when using two representative state-of-the-art coarse-grained energy functions. PMID:24565020
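A generic sketch of the memetic loop described above, with `energy`, `mutate`, and `minimize` standing in for the coarse-grained energy function, fragment replacement, and local minimization; these callables are placeholders, not the paper's components.

```python
import random

def memetic_search(init_pop, energy, mutate, minimize, n_gen=100, k=50):
    # Evolve a population of conformations, map each child to its nearest
    # local minimum, and retain the lowest-energy individuals, so the
    # population is effectively a population of local minima.
    pop = [minimize(c) for c in init_pop]
    for _ in range(n_gen):
        # Energy-biased parent selection (tournament of size 2).
        parents = [min(random.sample(pop, 2), key=energy) for _ in range(k)]
        # Children via fragment-replacement-style mutation, then local descent.
        children = [minimize(mutate(p)) for p in parents]
        pop = sorted(pop + children, key=energy)[:len(init_pop)]
    return pop
```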
Path Finding on High-Dimensional Free Energy Landscapes
NASA Astrophysics Data System (ADS)
Díaz Leines, Grisell; Ensing, Bernd
2012-07-01
We present a method for determining the average transition path and the free energy along this path in the space of selected collective variables. The formalism is based upon a history-dependent bias along a flexible path variable within the metadynamics framework but with a trivial scaling of the cost with the number of collective variables. Controlling the sampling of the orthogonal modes recovers the average path and the minimum free energy path as the limiting cases. The method is applied to resolve the path and the free energy of a conformational transition in alanine dipeptide.
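The flexible path variable belongs to the family of path collective variables. The standard exponentially weighted form is shown here for orientation (it is not necessarily the exact flexible-path formulation of the paper): s gives a continuous progress index along the ordered nodes, and the second variable measures distance from the path.

```python
import numpy as np

def path_variables(z, nodes, lam=10.0):
    # Progress (s) and off-path distance variables for a point `z` in the
    # space of collective variables, given ordered path nodes of shape (P, d).
    d2 = np.sum((nodes - z) ** 2, axis=1)
    w = np.exp(-lam * d2)
    idx = np.arange(1, len(nodes) + 1)
    s = (w @ idx) / w.sum()          # fractional progress along the path
    dist = -np.log(w.sum()) / lam    # distance from the path
    return s, dist
```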
X-ray-generated heralded macroscopical quantum entanglement of two nuclear ensembles.
Liao, Wen-Te; Keitel, Christoph H; Pálffy, Adriana
2016-09-19
Heralded entanglement between macroscopical samples is an important resource for present quantum technology protocols, allowing quantum communication over large distances. In such protocols, optical photons are typically used as information and entanglement carriers between macroscopic quantum memories placed in remote locations. Here we investigate theoretically a new implementation which employs more robust x-ray quanta to generate heralded entanglement between two crystal-hosted macroscopical nuclear ensembles. Mössbauer nuclei in the two crystals interact collectively with an x-ray spontaneous parametric down conversion photon that generates heralded macroscopical entanglement with coherence times of approximately 100 ns at room temperature. The quantum phase between the entangled crystals can be conveniently manipulated by magnetic field rotations at the samples. The inherent long nuclear coherence times allow also for mechanical manipulations of the samples, for instance to check the stability of entanglement in the x-ray setup. Our results pave the way for first quantum communication protocols that use x-ray qubits.
Analytical Applications of Monte Carlo Techniques.
ERIC Educational Resources Information Center
Guell, Oscar A.; Holcombe, James A.
1990-01-01
Described are analytical applications of the theory of random processes, in particular solutions obtained by using statistical procedures known as Monte Carlo techniques. Supercomputer simulations, sampling, integration, ensemble, annealing, and explicit simulation are discussed. (CW)
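As a minimal classroom-style example of the techniques surveyed, Monte Carlo integration by uniform random sampling:

```python
import random

# Estimate the integral of f on [0, 1] by averaging f at random points;
# the exact value for f(x) = x^2 is 1/3, so the estimate is easy to check.
f = lambda x: x ** 2
n = 100000
estimate = sum(f(random.random()) for _ in range(n)) / n
```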
Nanoporous TiO2 nanoparticle assemblies with mesoscale morphologies: nano-cabbage versus sea-anemone
NASA Astrophysics Data System (ADS)
Darbandi, Masih; Gebre, Tesfaye; Mitchell, Lucas; Erwin, William; Bardhan, Rizia; Levan, M. Douglas; Mochena, Mogus D.; Dickerson, James H.
2014-05-01
We report the novel synthesis of nanoporous TiO2 nanoparticle ensembles with unique mesoscale morphologies. Constituent nanoparticles evolved into multifaceted assemblies, exhibiting excellent crystallinity and enhanced photocatalytic activity compared with commercial TiO2. Such materials could be exploited for applications like organic pollutant degradation. Electronic supplementary information (ESI) available: Synthesis and characterization procedures, TEM/XRD of samples prepared at different temperature and water content, table of nitrogen adsorption-desorption values of different samples. See DOI: 10.1039/c3nr06154j
A stochastic diffusion process for Lochner's generalized Dirichlet distribution
Bakosi, J.; Ristorcelli, J. R.
2013-10-01
The method of potential solutions of Fokker-Planck equations is used to develop a transport equation for the joint probability of N stochastic variables with Lochner's generalized Dirichlet distribution as its asymptotic solution. Individual samples of a discrete ensemble, obtained from the system of stochastic differential equations equivalent to the Fokker-Planck equation developed here, satisfy a unit-sum constraint at all times and ensure a bounded sample space, similarly to the process developed earlier for the Dirichlet distribution. Consequently, the generalized Dirichlet diffusion process may be used to represent realizations of a fluctuating ensemble of N variables subject to a conservation principle. Compared to the Dirichlet distribution and process, the additional parameters of the generalized Dirichlet distribution allow a more general class of physical processes to be modeled with a more general covariance matrix.
Using beta binomials to estimate classification uncertainty for ensemble models.
Clark, Robert D; Liang, Wenkel; Lee, Adam C; Lawless, Michael S; Fraczkiewicz, Robert; Waldman, Marvin
2014-01-01
Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions. Submodels in an ensemble model that have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification (one using vote tallies and the other averaging individual network outputs), we have found that the distribution of predictions across positive vote tallies can be reasonably well modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprising logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distributions of predictions and errors for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool. Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
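A sketch of the core statistical step, fitting a beta-binomial to the ensemble's positive vote tallies, assuming SciPy's `betabinom` distribution and synthetic tallies; pairing the fitted model with a fitted error distribution would then yield per-prediction error probabilities.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

n = 21                                    # submodels voting in the ensemble
tallies = np.array([20, 21, 19, 3, 0, 18, 2, 21, 1, 17, 21, 0, 4, 20])

def nll(params):
    a, b = np.exp(params)                 # keep shape parameters positive
    return -betabinom.logpmf(tallies, n, a, b).sum()

a, b = np.exp(minimize(nll, x0=[0.0, 0.0]).x)

# P(k positive votes out of n) under the fitted beta-binomial model.
pmf = betabinom.pmf(np.arange(n + 1), n, a, b)
```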
NASA Astrophysics Data System (ADS)
Sanderson, B. M.
2017-12-01
The CMIP ensembles represent the most comprehensive source of information available to decision-makers for climate adaptation, yet it is clear that there are fundamental limitations in our ability to treat the ensemble as an unbiased sample of possible future climate trajectories. There is considerable evidence that models are not independent, and increasing complexity and resolution combined with computational constraints prevent a thorough exploration of parametric uncertainty or internal variability. Although more data than ever is available for calibration, the optimization of each model is influenced by institutional priorities, historical precedent and available resources. The resulting ensemble thus represents a miscellany of climate simulators which defy traditional statistical interpretation. Models are in some cases interdependent, but are sufficiently complex that the degree of interdependency is conditional on the application. Configurations have been updated using available observations to some degree, but not in a consistent or easily identifiable fashion. This means that the ensemble cannot be viewed as a true posterior distribution updated by available data, but nor can observational data alone be used to assess individual model likelihood. We assess recent literature for combining projections from an imperfect ensemble of climate simulators. Beginning with our published methodology for addressing model interdependency and skill in the weighting scheme for the 4th US National Climate Assessment, we consider strategies for incorporating process-based constraints on future response, perturbed parameter experiments and multi-model output into an integrated framework. We focus on a number of guiding questions: Is the traditional framework of confidence in projections inferred from model agreement leading to biased or misleading conclusions? Can the benefits of upweighting skillful models be reconciled with the increased risk of truth lying outside the weighted ensemble distribution? If CMIP is an ensemble of partially informed best-guesses, can we infer anything about the parent distribution of all possible models of the climate system (and if not, are we implicitly under-representing the risk of a climate catastrophe outside of the envelope of CMIP simulations)?
NASA Technical Reports Server (NTRS)
Todling, Ricardo; Diniz, F. L. R.; Takacs, L. L.; Suarez, M. J.
2018-01-01
Many hybrid data assimilation systems currently used for NWP employ some form of dual-analysis system approach. Typically a hybrid variational analysis is responsible for creating initial conditions for high-resolution forecasts, and an ensemble analysis system is responsible for creating sample perturbations used to form the flow-dependent part of the background error covariance required in the hybrid analysis component. In many of these, the two analysis components employ different methodologies, e.g., variational and ensemble Kalman filter. In such cases, it is not uncommon to have observations treated rather differently between the two analysis components; recentering of the ensemble analysis around the hybrid analysis is used to compensate for such differences. Furthermore, in many cases, the hybrid variational high-resolution system implements some type of four-dimensional approach, whereas the underlying ensemble system relies on a three-dimensional approach, which again introduces discrepancies in the overall system. Connected to these is the expectation that one can reliably estimate observation impact on forecasts issued from hybrid analyses by using an ensemble approach based on the underlying ensemble strategy of dual-analysis systems. Just the realization that the ensemble analysis makes substantially different use of observations as compared to its hybrid counterpart should serve as enough evidence of the implausibility of such an expectation. This presentation assembles numerous pieces of anecdotal evidence to illustrate the fact that hybrid dual-analysis systems must, at the very minimum, strive for consistent use of the observations in both analysis sub-components. More simply, this work suggests that hybrid systems can reliably be constructed without the need to employ a dual-analysis approach. In practice, the idea of relying on a single analysis system is appealing from a cost-maintenance perspective. More generally, single-analysis systems avoid contradictions such as having to choose one sub-component to generate performance diagnostics for another, possibly not fully consistent, component.
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
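Weighted permutation entropy, the feature used here at both the detection and diagnosis stages, can be sketched as follows. This is one common formulation; details such as tie handling and normalization may differ from the paper's.

```python
import numpy as np
from math import factorial

def weighted_permutation_entropy(x, order=3, delay=1):
    # Ordinal-pattern entropy in which each pattern is weighted by the
    # variance of its embedding vector, so large-amplitude structures
    # (e.g., fault impacts in a vibration signal) count more.
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    emb = np.stack([x[i * delay:i * delay + n] for i in range(order)], axis=1)
    patterns = emb.argsort(axis=1)           # ordinal pattern of each window
    weights = emb.var(axis=1)                # weight = window variance
    keys = (patterns * (order ** np.arange(order))).sum(axis=1)
    p = np.array([weights[keys == k].sum() for k in np.unique(keys)])
    p = p / p.sum()
    return -(p * np.log2(p)).sum() / np.log2(factorial(order))
```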
Transition-State Ensembles Navigate the Pathways of Enzyme Catalysis.
Mickert, Matthias J; Gorris, Hans H
2018-06-07
Transition-state theory (TST) provides an important framework for analyzing and explaining the reaction rates of enzymes. TST, however, needs to account for protein dynamic effects and heterogeneities in enzyme catalysis. We have analyzed the reaction rates of β-galactosidase and β-glucuronidase at the single molecule level by using large arrays of femtoliter-sized chambers. Heterogeneities in individual reaction rates yield information on the intrinsic distribution of the free energy of activation (ΔG‡) in an enzyme ensemble. The broader distribution of ΔG‡ in β-galactosidase compared to β-glucuronidase is attributed to β-galactosidase's multiple catalytic functions as a hydrolase and a transglycosylase. Based on the catalytic mechanism of β-galactosidase, we show that transition-state ensembles do not only contribute to enzyme catalysis but can also channel the catalytic pathway to the formation of different products. We conclude that β-galactosidase is an example of natural evolution, where a new catalytic pathway branches off from an established enzyme function. The functional division of work between enzymatic substates explains why the conformational space represented by the enzyme ensemble is larger than the conformational space that can be sampled by any given enzyme molecule during catalysis.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore received wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields similar results as its model size increases asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed by combining our fast CSC algorithm with random-projection dimensionality reduction in the process of spectral ensemble clustering. We demonstrate through theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved in the consensus clustering stage, also holds for weighted k-means clustering and thus gives a theoretical guarantee for this special kind of k-means clustering in which each point has its corresponding weight. PMID:29312447
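The dimensionality-reduction step can be illustrated with scikit-learn's Gaussian random projection followed by k-means. This is a generic stand-in for the idea, not the paper's landmark-based CSC model or its weighted k-means variant; the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 500))   # high-dimensional stand-in features

# Project down before clustering: pairwise geometry is approximately
# preserved, which is what underwrites the accuracy guarantee above.
X_low = GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_low)
```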
NASA Astrophysics Data System (ADS)
Pollard, David; Chang, Won; Haran, Murali; Applegate, Patrick; DeConto, Robert
2016-05-01
A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ~20,000 yr. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. The analyses provide sea-level-rise envelopes with well-defined parametric uncertainty bounds, but the simple averaging method only provides robust results with full-factorial parameter sampling in the large ensemble. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree well with the more advanced techniques. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds.
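The simple averaging baseline amounts to score-weighted statistics over the ensemble. A sketch with synthetic numbers and an assumed exponential score-to-weight mapping (the paper's exact mapping may differ):

```python
import numpy as np

rng = np.random.default_rng(4)
esl = rng.normal(3.0, 1.0, 625)       # equivalent sea-level rise per run (m)
misfit = rng.uniform(0.5, 5.0, 625)   # aggregate model-data misfit per run

w = np.exp(-misfit)                   # lower misfit -> higher weight
w /= w.sum()
mean = w @ esl
var = w @ (esl - mean) ** 2
print(f"weighted ESL = {mean:.2f} +/- {np.sqrt(var):.2f} m")
```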
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Lin, Guang; Li, Weixuan; Wu, Laosheng; Zeng, Lingzao
2018-03-01
Ensemble smoother (ES) has been widely used in inverse modeling of hydrologic systems. However, for problems where the distribution of model parameters is multimodal, using ES directly would be problematic. One popular solution is to use a clustering algorithm to identify each mode and update the clusters with ES separately. However, this strategy may not be very efficient when the dimension of parameter space is high or the number of modes is large. Alternatively, we propose in this paper a very simple and efficient algorithm, i.e., the iterative local updating ensemble smoother (ILUES), to explore multimodal distributions of model parameters in nonlinear hydrologic systems. The ILUES algorithm works by updating local ensembles of each sample with ES to explore possible multimodal distributions. To achieve satisfactory data matches in nonlinear problems, we adopt an iterative form of ES to assimilate the measurements multiple times. Numerical cases involving nonlinearity and multimodality are tested to illustrate the performance of the proposed method. It is shown that overall the ILUES algorithm can well quantify the parametric uncertainties of complex hydrologic models, no matter whether the multimodal distribution exists.
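The building block that ILUES applies to each sample's local ensemble is the plain ES update; a sketch is given below, with local-ensemble selection and the iterative reuse of the measurements omitted.

```python
import numpy as np

def es_update(X, Y, d_obs, R, rng=np.random.default_rng()):
    # Plain ensemble-smoother update. X: parameter samples (n_par, n_ens);
    # Y: corresponding model outputs (n_obs, n_ens); d_obs: observations;
    # R: observation-error covariance matrix.
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    B = Y - Y.mean(axis=1, keepdims=True)
    Cxy = A @ B.T / (n_ens - 1)
    Cyy = B @ B.T / (n_ens - 1)
    K = Cxy @ np.linalg.inv(Cyy + R)
    # Perturb observations so the updated ensemble keeps a consistent spread.
    D = d_obs[:, None] + np.linalg.cholesky(R) @ rng.standard_normal(Y.shape)
    return X + K @ (D - Y)
```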
Uncovering representations of sleep-associated hippocampal ensemble spike activity
NASA Astrophysics Data System (ADS)
Chen, Zhe; Grosmark, Andres D.; Penagos, Hector; Wilson, Matthew A.
2016-08-01
Pyramidal neurons in the rodent hippocampus exhibit spatial tuning during spatial navigation, and they are reactivated in specific temporal order during sharp-wave ripples observed in quiet wakefulness or slow wave sleep. However, analyzing representations of sleep-associated hippocampal ensemble spike activity remains a great challenge. In contrast to wake, during sleep there is a complete absence of animal behavior, and the ensemble spike activity is sparse (low occurrence) and fragmental in time. To examine important issues encountered in sleep data analysis, we constructed synthetic sleep-like hippocampal spike data (short epochs, sparse and sporadic firing, compressed timescale) for detailed investigations. Based upon two Bayesian population-decoding methods (one receptive field-based, and the other not), we systematically investigated their representation power and detection reliability. Notably, the receptive-field-free decoding method was found to be well-tuned for hippocampal ensemble spike data in slow wave sleep (SWS), even in the absence of prior behavioral measure or ground truth. Our results showed that in addition to the sample length, bin size, and firing rate, the number of active hippocampal pyramidal neurons is critical for reliable representation of the space as well as for detection of spatiotemporal reactivated patterns in SWS or quiet wakefulness.
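For orientation, the receptive-field-based Bayesian decoder (the first of the two methods compared) reduces to a Poisson posterior over spatial bins; the receptive-field-free method of the paper is more involved and is not shown.

```python
import numpy as np

def decode_position(spike_counts, tuning, dt):
    # Memoryless Bayesian population decoding with Poisson spiking: posterior
    # over spatial bins given one time bin of ensemble spike counts.
    # tuning: (n_neurons, n_bins) firing-rate maps; spike_counts: (n_neurons,)
    lam = tuning * dt + 1e-12
    log_post = spike_counts @ np.log(lam) - lam.sum(axis=0)  # flat prior
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```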
Enzymatic Kinetic Isotope Effects from Path-Integral Free Energy Perturbation Theory.
Gao, J
2016-01-01
Path-integral free energy perturbation (PI-FEP) theory is presented to directly determine the ratio of quantum mechanical partition functions of different isotopologs in a single simulation. Furthermore, a double averaging strategy is used to carry out the practical simulation, separating the quantum mechanical path integral exactly into two separate calculations, one corresponding to a classical molecular dynamics simulation of the centroid coordinates, and another involving free-particle path-integral sampling over the classical centroid positions. An integrated centroid path-integral free energy perturbation and umbrella sampling (PI-FEP/UM, or simply PI-FEP) method along with bisection sampling is summarized, which provides an accurate and fast-converging method for computing kinetic isotope effects for chemical reactions in solution and in enzymes. The PI-FEP method is illustrated by a number of applications, highlighting the computational precision and accuracy, the rule of the geometric mean in kinetic isotope effects, enhanced nuclear quantum effects in enzyme catalysis, and the effect of protein dynamics on the temperature dependence of kinetic isotope effects. © 2016 Elsevier Inc. All rights reserved.
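In transition-state-theory language, the quantity PI-FEP targets is a double free energy difference between isotopologs; the standard relation is shown here for context, not as the paper's derivation.

```latex
% Kinetic isotope effect as a double free-energy difference (TST form):
% a PI-FEP simulation evaluates the ratio of quantum partition functions
% of the light (L) and heavy (H) isotopologs in a single run.
\[
  \mathrm{KIE} \;=\; \frac{k_{\mathrm{L}}}{k_{\mathrm{H}}}
  \;=\; \exp\!\Big[-\beta\,\big(\Delta F^{\ddagger}_{\mathrm{L}}
        - \Delta F^{\ddagger}_{\mathrm{H}}\big)\Big],
  \qquad \beta = \frac{1}{k_{\mathrm{B}}T}
\]
```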
NASA Astrophysics Data System (ADS)
Peishu, Zong; Jianping, Tang; Shuyu, Wang; Lingyun, Xie; Jianwei, Yu; Yunqian, Zhu; Xiaorui, Niu; Chao, Li
2017-08-01
The parameterization of physical processes is one of the critical elements in properly simulating the regional climate over eastern China. It is essential to conduct detailed analyses of the effect of physical parameterization schemes on regional climate simulation to provide more reliable regional climate change information. In this paper, we evaluate the 25-year (1983-2007) summer monsoon climate characteristics of precipitation and surface air temperature by using the regional spectral model (RSM) with different physical schemes. The ensemble results using the reliability ensemble averaging (REA) method are also assessed. The results show that the RSM model has the capacity to reproduce the spatial patterns, the variations, and the temporal tendency of surface air temperature and precipitation over eastern China, and it tends to predict the climatological characteristics better over the Yangtze River basin and South China. The impact of different physical schemes on RSM simulations is also investigated. Generally, the CLD3 cloud water prediction scheme tends to produce larger precipitation because of its overestimation of the low-level moisture. The systematic biases derived from the KF2 cumulus scheme are larger than those from the RAS scheme. The scale-selective bias correction (SSBC) method improves the simulation of the temporal and spatial characteristics of surface air temperature and precipitation and advances the circulation simulation capacity. The REA ensemble results show significant improvement in simulating temperature and precipitation distributions, with much higher correlation coefficients and lower root mean square errors. The REA result of the selected experiments is better than that of the non-selected experiments, indicating the necessity of choosing good ensemble samples for the ensemble.
Xu, Xinyu; Tian, Yu; Wang, Guolin; Tian, Xin
2014-08-15
Working memory (WM) refers to the temporary storage and manipulation of information necessary for the performance of complex cognitive tasks. There is growing interest in whether and how propofol anesthesia inhibits WM function. The aim of this study is to investigate the possible inhibition mechanism of propofol anesthesia from the view of single-neuron and neuronal-ensemble activities. Adult SD rats were randomly divided into two groups: a propofol group (0.9 mg kg^-1 min^-1 for 2 h via a tail vein catheter) and a control group. All the rats were tested for working memory performance in a Y-maze rewarded-alternation task (a delayed non-matched-to-sample task) at 24, 48, and 72 h after propofol anesthesia, and the behavioral results of the WM tasks were recorded at the same time. Spatio-temporal trains of action potentials were obtained from the original signals. Single-neuron activity was characterized by peri-event time histogram analysis, and neuronal-ensemble activity was characterized by Granger causality to describe the interactions within the neuronal ensemble. The results show that, compared with the control group, the percentage of neurons excited and related to WM was significantly decreased (p<0.01 at 24 h, p<0.05 at 48 h) and the interactions within the neuronal ensemble were significantly weakened (p<0.01 at 24 h, p<0.05 at 48 h), whereas there was no significant difference at 72 h (p>0.05); these results were consistent with the behavioral results. These findings could lead to an improved understanding of the mechanism of anesthesia inhibition of WM function from the view of single-neuron activity and neuronal-ensemble interactions. Copyright © 2014 Elsevier B.V. All rights reserved.
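Pairwise Granger causality of the kind used for the ensemble-interaction measure can be sketched with statsmodels; the binned spike-count series below are synthetic stand-ins for recorded neurons.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(5)
a = rng.poisson(5, 500).astype(float)    # binned spike counts, neuron A
b = np.roll(a, 2) + rng.poisson(2, 500)  # B lags A, so A should Granger-cause B

# Column order matters: the test asks whether the second column helps
# predict the first, i.e., whether A Granger-causes B here.
data = np.column_stack([b, a])
res = grangercausalitytests(data, maxlag=4, verbose=False)
p_values = {lag: r[0]["ssr_ftest"][1] for lag, r in res.items()}
```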
Deshmukh, Lalit; Schwieters, Charles D; Grishaev, Alexander; Clore, G Marius
2016-06-03
Nucleic-acid-related events in the HIV-1 replication cycle are mediated by nucleocapsid, a small protein comprising two zinc knuckles connected by a short flexible linker and flanked by disordered termini. Combining experimental NMR residual dipolar couplings, solution X-ray scattering and protein engineering with ensemble simulated annealing, we obtain a quantitative description of the configurational space sampled by the two zinc knuckles, the linker and disordered termini in the absence of nucleic acids. We first compute the conformational ensemble (with an optimal size of three members) of an engineered nucleocapsid construct lacking the N- and C-termini that satisfies the experimental restraints, and then validate this ensemble, as well as characterize the disordered termini, using the experimental data from the full-length nucleocapsid construct. The experimental and computational strategy is generally applicable to multidomain proteins. Differential flexibility within the linker results in asymmetric motion of the zinc knuckles which may explain their functionally distinct roles despite high sequence identity. One of the configurations (populated at a level of ≈40 %) closely resembles that observed in various ligand-bound forms, providing evidence for conformational selection and a mechanistic link between protein dynamics and function. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
TID measurement using oblique transmissions of HF pulses
NASA Astrophysics Data System (ADS)
Galkin, Ivan; Reinisch, Bodo; Huang, Xueqin; Paznukhov, Vadym; Hamel, Ryan; Kozlov, Alexander; Belehaki, Anna
2017-04-01
The Traveling Ionospheric Disturbance (TID), a wave-like signature of moving plasma density modulation in the ionosphere, is widely acknowledged for its utility in backtracking the anomalous events responsible for the TID generation, and as a major inconvenience to high-frequency (HF) operational systems because of its deleterious impact on the accuracy of navigation and geolocation. The pilot project "Net-TIDE" for the real-time detection and evaluation of TIDs began its operation in 2016 based on the remote-sensing data from synchronized, network-coordinated HF sounding between pairs of DPS4D ionosondes at five participating observatories in Europe. Measurement of all signal properties (Doppler frequency, angle of arrival, and time-of-flight from transmitter to receiver) proved to be instrumental in detecting the TID and deducing the TID parameters: amplitude, wavelength, phase velocity, and direction of propagation. Processing of the measured HF signal data required a specialized signal processing technique that is capable of consistently extracting different signals that have propagated along different ionospheric paths. The multi-path signal environment proved to be the greatest challenge for the reliable TID specification by Net-TIDE, demanding the development of an intelligent system for "signal tracking". The intelligent system is based on a neural network model of pre-attentive vision capable of extracting continuous signal tracks from the multi-path signal ensemble. Specific examples of the Net-TIDE algorithm suite operation and its suitability for a fully automated TID warning service are discussed.
Modality-Driven Classification and Visualization of Ensemble Variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bensema, Kevin; Gosink, Luke; Obermaier, Harald
Paper for the IEEE Visualization Conference.
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space.
Positron lifetime spectrometer using a DC positron beam
Xu, Jun; Moxom, Jeremy
2003-10-21
An entrance grid is positioned in the incident beam path of a DC beam positron lifetime spectrometer. The electrical potential difference between the sample and the entrance grid provides simultaneous acceleration of both the primary positrons and the secondary electrons. The result is a reduction in the time spread induced by the energy distribution of the secondary electrons. In addition, the sample, sample holder, entrance grid, and entrance face of the multichannel plate electron detector assembly are made parallel to each other, and are arranged at a tilt angle to the axis of the positron beam to effectively separate the path of the secondary electrons from the path of the incident positrons.
NASA Astrophysics Data System (ADS)
Cross, E. S.; Onasch, T. B.; Canagaratna, M.; Jayne, J. T.; Kimmel, J.; Yu, X.-Y.; Alexander, M. L.; Worsnop, D. R.; Davidovits, P.
2008-12-01
We present the first single particle results obtained using an Aerodyne time-of-flight aerosol mass spectrometer coupled with a light scattering module (LS-ToF-AMS). The instrument was deployed at the T1 ground site approximately 40 km northeast of the Mexico City Metropolitan Area (MCMA) as part of the MILAGRO field study in March of 2006. The instrument was operated as a standard AMS from 12-30 March, acquiring average chemical composition and size distributions for the ambient aerosol, and in single particle mode from 27-30 March. Over a 75-h sampling period, 12 853 single particle mass spectra were optically triggered, saved, and analyzed. The correlated optical and chemical detection allowed detailed examination of single particle collection and quantification within the LS-ToF-AMS. The single particle data enabled the mixing states of the ambient aerosol to be characterized within the context of the size-resolved ensemble chemical information. The particulate mixing states were examined as a function of sampling time and most of the particles were found to be internal mixtures containing many of the organic and inorganic species identified in the ensemble analysis. The single particle mass spectra were deconvolved, using techniques developed for ensemble AMS data analysis, into HOA, OOA, NH4NO3, (NH4)2SO4, and NH4Cl fractions. Average single particle mass and chemistry measurements are shown to be in agreement with ensemble MS and PTOF measurements. While a significant fraction of ambient particles were internal mixtures of varying degrees, single particle measurements of chemical composition allowed the identification of time periods during which the ambient ensemble was externally mixed. In some cases the chemical composition of the particles suggested a likely source. Throughout the full sampling period, the ambient ensemble was an external mixture of combustion-generated HOA particles from local sources (e.g. traffic), with number concentrations peaking during morning rush hour (04:00-08:00 LT) each day, and more processed particles of mixed composition from nonspecific sources. From 09:00-12:00 LT all particles within the ambient ensemble, including the locally produced HOA particles, became coated with NH4NO3 due to photochemical production of HNO3. The number concentration of externally mixed HOA particles remained low during daylight hours. Throughout the afternoon the OOA component dominated the organic fraction of the single particles, likely due to secondary organic aerosol formation and condensation. Single particle mass fractions of (NH4)2SO4 were lowest during the day and highest during the night. In one instance, gas-to-particle condensation of (NH4)2SO4 was observed on all measured particles within a strong SO2 plume arriving at T1 from the northwest. Particles with high NH4Cl mass fractions were identified during early morning periods. A limited number of particles (~5% of the total number) with mass spectral features characteristic of biomass burning were also identified.
Re-evaluation of P-T paths across the Himalayan Main Central Thrust
NASA Astrophysics Data System (ADS)
Catlos, E. J.; Harrison, M.; Kelly, E. D.; Ashley, K.; Lovera, O. M.; Etzel, T.; Lizzadro-McPherson, D. J.
2016-12-01
The Main Central Thrust (MCT) is the dominant crustal thickening structure in the Himalayas, juxtaposing high-grade Greater Himalayan Crystalline rocks over the lower-grade Lesser Himalaya Formations. The fault is underlain by a 2- to 12-km-thick sequence of deformed rocks characterized by an apparent inverted metamorphic gradient, termed the MCT shear zone. Garnet-bearing rocks sampled from across the MCT along the Marysandi River in central Nepal contain monazite that decreases in age from Early Miocene (ca. 20 Ma) in the hanging wall to Late Miocene-Pliocene (ca. 7 Ma and 3 Ma) towards structurally lower levels in the shear zone. We obtained high-resolution garnet-zoning pressure-temperature (P-T) paths from 11 of the same rocks used for monazite geochronology using a recently developed semi-automated Gibbs-free-energy-minimization technique. Quartz-in-garnet Raman barometry refined the locations of the paths. Diffusional re-equilibration of garnet zoning in hanging wall samples prevented accurate path determinations from most Greater Himalayan Crystalline samples, but one that shows a bell-shaped Mn zoning profile records a slight decrease in P (from 8.2 to 7.6 kbar) with an increase in T (from 590 to 640°C). Three MCT shear zone samples were modeled: one yields a simple path increasing in both P and T (6 to 7 kbar, 540 to 580°C); the others yield N-shaped paths that occupy similar P-T space (4 to 5.5 kbar, 500 to 560°C). Five lower Lesser Himalaya garnet-bearing rocks were modeled. One yields a path increasing in both P and T (6 to 7 kbar, 525 to 550°C), but others show either sharp compression/decompression or N-shaped paths (within 4.5-6 kbar and 530-580°C). The lowermost sample decreases in P (5.5 to 5 kbar) over increasing T (540 to 580°C). No progressive change is seen from one type of path to another within the Lesser Himalayan Formations to the MCT zone. The results using this modeling approach yield lower P-T conditions compared to the Gibbs method and lower core/rim P-T conditions compared to traditional thermometers and barometers. Inclusion barometry suggests that the baric estimates from the modeling may be underestimated by 2-4 kbar. Despite the uncertainty, the path shapes are consistent with a model in which the MCT shear zone experienced progressive accretion of footwall slivers.
Infrared (IR) photon-sensitive spectromicroscopy in a cryogenic environment
Pereverzev, Sergey
2016-06-14
A system designed to suppress thermal radiation background and to allow IR single-photon-sensitive spectromicroscopy of small samples using absorption, reflection, and emission/luminescence measurements. The system in one embodiment includes: a light source; a plurality of cold mirrors configured to direct light along a beam path; a cold or warm sample holder in the beam path, whose windows (or the whole sample holder) are transparent in a spectral region of interest so that they do not emit thermal radiation in that region; a cold monochromator or other cold spectral device configured to direct a selected fraction of light onto a cold detector; a system of cold apertures and shields positioned along the beam path to prevent unwanted thermal radiation from arriving at the cold monochromator and/or the detector; a plurality of optical, IR and microwave filters positioned along the beam path and configured to adjust the spectral composition of light incident upon the sample under investigation and/or on the detector; a refrigerator configured to maintain the detector at a temperature below 1.0 K; and an enclosure configured to thermally insulate the light source, the plurality of mirrors, the sample holder, the cold monochromator and the refrigerator.
Fielding, M. D.; Chiu, J. C.; Hogan, R. J.; ...
2015-02-16
Active remote sensing of marine boundary-layer clouds is challenging as drizzle drops often dominate the observed radar reflectivity. We present a new method to simultaneously retrieve cloud and drizzle vertical profiles in drizzling boundary-layer cloud using surface-based observations of radar reflectivity, lidar attenuated backscatter, and zenith radiances. Specifically, the vertical structure of droplet size and water content of both cloud and drizzle is characterised throughout the cloud. An ensemble optimal estimation approach provides full error statistics given the uncertainty in the observations. To evaluate the new method, we first perform retrievals using synthetic measurements from large-eddy simulation snapshots of cumulus under stratocumulus, where cloud water path is retrieved with an error of 31 g m^-2. The method also performs well in non-drizzling clouds where no assumption of the cloud profile is required. We then apply the method to observations of marine stratocumulus obtained during the Atmospheric Radiation Measurement MAGIC deployment in the northeast Pacific. Here, retrieved cloud water path agrees well with independent 3-channel microwave radiometer retrievals, with a root mean square difference of 10-20 g m^-2.
NASA Astrophysics Data System (ADS)
Seo, H.; Kwon, Y. O.; Joyce, T. M.; Ummenhofer, C.
2016-12-01
This study examines the North Atlantic atmospheric circulation response to the meridional shift of the Gulf Stream path using large-ensemble, high-resolution, hemispheric-scale WRF simulations. The model is forced with wintertime SST anomalies derived from a wide range of Gulf Stream shift scenarios. The key result of the model experiments, supported in part by an independent analysis of a reanalysis data set, is that the large-scale, quasi-steady North Atlantic circulation response is unambiguously nonlinear in the sign and amplitude of the chosen SST anomalies. This nonlinear response prevails over the weak linear response and resembles the negative North Atlantic Oscillation, the leading intrinsic mode of variability in the model and the observations. Further analysis of the associated dynamics reveals that the nonlinear responses are accompanied by an anomalous southward shift of the North Atlantic eddy-driven jet stream, which is reinforced nearly equally by the high-frequency transient eddy feedback and the low-frequency high-latitude wave breaking events. The result highlights the importance of the intrinsically nonlinear transient eddy dynamics and eddy-mean flow interactions in generating the nonlinear forced response to the meridional shift in the Gulf Stream.
Deep biomarkers of human aging: Application of deep neural networks to biomarker development
Putin, Evgeny; Mamoshina, Polina; Aliper, Alexander; Korzinkin, Mikhail; Moskalev, Alexey; Kolosov, Alexey; Ostrovskiy, Alexander; Cantor, Charles; Vijg, Jan; Zhavoronkov, Alex
2016-01-01
One of the major impediments in human aging research is the absence of a comprehensive and actionable set of biomarkers that may be targeted and measured to track the effectiveness of therapeutic interventions. In this study, we designed a modular ensemble of 21 deep neural networks (DNNs) of varying depth, structure and optimization to predict human chronological age using a basic blood test. To train the DNNs, we used over 60,000 samples from common blood biochemistry and cell count tests from routine health exams performed by a single laboratory and linked to chronological age and sex. The best performing DNN in the ensemble demonstrated 81.5% epsilon-accuracy (r = 0.90, R2 = 0.80, MAE = 6.07 years) in predicting chronological age within a 10-year frame, while the entire ensemble achieved 83.5% epsilon-accuracy (r = 0.91, R2 = 0.82, MAE = 5.55 years). The ensemble also identified the 5 most important markers for predicting human chronological age: albumin, glucose, alkaline phosphatase, urea and erythrocytes. To allow for public testing and evaluate real-life performance of the predictor, we developed an online system available at http://www.aging.ai. The ensemble approach may facilitate integration of multi-modal data linked to chronological age and sex that may lead to simple, minimally invasive, and affordable methods of tracking integrated biomarkers of aging in humans and performing cross-species feature importance analysis. PMID:27191382
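The headline metric is straightforward to reproduce. A sketch of epsilon-accuracy and the ensemble average over per-network predictions, with synthetic stand-ins for the 21 DNN outputs:

```python
import numpy as np

def epsilon_accuracy(y_true, y_pred, eps=10.0):
    # Fraction of samples whose predicted age falls within eps years of the
    # true chronological age (the 10-year frame quoted above).
    return np.mean(np.abs(y_true - y_pred) <= eps)

rng = np.random.default_rng(6)
age = rng.uniform(20, 80, 1000)                 # synthetic true ages
preds = age + rng.normal(0, 8, (21, 1000))      # 21 stand-in network outputs

# Ensemble prediction: average the per-network age estimates.
print(epsilon_accuracy(age, preds.mean(axis=0)))
```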
NASA Astrophysics Data System (ADS)
Qiao, Qin; Zhang, Hou-Dao; Huang, Xuhui
2016-04-01
Simulated tempering (ST) is a widely used enhancing sampling method for Molecular Dynamics simulations. As one expanded ensemble method, ST is a combination of canonical ensembles at different temperatures and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among temperature space. However, this uniform distribution in temperature space may not be optimal since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method: Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformation transitions when optimizing the weights of different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.
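For context, the temperature move that these weights enter is the standard expanded-ensemble Metropolis criterion; a minimal sketch (the weights themselves, which EPSW optimizes, are arbitrary inputs here):

```python
import numpy as np

def st_accept(U, beta_i, beta_j, w_i, w_j, rng=np.random.default_rng()):
    # Simulated-tempering temperature move: accept a jump from inverse
    # temperature beta_i to beta_j at current potential energy U with
    # probability min(1, exp[(beta_i - beta_j) * U + (w_j - w_i)]).
    log_acc = (beta_i - beta_j) * U + (w_j - w_i)
    return np.log(rng.random()) < min(0.0, log_acc)
```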
Muhlestein, Whitney E; Akagi, Dallin S; Chotai, Silky; Chambless, Lola B
2017-08-01
Racial disparities exist in health care, frequently resulting in unfavorable outcomes for minority patients. Here, we use guided machine learning (ML) ensembles to model the impact of race on discharge disposition and length of stay (LOS) after brain tumor surgery from the Healthcare Cost and Utilization Project National Inpatient Sample. We performed a retrospective cohort study of 41,222 patients who underwent craniotomies for brain tumors from 2002 to 2011 and were registered in the National Inpatient Sample. Twenty-six ML algorithms were trained on prehospitalization variables to predict non-home discharge and extended LOS (>7 days) after brain tumor resection, and the most predictive algorithms combined to create ensemble models. Partial dependence analysis was performed to measure the independent impact of race on the ensembles. The guided ML ensembles predicted non-home disposition (area under the curve, 0.796) and extended LOS (area under the curve, 0.824) with good discrimination. Partial dependence analysis showed that black race increases the risk of non-home discharge and extended LOS over white race by 6.9% and 6.5%, respectively. Other, nonblack race increases the risk of extended LOS over white race by 6.0%. The impact of race on these outcomes is not seen when analyzing the general inpatient or general operative population. Minority race independently increases the risk of extended LOS and black race increases the risk of non-home discharge in patients undergoing brain tumor resection, a finding not mimicked in the general inpatient or operative population. Recognition of the influence of race on discharge and LOS could generate interventions that may improve outcomes in this population. Copyright © 2017 Elsevier Inc. All rights reserved.
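The partial-dependence analysis can be sketched directly: fix the feature of interest at each grid value for all rows and average the model's predicted risk. The data below are synthetic stand-ins, not NIS records, and the gradient-boosting model is a generic substitute for the guided ensembles used in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)

# Synthetic stand-in data: feature 0 plays the role of the race indicator
# and the label plays the role of non-home discharge.
X = rng.normal(size=(5000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 0).astype(int)
clf = GradientBoostingClassifier().fit(X, y)

# Partial dependence of predicted risk on feature 0: clamp the feature at
# each grid value for every row and average the predictions.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 21)
effect = []
for v in grid:
    Xv = X.copy()
    Xv[:, 0] = v
    effect.append(clf.predict_proba(Xv)[:, 1].mean())
```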
Photodeposited Pd Nanoparticles with Disordered Structure for Phenylacetylene Semihydrogenation
Fan, Qining; He, Sha; Hao, Lin; Liu, Xin; Zhu, Yue; Xu, Sailong; Zhang, Fazhi
2017-01-01
Developing effective heterogeneous metal catalysts with high selectivity and satisfactory activity for chemoselective hydrogenation of alkyne to alkene is of great importance in the chemical industry. Herein, we report our efforts to fabricate TiO2-supported Pd catalysts by a photodeposition method at room temperature for phenylacetylene semihydrogenation to styrene. The resulting Pd/TiO2 catalyst, possessing smaller Pd ensembles with ambiguous lattice fringes and more low-coordination Pd sites, exhibits higher styrene selectivity compared to two reference Pd/TiO2 samples with larger ensembles and well-organized crystal structures, fabricated by deposition-precipitation or by photodeposition with subsequent thermal treatment at 300 °C. In kinetic evaluations, the photodeposited sample exhibits greatly slowed styrene hydrogenation, because the disordered structure of the Pd particles in photodeposited Pd/TiO2 may prevent the formation of β-hydride phases and probably produces more surface H atoms, which may favor high styrene selectivity. PMID:28176843
Near-optimal protocols in complex nonequilibrium transformations
Gingrich, Todd R.; Rotskoff, Grant M.; Crooks, Gavin E.; ...
2016-08-29
The development of sophisticated experimental means to control nanoscale systems has motivated efforts to design driving protocols that minimize the energy dissipated to the environment. Computational models are a crucial tool in this practical challenge. In this paper, we describe a general method for sampling an ensemble of finite-time, nonequilibrium protocols biased toward a low average dissipation. In addition, we show that this scheme can be carried out very efficiently in several limiting cases. As an application, we sample the ensemble of low-dissipation protocols that invert the magnetization of a 2D Ising model and explore how the diversity of the protocols varies in response to constraints on the average dissipation. In this example, we find that there is a large set of protocols with average dissipation close to the optimal value, which we argue is a general phenomenon.
NASA Astrophysics Data System (ADS)
Angerer, Andreas; Astner, Thomas; Wirtitsch, Daniel; Sumiya, Hitoshi; Onoda, Shinobu; Isoya, Junichi; Putz, Stefan; Majer, Johannes
2016-07-01
We design and implement 3D-lumped element microwave cavities that spatially focus magnetic fields to a small mode volume. They allow coherent and uniform coupling to electron spins hosted by nitrogen vacancy centers in diamond. We achieve large homogeneous single-spin coupling rates, with an enhancement of more than one order of magnitude compared to standard 3D cavities with a fundamental resonance at 3 GHz. Finite element simulations confirm that the magnetic field distribution is homogeneous throughout the entire sample volume, with a root mean square deviation of 1.54%. With a sample containing 10^17 nitrogen vacancy electron spins, we achieve a collective coupling strength of Ω = 12 MHz, a cooperativity factor C = 27, and clearly enter the strong coupling regime. This allows us to interface a macroscopic spin ensemble with microwave circuits, and the homogeneous Rabi frequency paves the way to manipulating the full ensemble population in a coherent way.
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan
2013-03-01
We introduce a stable, well-tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ~N^2 for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2.
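Since emcee is a public Python package, a short usage example may help; this assumes the current (v3) interface, which postdates this paper, and a toy Gaussian target rather than any problem from the text.

```python
import numpy as np
import emcee

def log_prob(theta):
    # toy target: a standard multivariate Gaussian
    return -0.5 * np.dot(theta, theta)

ndim, nwalkers = 5, 32
p0 = np.random.randn(nwalkers, ndim)                # initial walker positions
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 5000)
flat = sampler.get_chain(discard=500, flat=True)     # flattened posterior samples
```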
Measurement of refractive index of photopolymer for holographic gratings
NASA Astrophysics Data System (ADS)
Watanabe, Eriko; Mizuno, Jun; Fujikawa, Chiemi; Kodate, Kashiko
2007-02-01
We have attempted to directly measure the small-scale variation of optical path lengths in photopolymer samples. For samples with uniform thickness, the measured quantity is expected to be proportional to the refractive index of the photopolymer. The system is based on a Mach-Zehnder interferometer using a phase-locking technique and measures the change in optical path length while the sample is scanned across the optical axis. The spatial resolution is estimated to be 2 μm, which is limited by the sample thickness. The path-length resolution is estimated to be 6 nm, which corresponds to a change in refractive index of less than 10^-3 for a sample 10 μm thick. The measurement results showed clearly that the refractive index of the photopolymer is not simply proportional to the exposure energy, contrary to conventional photosensitive materials such as silver halide emulsion and dichromated gelatine. They also revealed refractive index fluctuations in uniformly exposed photopolymer samples, which explain the milky appearance that is sometimes observed in thick samples.
An Optimal Estimation Method to Obtain Surface Layer Turbulent Fluxes from Profile Measurements
NASA Astrophysics Data System (ADS)
Kang, D.
2015-12-01
In the absence of direct turbulence measurements, the turbulence characteristics of the atmospheric surface layer are often derived from measurements of the surface layer mean properties based on Monin-Obukhov Similarity Theory (MOST). This approach requires two levels of the ensemble mean wind, temperature, and water vapor, from which the fluxes of momentum, sensible heat, and water vapor can be obtained. When only one measurement level is available, the roughness heights and the assumed properties of the corresponding variables at the respective roughness heights are used. In practice, the temporal mean over a large number of samples is used in place of the ensemble mean. However, in many situations the samples of data are taken from multiple levels. It is thus desirable to derive the boundary layer flux properties using all measurements. In this study, we used an optimal estimation approach to derive surface layer properties based on all available measurements. This approach assumes that the samples are taken from a population whose ensemble mean profile follows the MOST. An optimized estimate is obtained when the results yield a minimum cost function, defined as a weighted sum of the error variances at each sample altitude. The weights are based on the sample data variance and the altitude of the measurements. This method was applied to measurements in the marine atmospheric surface layer from a small boat using a radiosonde on a tethered balloon, where temperature and relative humidity profiles in the lowest 50 m were measured repeatedly, each in about 30 minutes. We will present the resultant fluxes and the derived MOST mean profiles using different sets of measurements. The advantage of this method over the 'traditional' methods will be illustrated. Some limitations of this optimization method will also be discussed. Its application to quantifying the effects of the marine surface layer environment on radar and communication signal propagation will be shown as well.
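A minimal sketch of the optimal-estimation idea, assuming a neutral surface layer so that the MOST mean wind profile reduces to the log law; the function, parameter names, and starting values are illustrative, not the study's (stability corrections are omitted).

```python
import numpy as np
from scipy.optimize import least_squares

KAPPA = 0.4  # von Karman constant

def fit_most_profile(z, u_obs, sigma):
    """Weighted least-squares fit of a neutral MOST wind profile
    u(z) = (u*/kappa) * ln(z/z0) to mean-wind samples at several heights.
    Weighting by 1/sigma mirrors the abstract's cost function: a weighted
    sum of error variances at each sample altitude."""
    def residuals(p):
        ustar, ln_z0 = p
        model = (ustar / KAPPA) * (np.log(z) - ln_z0)
        return (u_obs - model) / sigma
    sol = least_squares(residuals, x0=[0.3, np.log(1e-4)])
    ustar, ln_z0 = sol.x
    return ustar, np.exp(ln_z0)   # friction velocity, roughness height
```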
Keinan, Shahar; Nocek, Judith M; Hoffman, Brian M; Beratan, David N
2012-10-28
Formation of a transient [myoglobin (Mb), cytochrome b(5) (cyt b(5))] complex is required for the reductive repair of inactive ferri-Mb to its functional ferro-Mb state. The [Mb, cyt b(5)] complex exhibits dynamic docking (DD), with its cyt b(5) partner in rapid exchange at multiple sites on the Mb surface. A triple mutant (Mb(3M)) was designed as part of efforts to shift the electron-transfer process to the simple docking (SD) regime, in which reactive binding occurs at a restricted, reactive region on the Mb surface that dominates the docked ensemble. An electrostatically guided Brownian dynamics (BD) docking protocol was used to generate an initial ensemble of reactive configurations of the complex between unrelaxed partners. This ensemble samples a broad and diverse array of heme-heme distances and orientations. These configurations seeded all-atom constrained molecular dynamics simulations (MD) to generate relaxed complexes for the calculation of electron tunneling matrix elements (T(DA)) through tunneling-pathway analysis. This procedure for generating an ensemble of relaxed complexes combines the ability of BD calculations to sample the large variety of available conformations and interprotein distances, with the ability of MD to generate the atomic level information, especially regarding the structure of water molecules at the protein-protein interface, that defines electron-tunneling pathways. We used the calculated T(DA) values to compute ET rates for the [Mb(wt), cyt b(5)] complex and for the complex with a mutant that has a binding free energy strengthened by three D/E → K charge-reversal mutations, [Mb(3M), cyt b(5)]. The calculated rate constants are in agreement with the measured values, and the mutant complex ensemble has many more geometries with higher T(DA) values than does the wild-type Mb complex. Interestingly, water plays a double role in this electron-transfer system, lowering the tunneling barrier as well as inducing protein interface remodeling that screens the repulsion between the negatively-charged propionates of the two hemes.
Estimating uncertainty of Full Waveform Inversion with Ensemble-based methods
NASA Astrophysics Data System (ADS)
Thurin, J.; Brossier, R.; Métivier, L.
2017-12-01
Uncertainty estimation is a key feature of tomographic applications for robust interpretation. However, this information is often missing in large-scale linearized inversions, where only the results at convergence are shown despite the ill-posed nature of the problem. This issue is common in the Full Waveform Inversion (FWI) community. While a few methodologies have been proposed in the literature, standard FWI workflows do not yet include any systematic uncertainty quantification method; instead, the quality of results is often assessed through cross-comparison with other seismic results or with other geophysical data. With the development of large seismic networks and surveys, the increase in computational power, and the increasingly systematic application of FWI, it is crucial to tackle this problem and to propose robust and affordable workflows that address uncertainty quantification for near-surface targets and crustal exploration, as well as at regional and global scales. In this work (Thurin et al., 2017a,b), we propose an approach that takes advantage of the Ensemble Transform Kalman Filter (ETKF) proposed by Bishop et al. (2001) to estimate a low-rank approximation of the posterior covariance matrix of the FWI problem, allowing us to evaluate some uncertainty information about the solution. Instead of solving the FWI problem through a Bayesian inversion with the ETKF, we chose to combine a conventional FWI, based on local optimization, with the ETKF strategy. This scheme combines the efficiency of local optimization for solving large-scale inverse problems with sampling of the local solution space, made affordable by its embarrassingly parallel character.
References:
Bishop, C. H., Etherton, B. J. and Majumdar, S. J., 2001. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review, 129(3), 420-436.
Thurin, J., Brossier, R. and Métivier, L., 2017a. Ensemble-Based Uncertainty Estimation in Full Waveform Inversion. 79th EAGE Conference and Exhibition 2017 (12-15 June 2017).
Thurin, J., Brossier, R. and Métivier, L., 2017b. An Ensemble-Transform Kalman Filter - Full Waveform Inversion scheme for Uncertainty estimation. SEG Technical Program Expanded Abstracts 2012.
Biased Metropolis Sampling for Rugged Free Energy Landscapes
NASA Astrophysics Data System (ADS)
Berg, Bernd A.
2003-11-01
Metropolis simulations of all-atom models of peptides (i.e., small proteins) are considered. Inspired by the funnel picture of Bryngelson and Wolynes, a transformation of the updating probabilities of the dihedral angles is defined, which uses probability densities from a higher temperature to improve the algorithmic performance at a lower temperature. The method is suitable for canonical as well as for generalized ensemble simulations. A simple approximation to the full transformation is tested at room temperature for Met-Enkephalin in vacuum. Integrated autocorrelation times are found to be reduced by factors close to two, and a similar improvement due to generalized ensemble methods enters multiplicatively.
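The transformation of updating probabilities amounts to a Metropolis-Hastings test with a non-symmetric proposal. Below is a generic sketch, not the paper's implementation: in the biased scheme described above, the proposal density would be built from dihedral-angle statistics gathered at a higher temperature.

```python
import numpy as np

def mh_step(x, log_p, propose, log_q, rng):
    """One Metropolis-Hastings step with a biased (non-symmetric) proposal.
    propose(x) draws x' from q(x'|x); log_q(a, b) returns log q(a|b).
    The correction log_q(x, x') - log_q(x', x) keeps detailed balance
    whatever bias the proposal carries."""
    x_new = propose(x)
    log_alpha = (log_p(x_new) - log_p(x)
                 + log_q(x, x_new) - log_q(x_new, x))
    if np.log(rng.random()) < min(0.0, log_alpha):
        return x_new, True
    return x, False
```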
Efficient Simulation of Explicitly Solvated Proteins in the Well-Tempered Ensemble.
Deighan, Michael; Bonomi, Massimiliano; Pfaendtner, Jim
2012-07-10
Herein, we report a significant reduction in the cost of combined parallel tempering and metadynamics simulations (PTMetaD). The efficiency boost is achieved using the recently proposed well-tempered ensemble (WTE) algorithm. We studied the convergence of PTMetaD-WTE conformational sampling and free energy reconstruction for an explicitly solvated 20-residue tryptophan-cage protein (trp-cage). A set of PTMetaD-WTE simulations was compared to a corresponding standard PTMetaD simulation in terms of sampling properties and convergence. The roles of the number of replicas, total simulation time, and the adjustable WTE parameter γ were studied.
Pan, Feng; Tao, Guohua
2013-03-07
Full semiclassical (SC) initial value representation (IVR) for time correlation functions involves a double phase space average over a set of two phase points, each of which evolves along a classical path. Conventionally, the two initial phase points are sampled independently for all degrees of freedom (DOF) in the Monte Carlo procedure. Here, we present an efficient importance sampling scheme by including the path correlation between the two initial phase points for the bath DOF, which greatly improves the performance of the SC-IVR calculations for large molecular systems. Satisfactory convergence in the study of quantum coherence in vibrational relaxation has been achieved for a benchmark system-bath model with up to 21 DOF.
Microsecond simulations of the folding/unfolding thermodynamics of the Trp-cage mini protein
Day, Ryan; Paschek, Dietmar; Garcia, Angel E.
2012-01-01
We study the unbiased folding/unfolding thermodynamics of the Trp-cage miniprotein using detailed molecular dynamics simulations of an all-atom model of the protein in explicit solvent with the Amber ff99SB force field. Replica-exchange molecular dynamics (REMD) simulations are used to sample the protein ensembles over a broad range of temperatures covering the folded and unfolded states, and at two densities. The obtained ensembles are shown to reach equilibrium on the 1 μs per replica timescale. The total simulation time employed in the calculations exceeds 100 μs. Ensemble averages of the fraction folded, pressure, and energy differences between the folded and unfolded states as a function of temperature are used to model the free energy of the folding transition, ΔG(P,T), over the whole region of temperatures and pressures sampled in the simulations. The ΔG(P,T) diagram describes an ellipse over the range of temperatures and pressures sampled, predicting that the system can undergo pressure-induced unfolding and cold denaturation at low temperatures and high pressures, and unfolding at low pressures and high temperatures. The calculated free energy function exhibits remarkably good agreement with the experimental folding transition temperature (Tf = 321 K), free energy, and specific heat changes. However, changes in enthalpy and entropy are significantly different from the experimental values. We speculate that these differences may be due to the simplicity of the semi-empirical force field used in the simulations and that more elaborate force fields may be required to describe appropriately the thermodynamics of proteins. PMID:20408169
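An elliptical ΔG(P,T) stability surface of this kind is commonly modeled by a second-order (Hawley-type) expansion around a reference point; a sketch under that assumption follows, with placeholder parameters rather than the paper's fitted values.

```python
import numpy as np

def delta_G(P, T, P0, T0, dG0, dS0, dV0, dCp, dalpha, dbeta):
    """Second-order (Hawley-type) expansion of the folding free energy
    around (P0, T0); its stability boundary delta_G = 0 traces an ellipse
    in the (P, T) plane, encompassing heat, cold, and pressure denaturation.
    All thermodynamic parameters here are placeholders."""
    dT, dP = T - T0, P - P0
    return (dG0 - dS0 * dT
            - dCp * (T * (np.log(T / T0) - 1.0) + T0)
            + dV0 * dP
            + 0.5 * dbeta * dP**2
            + dalpha * dP * dT)
```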
Narrow field electromagnetic sensor system and method
McEwan, Thomas E.
1996-01-01
A narrow field electromagnetic sensor system and method of sensing a characteristic of an object provide the capability to realize a characteristic of an object such as density, thickness, or presence, for any desired coordinate position on the object. One application is imaging. The sensor can also be used as an obstruction detector or an electronic trip wire with a narrow field without the disadvantages of impaired performance when exposed to dirt, snow, rain, or sunlight. The sensor employs a transmitter for transmitting a sequence of electromagnetic signals in response to a transmit timing signal, a receiver for sampling only the initial direct RF path of the electromagnetic signal while excluding all other electromagnetic signals in response to a receive timing signal, and a signal processor for processing the sampled direct RF path electromagnetic signal and providing an indication of the characteristic of an object. Usually, the electromagnetic signal is a short RF burst and the obstruction must provide a substantially complete eclipse of the direct RF path. By employing time-of-flight techniques, a timing circuit controls the receiver to sample only the initial direct RF path of the electromagnetic signal while not sampling indirect path electromagnetic signals. The sensor system also incorporates circuitry for ultra-wideband spread spectrum operation that reduces interference to and from other RF services while allowing co-location of multiple electronic sensors without the need for frequency assignments.
Appraisal of jump distributions in ensemble-based sampling algorithms
NASA Astrophysics Data System (ADS)
Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo
2017-04-01
Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Markov chain Monte Carlo (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best-performing sampler for a given application. A poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features as they often occur in real posteriors, namely high dimensionality, strong non-linear correlations, or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient. Therefore, we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness, and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move with respect to all three aspects.
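For reference, a minimal serial sketch of the stretch move assessed here, following Goodman & Weare's published formulas; production samplers such as EMCEE split the ensemble into two halves to preserve detailed balance when updating in parallel, which this sketch omits.

```python
import numpy as np

def stretch_move(walkers, log_p, a=2.0, rng=None):
    """One serial sweep of the stretch move over an ensemble of walkers
    (shape: n_walkers x ndim). Each walker moves along the line to a
    randomly chosen partner; z is drawn from g(z) ~ 1/sqrt(z) on [1/a, a]
    by inverse-CDF sampling, and the z**(d-1) factor keeps the move
    affine invariant and detailed-balanced."""
    rng = rng or np.random.default_rng()
    n, d = walkers.shape
    for k in range(n):
        j = rng.integers(n - 1)
        j = j + (j >= k)                              # partner index != k
        z = (1.0 + (a - 1.0) * rng.random())**2 / a    # sample of g(z)
        y = walkers[j] + z * (walkers[k] - walkers[j])
        log_alpha = (d - 1) * np.log(z) + log_p(y) - log_p(walkers[k])
        if np.log(rng.random()) < min(0.0, log_alpha):
            walkers[k] = y
    return walkers
```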
Numerical Solution of Dyson Brownian Motion and a Sampling Scheme for Invariant Matrix Ensembles
NASA Astrophysics Data System (ADS)
Li, Xingjie Helen; Menon, Govind
2013-12-01
The Dyson Brownian Motion (DBM) describes the stochastic evolution of N points on the line driven by an applied potential, a Coulombic repulsion and identical, independent Brownian forcing at each point. We use an explicit tamed Euler scheme to numerically solve the Dyson Brownian motion and sample the equilibrium measure for non-quadratic potentials. The Coulomb repulsion is too singular for the SDE to satisfy the hypotheses of rigorous convergence proofs for tamed Euler schemes (Hutzenthaler et al. in Ann. Appl. Probab. 22(4):1611-1641, 2012). Nevertheless, in practice the scheme is observed to be stable for time steps of O(1/N^2) and to relax exponentially fast to the equilibrium measure with a rate constant of O(1) independent of N. Further, this convergence rate appears to improve with N in accordance with the O(1/N) relaxation of local statistics of the Dyson Brownian motion. This allows us to use the Dyson Brownian motion to sample N×N Hermitian matrices from the invariant ensembles. The computational cost of generating M independent samples is O(MN^4) with a naive scheme, and O(MN^3 log N) when a fast multipole method is used to evaluate the Coulomb interaction.
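A sketch of one tamed Euler step for the DBM under one common scaling convention (the paper's normalization may differ); taming caps the singular Coulomb drift so a close pair of points cannot produce an unbounded step.

```python
import numpy as np

def tamed_euler_dbm(lam, Vprime, dt, beta, n_steps, rng=None):
    """Tamed Euler scheme for Dyson Brownian motion on N eigenvalues lam:
    d(lam_i) = [-V'(lam_i) + (1/N) sum_{j!=i} 1/(lam_i - lam_j)] dt
               + sqrt(2/(beta*N)) dB_i.
    The drift is replaced by drift/(1 + dt*|drift|), which is the taming."""
    rng = rng or np.random.default_rng()
    N = lam.size
    for _ in range(n_steps):
        diff = lam[:, None] - lam[None, :]
        np.fill_diagonal(diff, np.inf)               # drop the i == j term
        drift = -Vprime(lam) + (1.0 / diff).sum(axis=1) / N
        drift = drift / (1.0 + dt * np.abs(drift))   # taming step
        lam = lam + dt * drift + np.sqrt(2.0 * dt / (beta * N)) * rng.standard_normal(N)
    return lam
```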
Equilibrium sampling by reweighting nonequilibrium simulation trajectories
NASA Astrophysics Data System (ADS)
Yang, Cheng; Wan, Biao; Xu, Shun; Wang, Yanting; Zhou, Xin
2016-03-01
Based on equilibrium molecular simulations, it is usually difficult to efficiently visit the whole conformational space of complex systems, which is separated into metastable regions by high free energy barriers. Nonequilibrium simulations can enhance transitions among these metastable regions and thus be applied to sample equilibrium distributions in complex systems, since the associated nonequilibrium effects can be removed by employing the Jarzynski equality (JE). Here we present such a systematic method, named reweighted nonequilibrium ensemble dynamics (RNED), to efficiently sample equilibrium conformations. The RNED is a combination of the JE and our previous reweighted ensemble dynamics (RED) method. The original JE reproduces equilibrium from many nonequilibrium trajectories but requires that the initial distribution of these trajectories be the equilibrium one. The RED reweights many equilibrium trajectories from an arbitrary initial distribution to obtain the equilibrium distribution, whereas the RNED combines the advantages of both methods, reproducing equilibrium from many nonequilibrium simulation trajectories with an arbitrary initial conformational distribution. We illustrate the application of the RNED in a toy model and in a Lennard-Jones fluid to detect its liquid-solid phase coexistence. The results indicate that the RNED substantially extends the applicability of both the original JE and the RED in equilibrium sampling of complex systems.
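The JE ingredient of RNED can be sketched as a per-trajectory reweighting by exp(-βW); this minimal snippet shows only that step (the RED reweighting of the initial distribution is omitted), and the names are illustrative.

```python
import numpy as np

def jarzynski_weights(work, beta):
    """Normalized per-trajectory weights exp(-beta*W) that remove the
    nonequilibrium bias when averaging observables over driven
    trajectories whose initial conditions are at equilibrium."""
    w = np.exp(-beta * (work - work.min()))   # shift by min(W) for stability
    return w / w.sum()

# equilibrium estimate of an observable A sampled at trajectory endpoints:
# A_eq = np.sum(jarzynski_weights(W, beta) * A_end)
```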
Deterministic Mean-Field Ensemble Kalman Filtering
Law, Kody J. H.; Tembine, Hamidou; Tempone, Raul
2016-05-03
The proof of convergence of the standard ensemble Kalman filter (EnKF) from Le Gland, Monbet, and Tran [Large sample asymptotics for the ensemble Kalman filter, in The Oxford Handbook of Nonlinear Filtering, Oxford University Press, Oxford, UK, 2011, pp. 598--631] is extended to non-Gaussian state-space models. In this paper, a density-based deterministic approximation of the mean-field limit EnKF (DMFEnKF) is proposed, consisting of a PDE solver and a quadrature rule. Given a certain minimal order of convergence κ between the two, this extends to the deterministic filter approximation, which is therefore asymptotically superior to standard EnKF for dimension d
Examination of multi-model ensemble seasonal prediction methods using a simple climate system
NASA Astrophysics Data System (ADS)
Kang, In-Sik; Yoo, Jin Ho
2006-02-01
A simple climate model was designed as a proxy for the real climate system, and a number of prediction models were generated by slightly perturbing the physical parameters of the simple model. A set of long (240-year) historical hindcast predictions was performed with the various prediction models, which are used to examine various issues of multi-model ensemble seasonal prediction, such as the best ways of blending multiple models and selecting among them. Based on these results, we suggest a feasible way of maximizing the benefit of using multiple models in seasonal prediction. In particular, three types of multi-model ensemble prediction systems, i.e., the simple composite, the superensemble, and the composite after statistically correcting individual predictions (corrected composite), are examined and compared to each other. The superensemble has more of an overfitting problem than the others, especially for the case of small training samples and/or weak external forcing, and the corrected composite produces the best prediction skill among the multi-model systems.
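To make the three combination schemes concrete, a minimal sketch with hypothetical array shapes; the superensemble is represented here by ordinary least-squares regression on the member forecasts, one common implementation, which also illustrates why it can overfit short training records.

```python
import numpy as np

def simple_composite(F):
    """F: (n_models, n_samples) forecasts; equal-weight ensemble mean."""
    return F.mean(axis=0)

def corrected_composite(F, obs):
    """Remove each model's mean bias over the training sample, then
    average (a minimal stand-in for the statistical correction of
    individual predictions described in the abstract)."""
    bias = F.mean(axis=1, keepdims=True) - obs.mean()
    return (F - bias).mean(axis=0)

def superensemble(F, obs):
    """Multiple linear regression of observations on member forecasts;
    the extra fitted coefficients are what overfit small samples."""
    X = np.vstack([F, np.ones(F.shape[1])]).T   # add intercept column
    coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
    return X @ coef                              # in-sample fitted values
```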
Krishnan, Ranjani; Walton, Emily B; Van Vliet, Krystyn J
2009-11-01
As computational resources increase, molecular dynamics simulations of biomolecules are becoming an increasingly informative complement to experimental studies. In particular, it has now become feasible to use multiple initial molecular configurations to generate an ensemble of replicate production-run simulations that allows for more complete characterization of rare events such as ligand-receptor unbinding. However, there are currently no explicit guidelines for selecting an ensemble of initial configurations for replicate simulations. Here, we use clustering analysis and steered molecular dynamics simulations to demonstrate that the configurational changes accessible in molecular dynamics simulations of biomolecules do not necessarily correlate with observed rare-event properties. This informs selection of a representative set of initial configurations. We also employ statistical analysis to identify the minimum number of replicate simulations required to sufficiently sample a given biomolecular property distribution. Together, these results suggest a general procedure for generating an ensemble of replicate simulations that will maximize accurate characterization of rare-event property distributions in biomolecules.
Mercadante, Davide; Milles, Sigrid; Fuertes, Gustavo; Svergun, Dmitri I; Lemke, Edward A; Gräter, Frauke
2015-06-25
Understanding the function of intrinsically disordered proteins is intimately related to our capacity to correctly sample their conformational dynamics. So far, a gap between experimentally and computationally derived ensembles exists, as simulations show overcompacted conformers. Increasing evidence suggests that the solvent plays a crucial role in shaping the ensembles of intrinsically disordered proteins and has led to several attempts to modify water parameters and thereby favor protein-water over protein-protein interactions. This study tackles the problem from a different perspective, which is the use of the Kirkwood-Buff theory of solutions to reproduce the correct conformational ensemble of intrinsically disordered proteins (IDPs). A protein force field recently developed on such a basis was found to be highly effective in reproducing ensembles for a fragment from the FG-rich nucleoporin 153, with dimensions matching experimental values obtained from small-angle X-ray scattering and single molecule FRET experiments. Kirkwood-Buff theory presents a complementary and fundamentally different approach to the recently developed four-site TIP4P-D water model, both of which can rescue the overcollapse observed in IDPs with canonical protein force fields. As such, our study provides a new route for tackling the deficiencies of current protein force fields in describing protein solvation.
Yan, Rui; Edwards, Thomas J.; Pankratz, Logan M.; Kuhn, Richard J.; Lanman, Jason K.; Liu, Jun; Jiang, Wen
2015-01-01
Cryo-electron tomography (cryo-ET) is an emerging technique that can elucidate the architecture of macromolecular complexes and cellular ultrastructure in a near-native state. Some important sample parameters, such as thickness and tilt, are needed for 3-D reconstruction. However, these parameters can currently only be determined using trial 3-D reconstructions. Accurate values of the electron mean free path play a significant role in modeling the image formation process, which is essential for simulating electron microscopy images and for model-based iterative 3-D reconstruction methods; however, these values are voltage and sample dependent and have only been experimentally measured for a limited number of sample conditions. Here, we report a computational method, tomoThickness, based on the Beer-Lambert law, to simultaneously determine the sample thickness, tilt and electron inelastic mean free path by solving an overdetermined nonlinear least-squares optimization problem that exploits the strong constraints of tilt relationships. The method has been extensively tested with both stained and cryo datasets. The fitted electron mean free paths are consistent with reported experimental measurements. The accurate thickness estimation eliminates the need for a generous assignment of the Z-dimension size of the tomogram. Interestingly, we have also found that nearly all samples are tilted by a few degrees relative to the electron beam. Compensating for this intrinsic sample tilt results in horizontal structures and a reduced Z-dimension of tomograms. Our fast, pre-reconstruction method can thus provide important sample parameters that can help improve the performance of tomographic reconstruction for a wide range of samples. PMID:26433027
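A minimal sketch of the core Beer-Lambert fit, assuming a slab sample and a single tilt axis; tomoThickness itself solves a richer overdetermined problem that separates thickness from mean free path, so this only illustrates the idea, and the names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_thickness_tilt(tilt_deg, I, I0):
    """Fit the Beer-Lambert attenuation of a tilt series,
    I = I0 * exp(-t / (L * cos(tilt + tilt0))), for the ratio t/L
    (thickness over inelastic mean free path) and the intrinsic
    sample tilt tilt0, using log intensities."""
    theta = np.radians(tilt_deg)
    y = np.log(I0 / I)                     # = (t/L) / cos(theta + tilt0)
    def residuals(p):
        t_over_L, tilt0 = p
        return y - t_over_L / np.cos(theta + tilt0)
    sol = least_squares(residuals, x0=[1.0, 0.0])
    return sol.x                            # (t/L, tilt0 in radians)
```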
Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.
Maximova, Tatiana; Plaku, Erion; Shehu, Amarda
2016-07-07
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to addressing the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new avenues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
Comparison of different filter methods for data assimilation in the unsaturated zone
NASA Astrophysics Data System (ADS)
Lange, Natascha; Berkhahn, Simon; Erdal, Daniel; Neuweiler, Insa
2016-04-01
The unsaturated zone is an important compartment, which plays a role in the division of terrestrial water fluxes into surface runoff, groundwater recharge, and evapotranspiration. For data assimilation in coupled systems it is therefore important to have a good representation of the unsaturated zone in the model. Flow processes in the unsaturated zone have all the typical features of flow in porous media: processes can have long memory, and because observations are scarce, hydraulic model parameters cannot be determined easily. However, they are important for the quality of model predictions. On top of that, the established flow models are highly non-linear. For these reasons, the use of the popular Ensemble Kalman Filter as a data assimilation method to estimate state and parameters in unsaturated zone models could be questioned. With respect to the long process memory in the subsurface, it has been suggested that iterative filters and smoothers may be more suitable for parameter estimation in unsaturated media. We test the performance of different iterative filters and smoothers for data assimilation with a focus on parameter updates in the unsaturated zone. In particular, we compare the Iterative Ensemble Kalman Filter and Smoother as introduced by Bocquet and Sakov (2013), as well as the Confirming Ensemble Kalman Filter and the modified Restart Ensemble Kalman Filter proposed by Song et al. (2014), to the original Ensemble Kalman Filter (Evensen, 2009). This is done with simple test cases generated numerically. We also consider test examples with a layered structure, as layering is often found in natural soils. We assume that the observations are water content, obtained from TDR probes or other observation methods sampling relatively small volumes. Particularly in larger data assimilation frameworks, a reasonable balance between computational effort and quality of results has to be found. Therefore, we compare the computational costs of the different methods as well as the quality of open-loop model predictions and the estimated parameters.
References:
Bocquet, M. and P. Sakov, 2013: Joint state and parameter estimation with an iterative ensemble Kalman smoother. Nonlinear Processes in Geophysics, 20(5), 803-818.
Evensen, G., 2009: Data assimilation: The ensemble Kalman filter. Springer Science & Business Media.
Song, X.H., L.S. Shi, M. Ye, J.Z. Yang and I.M. Navon, 2014: Numerical comparison of iterative ensemble Kalman filters for unsaturated flow inverse modeling. Vadose Zone Journal, 13(2), 10.2136/vzj2013.05.0083.
Impact of hindcast length on estimates of seasonal climate predictability.
Shi, W; Schaller, N; MacLeod, D; Palmer, T N; Weisheimer, A
2015-03-16
It has recently been argued that single-model seasonal forecast ensembles are overdispersive, implying that the real world is more predictable than indicated by estimates of so-called perfect model predictability, particularly over the North Atlantic. However, such estimates are based on relatively short forecast data sets comprising just 20 years of seasonal predictions. Here we study longer 40 year seasonal forecast data sets from multimodel seasonal forecast ensemble projects and show that sampling uncertainty due to the length of the hindcast periods is large. The skill of forecasting the North Atlantic Oscillation during winter varies within the 40 year data sets, with high levels of skill found for some subperiods. It is demonstrated that while 20 year estimates of seasonal reliability can show evidence of overdispersive behavior, the 40 year estimates are more stable and show no evidence of overdispersion. Instead, the predominant feature on these longer time scales is underdispersion, particularly in the tropics. Key points: predictions can appear overdispersive due to hindcast-length sampling error; longer hindcasts are more robust and underdispersive, especially in the tropics; and twenty hindcasts are an inadequate sample size to assess seasonal forecast skill.
Xu, Zhiming; So, Rosa Q; Toe, Kyaw Kyar; Ang, Kai Keng; Guan, Cuntai
2014-01-01
This paper presents an asynchronous intracortical brain-computer interface (BCI) that allows the subject to continuously drive a mobile robot. Such a system has important implications for restoring mobility to disabled patients. Using a carefully designed multiclass support vector machine (SVM), the subject's self-paced instantaneous movement intents are continuously decoded to control the mobile robot. In particular, we studied the stability of the neural representation of the movement directions. Experimental results on a nonhuman primate showed that overt movement directions were stably represented in the ensemble of recorded units, and our SVM classifier could successfully decode such movements continuously along the desired movement path. However, the neural representation of the stop state for self-paced control was not stable and could drift.
Zhang, Wei; Ding, Dong-Sheng; Dong, Ming-Xin; Shi, Shuai; Wang, Kai; Liu, Shi-Long; Li, Yan; Zhou, Zhi-Yuan; Shi, Bao-Sen; Guo, Guang-Can
2016-11-14
Entanglement in multiple degrees of freedom has many benefits over entanglement in a single one. The former enables quantum communication with higher channel capacity and more efficient quantum information processing and is compatible with diverse quantum networks. Establishing multi-degree-of-freedom entangled memories is not only vital for high-capacity quantum communication and computing, but also promising for enhanced violations of nonlocality in quantum systems. However, there have as yet been no reports of the experimental realization of multi-degree-of-freedom entangled memories. Here we experimentally established hyper- and hybrid entanglement in multiple degrees of freedom, including path (K-vector) and orbital angular momentum, between two separated atomic ensembles by using quantum storage. The results are promising for achieving quantum communication and computing with many degrees of freedom.
Baele, Guy; Lemey, Philippe; Vansteelandt, Stijn
2013-03-06
Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
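For concreteness, a sketch of the standard stepping-stone estimator of a log marginal likelihood, the building block the paper adapts for direct Bayes factor estimation; input names are illustrative, and a direct log Bayes factor could be formed as the difference of two such estimates or via a single path between models, as described above.

```python
import numpy as np

def stepping_stone_log_ml(log_liks, betas):
    """Stepping-stone estimate of a log marginal likelihood.
    log_liks[k] holds log-likelihood samples drawn from the power
    posterior at inverse temperature betas[k] (betas rises from 0 to 1);
    each rung contributes the log of the mean of L**(beta_{k+1}-beta_k),
    computed stably in log space."""
    total = 0.0
    for k in range(len(betas) - 1):
        d = betas[k + 1] - betas[k]
        x = d * np.asarray(log_liks[k])
        m = x.max()
        total += m + np.log(np.mean(np.exp(x - m)))
    return total
```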
Method and apparatus for probing relative volume fractions
Jandrasits, Walter G.; Kikta, Thomas J.
1998-01-01
A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining therebetween a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirely of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction.
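The arithmetic of the final step can be written directly from the description above; a one-function sketch with hypothetical argument names.

```python
def volume_fraction(area_sample, area_phase1, area_phase2):
    """Volume fraction from areas under the impedance-versus-distance
    curve, following the patent's recipe: subtract the sample-zone area
    from the pure-phase-1 area, then normalize by the difference between
    the pure-phase-1 and pure-phase-2 areas."""
    return (area_phase1 - area_sample) / (area_phase1 - area_phase2)

# e.g., volume_fraction(7.2, 10.0, 4.0) -> 0.4667 (fraction of phase 2)
```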
A comparison of breeding and ensemble transform vectors for global ensemble generation
NASA Astrophysics Data System (ADS)
Deng, Guo; Tian, Hua; Li, Xiaoli; Chen, Jing; Gong, Jiandong; Jiao, Meiyan
2012-02-01
To compare the initial perturbation techniques using breeding vectors and ensemble transform vectors, three ensemble prediction systems using both initial perturbation methods but with different ensemble member sizes, based on the spectral model T213/L31, are constructed at the National Meteorological Center, China Meteorological Administration (NMC/CMA). A series of ensemble verification scores, such as forecast skill of the ensemble mean, ensemble resolution, and ensemble reliability, are introduced to identify the most important attributes of ensemble forecast systems. The results indicate that the ensemble transform technique is superior to the breeding vector method in light of the anomaly correlation coefficient (ACC), a deterministic measure of the ensemble mean; the root-mean-square error (RMSE) and spread, which are probabilistic attributes; and the continuous ranked probability score (CRPS) and its decomposition. The advantage of the ensemble transform approach is attributed to the orthogonality of its ensemble perturbations as well as its consistency with the data assimilation system. Therefore, this study may serve as a reference for configuring the best ensemble prediction system for operational use.
NASA Astrophysics Data System (ADS)
Pachov, Dimitar V.
Biomolecules are dynamic in nature and visit a number of states while performing their biological function. However, understanding how they interconvert between functional substates is a challenging task. In this thesis, we employ enhanced computational strategies to reveal, in atomistic resolution, the transition states and molecular mechanisms along conformational pathways of the signaling protein Nitrogen Regulatory Protein C (NtrC) and the enzyme Adenylate Kinase (Adk). Targeted Molecular Dynamics (TMD) simulations and NMR experiments have previously found that the active/inactive interconversion of NtrC is stabilized by non-native transient contacts. To find where along the conformational pathway they lie and to probe the existence of multiple intermediates, an extensive (more than 8 μs) mapping of the conformational landscape was performed by a multitude of straightforward MD simulations relaxed from the biased TMD pathway. A number of metastable states stabilized by local interactions was found to underlie the conformational pathway of NtrC. Two spontaneous transitions of the last stage of the active-to-inactive conversion were identified and used in path sampling procedures to generate an ensemble of truly dynamic reactive pathways. The transition state ensemble (TSE) and mechanistic descriptors of this transition were revealed in atomic detail and verified by committor analysis. By analyzing how pressure affects the dynamics and function of two homologous Adk proteins - the P. profundum Adk, which survives at 700 atm pressure in the deep sea, and the E. coli Adk, which lives at ambient pressure - we indirectly obtained atomic information about the TSE of the large-amplitude, rate-limiting conformational opening of the Adk lids. Guided by NMR experiments showing significantly decreased activation volumes of the piezophile compared to its mesophilic counterpart, TMD simulations revealed the formation of an extended hydrogen-bonded water network in the transition state of the piezophile that can explain the experimentally measured activation volume differences. The transition state of the conformational change was proposed to lie close to the closed state. Additionally, a number of descriptors were used to characterize the free energy landscape of the mesophile. It was found that the features of the landscape are highly sensitive to the binding of different ligands, their protonation states, and the presence of magnesium.
Neural Representation of Spatial Topology in the Rodent Hippocampus
Chen, Zhe; Gomperts, Stephen N.; Yamamoto, Jun; Wilson, Matthew A.
2014-01-01
Pyramidal cells in the rodent hippocampus often exhibit clear spatial tuning in navigation. Although it has long been suggested that pyramidal cell activity may underlie a topological code rather than a topographic code, it remains unclear whether an abstract spatial topology can be encoded in the ensemble spiking activity of hippocampal place cells. Using a statistical approach developed previously, we investigate this question and related issues in greater detail. We recorded ensembles of hippocampal neurons as rodents freely foraged in one- and two-dimensional spatial environments, and we used a “decode-to-uncover” strategy to examine the temporally structured patterns embedded in the ensemble spiking activity in the absence of observed spatial correlates during periods of rodent navigation or awake immobility. Specifically, the spatial environment was represented by a finite discrete state space. Trajectories across spatial locations (“states”) were associated with consistent hippocampal ensemble spiking patterns, which were characterized by a state transition matrix. From this state transition matrix, we inferred a topology graph that defined the connectivity in the state space. In both one- and two-dimensional environments, the extracted behavior patterns from the rodent hippocampal population codes were compared against randomly shuffled spike data. In contrast to a topographic code, our results support the efficiency of topological coding in the presence of sparse sample size and fuzzy space mapping. This computational approach allows us to quantify the variability of ensemble spiking activity, to examine hippocampal population codes during off-line states, and to quantify the topological complexity of the environment. PMID:24102128
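A minimal sketch of turning a decoded state-transition matrix into a topology graph by thresholding empirical transition probabilities; the threshold is an illustrative knob and the normalization is an assumption, not the paper's exact procedure.

```python
import numpy as np

def topology_graph(transition_counts, threshold=0.0):
    """Derive an undirected connectivity graph over discrete spatial
    states: two states are linked when the empirical transition
    probability in either direction exceeds the threshold."""
    row = np.maximum(transition_counts.sum(axis=1, keepdims=True), 1)
    P = transition_counts / row                  # row-normalized probabilities
    adj = np.maximum(P, P.T) > threshold          # symmetrize over directions
    np.fill_diagonal(adj, False)                  # no self-loops
    return adj
```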
Johnson, David K.; Karanicolas, John
2015-01-01
Small molecules that inhibit interactions between specific pairs of proteins have long represented a promising avenue for therapeutic intervention in a variety of settings. Structural studies have shown that in many cases, the inhibitor-bound protein adopts a conformation that is distinct from its unbound and its protein-bound conformations. This plasticity of the protein surface presents a major challenge in predicting which members of a protein family will be inhibited by a given ligand. Here, we use biased simulations of Bcl-2-family proteins to generate ensembles of low-energy conformations that contain surface pockets suitable for small molecule binding. We find that the resulting conformational ensembles include surface pockets that mimic those observed in inhibitor-bound crystal structures. Next, we find that the ensembles generated using different members of this protein family are overlapping but distinct, and that the activity of a given compound against a particular family member (ligand selectivity) can be predicted from whether the corresponding ensemble samples a complementary surface pocket. Finally, we find that each ensemble includes certain surface pockets that are not shared by any other family member: while no inhibitors have yet been identified to take advantage of these pockets, we expect that chemical scaffolds complementing these “distinct” pockets will prove highly selective for their targets. The opportunity to achieve target selectivity within a protein family by exploiting differences in surface fluctuations represents a new paradigm that may facilitate design of family-selective small-molecule inhibitors of protein-protein interactions. PMID:25706586
NASA Technical Reports Server (NTRS)
Chambon, Philippe; Zhang, Sara Q.; Hou, Arthur Y.; Zupanski, Milija; Cheung, Samson
2013-01-01
The forthcoming Global Precipitation Measurement (GPM) Mission will provide next-generation precipitation observations from a constellation of satellites. Since precipitation by nature has large variability and low predictability at cloud-resolving scales, the impact of precipitation data on the skill of mesoscale numerical weather prediction (NWP) is largely affected by the characterization of background and observation errors and the representation of nonlinear cloud/precipitation physics in an NWP data assimilation system. We present a data impact study on the assimilation of precipitation-affected microwave (MW) radiances from a pre-GPM satellite constellation using the Goddard WRF Ensemble Data Assimilation System (Goddard WRF-EDAS). A series of assimilation experiments are carried out in a Weather Research and Forecasting (WRF) model domain of 9 km resolution in western Europe. Sensitivities to observation error specifications, background error covariance estimated from ensemble forecasts with different ensemble sizes, and MW channel selections are examined through single-observation assimilation experiments. An empirical bias correction for precipitation-affected MW radiances is developed based on the statistics of radiance innovations in rainy areas. The data impact is assessed by full data assimilation cycling experiments for a storm event that occurred in France in September 2010. Results show that the assimilation of MW precipitation observations from a satellite constellation mimicking GPM has a positive impact on the accumulated rain forecasts verified with surface radar rain estimates. The case study on a convective storm also reveals that the accuracy of ensemble-based background error covariance is limited by sampling errors and model errors such as precipitation displacement and unresolved convective-scale instability.
Hampson, Robert E.; Song, Dong; Chan, Rosa H.M.; Sweatt, Andrew J.; Riley, Mitchell R.; Gerhardt, Gregory A.; Shin, Dae C.; Marmarelis, Vasilis Z.; Berger, Theodore W.; Deadwyler, Samuel A.
2012-01-01
Collaborative investigations have characterized how multineuron hippocampal ensembles encode memory necessary for subsequent successful performance by rodents in a delayed nonmatch to sample (DNMS) task and utilized that information to provide the basis for a memory prosthesis to enhance performance. By employing a unique nonlinear dynamic multi-input/multi-output (MIMO) model, developed and adapted to hippocampal neural ensemble firing patterns derived from simultaneously recorded CA1 and CA3 activity, it was possible to extract information encoded in the sample phase necessary for successful performance in the nonmatch phase of the task. The extension of this MIMO model to online delivery of electrical stimulation to the same recording loci, mimicking successful CA1 firing patterns, provided the means to increase levels of performance on a trial-by-trial basis. Inclusion of several control procedures provides evidence for the specificity of effective MIMO-model-generated patterns of electrical stimulation. Increased utility of the MIMO model as a prosthesis device was exhibited by the demonstration of cumulative increases in DNMS task performance with repeated MIMO stimulation over many sessions, on both stimulation and nonstimulation trials, suggesting overall system modification with continued exposure. Results reported here are compatible with and extend prior demonstrations and further support the candidacy of the MIMO model as an effective cortical prosthesis. PMID:22438334
Quasi-Monte Carlo Methods Applied to Tau-Leaping in Stochastic Biological Systems.
Beentjes, Casper H L; Baker, Ruth E
2018-05-25
Quasi-Monte Carlo methods have proven to be effective extensions of traditional Monte Carlo methods in, amongst others, problems of quadrature and the sample path simulation of stochastic differential equations. By replacing the random number input stream in a simulation procedure by a low-discrepancy number input stream, variance reductions of several orders have been observed in financial applications. Analysis of stochastic effects in well-mixed chemical reaction networks often relies on sample path simulation using Monte Carlo methods, even though these methods suffer from the typically slow O(N^{-1/2}) convergence rate as a function of the number of sample paths N. This paper investigates the combination of (randomised) quasi-Monte Carlo methods with an efficient sample path simulation procedure, namely tau-leaping. We show that this combination is often more effective than traditional Monte Carlo simulation in terms of the decay of statistical errors. The observed convergence rate behaviour is, however, non-trivial due to the discrete nature of the models of chemical reactions. We explain how this affects the performance of quasi-Monte Carlo methods by looking at a test problem in standard quadrature.
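As a hedged illustration of the randomised quasi-Monte Carlo idea (not the paper's code), the sketch below compares plain Monte Carlo with scrambled Sobol' sampling on a smooth toy integrand using SciPy's scipy.stats.qmc module; in the paper, the low-discrepancy stream would instead drive the Poisson draws inside a tau-leaping simulator.

```python
import numpy as np
from scipy.stats import qmc

# Compare plain MC with randomised QMC (scrambled Sobol') for
# E[f(U)], f(u) = u^2 on [0,1], whose exact value is 1/3.
n, reps = 2**10, 50
rng = np.random.default_rng(0)
mc_err, qmc_err = [], []
for r in range(reps):
    u_mc = rng.random(n)
    u_q = qmc.Sobol(d=1, scramble=True, seed=r).random(n).ravel()
    mc_err.append(abs((u_mc**2).mean() - 1/3))
    qmc_err.append(abs((u_q**2).mean() - 1/3))
print("mean MC error:  ", np.mean(mc_err))
print("mean RQMC error:", np.mean(qmc_err))
```

On this smooth problem the scrambled-Sobol' error is typically one to two orders of magnitude below the Monte Carlo error at the same sample count, mirroring the variance reductions quoted above; the paper's point is that the discrete Poisson inputs of chemical kinetics complicate this picture.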
NASA Technical Reports Server (NTRS)
Hewagama, TIlak; Aslam, Shahid; Talabac, Stephen; Allen, John E., Jr.; Annen, John N.; Jennings, Donald E.
2011-01-01
Fourier transform spectrometers have a venerable heritage as flight instruments. However, obtaining an accurate spectrum exacts a penalty in instrument mass and power requirements. Recent advances in a broad class of non-scanning Fourier transform spectrometer (FTS) devices, generally called spatial heterodyne spectrometers, offer distinct advantages as flight-optimized systems. We are developing a miniaturized system that employs photonics lightwave circuit principles and functions as an FTS operating in the 7-14 micrometer spectral region. The interferogram is constructed from an ensemble of Mach-Zehnder interferometers with path length differences calibrated to mimic the scan mirror sample positions of a classic Michelson-type FTS. One potential long-term application of this technology in low-cost planetary missions is the concept of a self-contained sensor system. We are developing a systems architecture concept for wide-area in situ and remote monitoring of characteristic properties that are of scientific interest. The system will be based on wavelength- and resolution-independent spectroscopic sensors for studying atmospheric and surface chemistry, physics, and mineralogy. The self-contained sensor network is based on our concept of an Addressable Photonics Cube (APC), which has real-time flexibility and broad science applications. It is envisaged that a spatially distributed autonomous sensor web integrating multiple APCs will be reactive and dynamically driven. The network is designed to respond in an event- or model-driven manner or to be reconfigured as needed.
An adaptive multi-level simulation algorithm for stochastic biological systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lester, C., E-mail: lesterc@maths.ox.ac.uk; Giles, M. B.; Baker, R. E.
2015-01-14
Discrete-state, continuous-time Markov models are widely used in the modeling of biochemical reaction networks. Their complexity often precludes analytic solution, and we rely on stochastic simulation algorithms (SSA) to estimate system statistics. The Gillespie algorithm is exact, but computationally costly as it simulates every single reaction. As such, approximate stochastic simulation algorithms such as the tau-leap algorithm are often used. Although potentially more efficient computationally, the tau-leap algorithm generates system statistics that suffer from significant bias unless tau is relatively small, in which case the computational time can be comparable to that of the Gillespie algorithm. The multi-level method [Anderson and Higham, "Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics," SIAM Multiscale Model. Simul. 10(1), 146-179 (2012)] tackles this problem. A base estimator is computed using many (cheap) sample paths at low accuracy. The bias inherent in this estimator is then reduced using a number of corrections. Each correction term is estimated using a collection of paired sample paths, where one path of each pair is generated at a higher accuracy compared to the other (and so is more expensive). By sharing random variables between these paired paths, the variance of each correction estimator can be reduced. This renders the multi-level method very efficient, as only a relatively small number of paired paths are required to calculate each correction term. In the original multi-level method, each sample path is simulated using the tau-leap algorithm with a fixed value of tau. This approach can result in poor performance when the reaction activity of a system changes substantially over the timescale of interest. By introducing a novel adaptive time-stepping approach where tau is chosen according to the stochastic behaviour of each sample path, we extend the applicability of the multi-level method to such cases. We demonstrate the efficiency of our method using a number of examples.
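The pairing-and-coupling construction is the heart of the method. The following is a minimal two-level sketch, assuming a pure-decay system X → ∅ with propensity c·X and the standard Poisson-splitting coupling of Anderson and Higham; all parameters are illustrative, and a full multi-level estimator would sum corrections over several levels rather than the single correction shown here.

```python
import numpy as np

rng = np.random.default_rng(42)
c, x0, T = 0.5, 100, 1.0   # decay rate, initial count, final time

def tau_leap(x, tau, steps):
    # plain tau-leaping for X -> 0 with propensity c*x
    for _ in range(steps):
        x = max(x - rng.poisson(c * x * tau), 0)
    return x

def coupled_pair(x0, tau_c, substeps=2):
    """One coarse path (step tau_c) coupled to one fine path
    (step tau_c/substeps) by splitting the Poisson randomness."""
    xc = xf = x0
    tau_f = tau_c / substeps
    for _ in range(round(T / tau_c)):
        ac = c * xc                       # coarse propensity, frozen over the step
        dc_total = 0
        for _ in range(substeps):
            af = c * xf                   # fine propensity, refreshed each substep
            shared = rng.poisson(min(af, ac) * tau_f)
            extra = rng.poisson(abs(af - ac) * tau_f)
            xf = max(xf - shared - (extra if af > ac else 0), 0)
            dc_total += shared + (extra if ac > af else 0)
        xc = max(xc - dc_total, 0)
    return xf, xc

# base estimator from many cheap coarse paths, plus one correction term
base = np.mean([tau_leap(x0, 0.1, 10) for _ in range(4000)])
corr = np.mean([np.subtract(*coupled_pair(x0, 0.1)) for _ in range(500)])
print("E[X_T] ~", base + corr, " (exact mean:", x0 * np.exp(-c * T), ")")
```

Because the paired paths share most of their Poisson counts, the correction term has small variance and needs far fewer samples than the base estimator, which is what makes the multi-level construction pay off.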
NASA Astrophysics Data System (ADS)
Janardhanan, S.; Datta, B.
2011-12-01
Surrogate models are widely used to develop computationally efficient simulation-optimization models to solve complex groundwater management problems. Artificial-intelligence-based models are most often used for this purpose, trained using predictor-predictand data obtained from a numerical simulation model. Most often this is implemented with the assumption that the parameters and boundary conditions used in the numerical simulation model are perfectly known. However, in most practical situations these values are uncertain, which limits the applicability of such approximation surrogates. In our study we develop a surrogate-model-based coupled simulation-optimization methodology for determining optimal pumping strategies for coastal aquifers considering parameter uncertainty. An ensemble surrogate modeling approach is used along with multiple-realization optimization. The methodology is used to solve a multi-objective coastal aquifer management problem with two conflicting objectives. Hydraulic conductivity and aquifer recharge are treated as uncertain. The three-dimensional coupled flow and transport simulation model FEMWATER is used to simulate the aquifer responses for a number of scenarios corresponding to Latin hypercube samples of pumping and uncertain parameters, generating input-output patterns for training the surrogate models. Non-parametric bootstrap sampling of this original data set is used to generate multiple data sets that belong to different regions in the multi-dimensional decision and parameter space. These data sets are used to train and test multiple surrogate models based on genetic programming. The ensemble of surrogate models is then linked to a multi-objective genetic algorithm to solve the pumping optimization problem. The two conflicting objectives are maximizing total pumping from beneficial wells and minimizing total pumping from barrier wells used for hydraulic control of saltwater intrusion. The salinity levels resulting at strategic locations from this pumping are predicted using the ensemble surrogates and are constrained to be within pre-specified levels. Different realizations of the concentration values are obtained from the ensemble predictions corresponding to each candidate pumping solution. Reliability is incorporated as the percentage of surrogate models in the ensemble that satisfy the imposed constraints. The methodology was applied to a realistic coastal aquifer system in the Burdekin delta area in Australia. All optimal solutions corresponding to a reliability level of 0.99 satisfied all the constraints, and constraint violation increased as the reliability level was reduced. Thus, ensemble-surrogate-based simulation-optimization was found to be useful in deriving multi-objective optimal pumping strategies for coastal aquifers under parameter uncertainty.
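The reliability measure described above reduces to a simple ensemble fraction. A hypothetical sketch follows; the linear "surrogates" and all numbers are invented stand-ins for the trained genetic-programming models.

```python
import numpy as np

# Reliability of a candidate pumping strategy = fraction of the
# surrogate ensemble whose predicted salinity satisfies the constraint.
rng = np.random.default_rng(3)
n_surrogates = 40
salinity_limit = 0.5                                    # pre-specified level
coefs = 0.4 + 0.05 * rng.standard_normal(n_surrogates)  # toy per-surrogate fits

def reliability(pumping_rate):
    preds = coefs * pumping_rate        # each surrogate's salinity prediction
    return np.mean(preds <= salinity_limit)

print(reliability(1.0), reliability(1.5))  # higher pumping -> lower reliability
```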
Yang, Li; Sun, Rui; Hase, William L
2011-11-08
In a previous study (J. Chem. Phys. 2008, 129, 094701) it was shown that for a large molecule, with a total energy much greater than its barrier for decomposition and whose vibrational modes are harmonic oscillators, the expressions for the classical Rice-Ramsperger-Kassel-Marcus (RRKM) (i.e., RRK) and classical transition-state theory (TST) rate constants become equivalent. Using this relationship, a molecule's unimolecular rate constants versus temperature may be determined from chemical dynamics simulations of microcanonical ensembles for the molecule at different total energies. The simulation identifies the molecule's unimolecular pathways and their Arrhenius parameters. In the work presented here, this approach is used to study the thermal decomposition of CH3-NH-CH═CH-CH3, an important constituent in the polymer of cross-linked epoxy resins. Direct dynamics simulations, at the MP2/6-31+G* level of theory, were used to investigate the decomposition of microcanonical ensembles for this molecule. The Arrhenius A and Ea parameters determined from the direct dynamics simulation are in very good agreement with the TST Arrhenius parameters for the MP2/6-31+G* potential energy surface. The simulation method applied here may be particularly useful for large molecules with a multitude of decomposition pathways and whose transition states may be difficult to determine and have structures that are not readily obvious.
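The final step, extracting Arrhenius parameters from simulation-derived rate constants, amounts to a linear fit of ln k against 1/T. A worked toy example follows (the rate constants below are invented, constructed to be roughly consistent with Ea ≈ 150 kJ/mol and A ≈ 10^13 s^-1, and are not the paper's data):

```python
import numpy as np

# Arrhenius fit: ln k = ln A - Ea/(R*T), so a straight-line fit of
# ln k versus 1/T yields slope = -Ea/R and intercept = ln A.
R = 8.314  # J mol^-1 K^-1
T = np.array([800.0, 900.0, 1000.0, 1100.0])       # K (made up)
k = np.array([1.6e3, 2.0e4, 1.5e5, 7.5e5])         # s^-1 (made up)

slope, intercept = np.polyfit(1.0 / T, np.log(k), 1)
Ea = -slope * R        # activation energy, J/mol
A = np.exp(intercept)  # pre-exponential factor, s^-1
print(f"Ea = {Ea/1000:.1f} kJ/mol, A = {A:.2e} s^-1")
```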
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Sun, Zhenping; Cao, Dongpu; Liu, Daxue; He, Hangen
2017-03-01
This study proposes a novel integrated local trajectory planning and tracking control (ILTPTC) framework for autonomous vehicles driving along a reference path with obstacle avoidance. For this ILTPTC framework, an efficient state-space sampling-based trajectory planning scheme is employed to smoothly follow the reference path. A model-based predictive path generation algorithm is applied to produce a set of smooth and kinematically feasible paths connecting the initial state with the sampling terminal states. A velocity control law is then designed to assign a speed value to each of the points along the generated paths. An objective function considering both safety and comfort performance is carefully formulated for assessing the generated trajectories and selecting the optimal one. For accurately tracking the optimal trajectory while overcoming external disturbances and model uncertainties, a combined feedforward and feedback controller is developed. Both simulation analyses and vehicle testing are performed to verify the effectiveness of the proposed ILTPTC framework, and future research is also briefly discussed.
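The trajectory-assessment step can be pictured as a weighted cost over candidate trajectories. The sketch below is a hypothetical reduction of that idea; the cost terms, weights, and data structures are invented for illustration and are not the ILTPTC formulation.

```python
import numpy as np

# Score each candidate trajectory with a weighted objective combining
# safety (obstacle clearance) and comfort (speed smoothness), then
# pick the minimiser. All terms are illustrative stand-ins.
def trajectory_cost(traj, obstacles, w_safety=1.0, w_comfort=0.3):
    pts = traj["points"]                            # (N, 2) positions
    clearance = min(np.linalg.norm(pts - ob, axis=1).min() for ob in obstacles)
    safety = 1.0 / max(clearance, 1e-3)             # penalise small clearance
    comfort = np.abs(np.diff(traj["speed"])).sum()  # crude smoothness proxy
    return w_safety * safety + w_comfort * comfort

candidates = [
    {"points": np.array([[0, 0], [1, 0.2], [2, 0.4]]),
     "speed": np.array([5.0, 5.5, 6.0])},
    {"points": np.array([[0, 0], [1, -0.5], [2, -1.0]]),
     "speed": np.array([5.0, 4.0, 6.5])},
]
obstacles = [np.array([1.0, -0.6])]
best = min(candidates, key=lambda t: trajectory_cost(t, obstacles))
print(best["points"][-1])
```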
The Ensembl REST API: Ensembl Data for Any Language.
Yates, Andrew; Beal, Kathryn; Keenan, Stephen; McLaren, William; Pignatelli, Miguel; Ritchie, Graham R S; Ruffier, Magali; Taylor, Kieron; Vullo, Alessandro; Flicek, Paul
2015-01-01
We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest. © The Author 2014. Published by Oxford University Press.
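For example, a minimal Python client using the lookup endpoint might look as follows; the endpoint and response fields are as documented at rest.ensembl.org at the time of writing, and should be verified against the live documentation before use.

```python
import requests

# Fetch gene coordinates for a human gene symbol via the Ensembl REST API.
server = "https://rest.ensembl.org"
resp = requests.get(
    f"{server}/lookup/symbol/homo_sapiens/BRCA2",
    headers={"Content-Type": "application/json"},
    timeout=30,
)
resp.raise_for_status()
gene = resp.json()
print(gene["id"], gene["seq_region_name"], gene["start"], gene["end"])
```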
Ensembl BioMarts: a hub for data retrieval across taxonomic space.
Kinsella, Rhoda J; Kähäri, Andreas; Haider, Syed; Zamora, Jorge; Proctor, Glenn; Spudich, Giulietta; Almeida-King, Jeff; Staines, Daniel; Derwent, Paul; Kerhornou, Arnaud; Kersey, Paul; Flicek, Paul
2011-01-01
For a number of years the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The launch of the Ensembl Genomes project in 2009 complemented the Ensembl project by utilizing the same visualization, interactive and programming tools to provide users with a means for accessing genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. The Ensembl and Ensembl Genomes BioMarts provide a point of access to the high-quality gene annotation, variation data, functional and regulatory annotation and evolutionary relationships from genomes spanning the taxonomic space. This article aims to give a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts as well as some useful examples and a description of current data content and future objectives. Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/.
Daniele Tonina; Alberto Bellin
2008-01-01
Pore-scale dispersion (PSD), aquifer heterogeneity, sampling volume, and source size influence solute concentrations of conservative tracers transported in heterogeneous porous formations. In this work, we developed a new set of analytical solutions for the concentration ensemble mean, variance, and coefficient of variation (CV), which consider the effects of all these...
A preclustering-based ensemble learning technique for acute appendicitis diagnoses.
Lee, Yen-Hsien; Hu, Paul Jen-Hwa; Cheng, Tsang-Hsiang; Huang, Te-Chia; Chuang, Wei-Yao
2013-06-01
Acute appendicitis is a common medical condition, whose effective, timely diagnosis can be difficult. A missed diagnosis not only puts the patient in danger but also requires additional resources for corrective treatments. An acute appendicitis diagnosis constitutes a classification problem, for which a further fundamental challenge pertains to the skewed outcome class distribution of instances in the training sample. A preclustering-based ensemble learning (PEL) technique aims to address the associated imbalanced sample learning problems and thereby support the timely, accurate diagnosis of acute appendicitis. The proposed PEL technique employs undersampling to reduce the number of majority-class instances in a training sample, uses preclustering to group similar majority-class instances into multiple groups, and selects from each group representative instances to create more balanced samples. The PEL technique thereby reduces potential information loss from random undersampling. It also takes advantage of ensemble learning to improve performance. We empirically evaluate this proposed technique with 574 clinical cases obtained from a comprehensive tertiary hospital in southern Taiwan, using several prevalent techniques and a salient scoring system as benchmarks. The comparative results show that PEL is more effective and less biased than any benchmarks. The proposed PEL technique seems more sensitive to identifying positive acute appendicitis than the commonly used Alvarado scoring system and exhibits higher specificity in identifying negative acute appendicitis. In addition, the sensitivity and specificity values of PEL appear higher than those of the investigated benchmarks that follow the resampling approach. Our analysis suggests PEL benefits from the more representative majority-class instances in the training sample. According to our overall evaluation results, PEL records the best overall performance, and its area under the curve measure reaches 0.619. The PEL technique is capable of addressing imbalanced sample learning associated with acute appendicitis diagnosis. Our evaluation results suggest PEL is less biased toward a positive or negative class than the investigated benchmark techniques. In addition, our results indicate the overall effectiveness of the proposed technique, compared with prevalent scoring systems or salient classification techniques that follow the resampling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
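A hedged sketch of the PEL idea follows, using scikit-learn stand-ins: majority-class instances are preclustered, one representative is drawn per cluster to balance the classes, and an ensemble is trained on the result. The cluster count, learners, and synthetic data are illustrative, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_maj = rng.normal(0, 1, size=(500, 4))   # majority class (e.g. negative cases)
X_min = rng.normal(1, 1, size=(60, 4))    # minority class (e.g. positive cases)

k = 60  # one representative per cluster -> balanced training sample
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_maj)
# pick the instance closest to each cluster centre as its representative
reps = np.array([
    X_maj[np.argmin(np.linalg.norm(X_maj - c, axis=1))]
    for c in km.cluster_centers_
])

X_train = np.vstack([reps, X_min])
y_train = np.array([0] * k + [1] * len(X_min))
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)
clf.fit(X_train, y_train)
```

Compared with random undersampling, selecting one representative per cluster keeps the retained majority-class instances spread over the feature space, which is the information-loss reduction the abstract describes.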
Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun
2017-03-23
The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity-based chemometric target fishing model is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that can exhaustively generate enough conformers; however, model-building methods rely on an explicit number of common conformers. In this work, we attempted to design clustering algorithms that automatically find a reasonable number of representative conformer ensembles from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the key descriptor (variable): each column of the N × N matrix was treated as one of N variables describing the relationship (network) between a conformer (in a row) and the other N conformers. This approach was used to evaluate the performance of well-known clustering algorithms by comparing them in terms of generating representative conformer ensembles, and to test them over different matrix transformation functions with respect to stability. In the network, the representative conformer group could be resampled by four kinds of algorithms with implicit parameters. The directed dissimilarity matrix becomes the only input to the clustering algorithms. The Dunn index, Davies-Bouldin index, eta-squared values, and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also included the reduction (abstraction) rate of the data, the correlation between the sizes of the population and the samples, the computational complexity, and the memory usage. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within at most 1.13 s per sample. The clustering methods are simple and practical, as they are fast and do not require any explicit parameters. RCDTC presented the maximum Dunn and omega-squared values of the four algorithms, in addition to a consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to molecular dynamics sampling simulation results.
Chernesky, Max; Jang, Dan; Gilchrist, Jodi; Elit, Laurie; Lytwyn, Alice; Smieja, Marek; Dockter, Janel; Getman, Damon; Reid, Jennifer; Hill, Craig
2014-06-01
An APTIMA specimen collection and transportation (SCT) kit was developed by Hologic/Gen-Probe. Our objectives were to compare cervical SCT samples to PreservCyt and SurePath samples, to compare self-collected vaginal samples to physician-collected vaginal and cervical SCT samples, and to determine the ease and comfort of self-collection with the kit. Each woman (n = 580) self-collected a vaginal SCT sample, then filled out a questionnaire (n = 563) to determine the ease and comfort of self-collection. Colposcopy physicians collected a vaginal SCT sample and cervical PreservCyt, SCT, and SurePath samples. Samples were tested by the APTIMA HPV (AHPV) assay. Agreement between testing of cervical SCT and PreservCyt samples was 91.1% (κ = 0.82), and that with SurePath samples was 86.7% (κ = 0.72). Agreement of self-collected vaginal SCT with physician-collected SCT was 84.7% (κ = 0.68), and that of self-collected vaginal with cervical SCT was 82.0% (κ = 0.63). For 30 patients with CIN2+, AHPV testing of cervical SCT was 100% sensitive and 59.8% specific compared with PreservCyt (96.6% and 66.2%) and SurePath (93.3% and 70.9%). Vaginal SCT sensitivity was 86.7% for self-collection and 80.0% for physician collection. Most patients found that vaginal self-collection was easy, 5.3% reported some difficulty, and 87.6% expressed no discomfort. Cervical samples collected with the new SCT kit compared well to traditional liquid-based samples tested by AHPV. Although there was good agreement between self-collected and physician-collected samples with the SCT, in a limited number of 30 women with CIN2+, vaginal sampling identified fewer precancerous cervical lesions than cervical SCT sampling. Comfort, ease of use, and detection of high-risk HPV demonstrated that the kit could be used for cervical and vaginal sampling.
NASA Astrophysics Data System (ADS)
Fazel, Nasim; Berndtsson, Ronny; Bertacchi Uvo, Cintia; Klove, Bjorn; Madani, Kaveh
2015-04-01
Drought is a natural phenomenon that can cause significant environmental, ecological, and socio-economic losses in water-scarce regions. Studies of drought under climate change are essential for water resources planning and management. Dry spells and the number of consecutive days with precipitation below a certain threshold can be used to identify the severity of hydrological drought. In this study, we analyzed the projected changes in the number of dry days in two future periods, 2011-2040 and 2071-2100, on both seasonal and annual time scales in the Lake Urmia Basin. The lake and its wetlands, located in northwestern Iran, have invaluable environmental, social, and economic importance for the region. The lake level has been shrinking dramatically since 1995, and the water volume is now less than 30% of its original value. Moreover, frequent dry spells have struck the region and affected its water resources and the lake ecosystem, as in other parts of Iran. Analyzing future drought and dry-spell characteristics in the region is crucial for sustainable water management and lake restoration plans. We used daily projected precipitation from 20 climate models of CMIP5 (Coupled Model Intercomparison Project Phase 5) driven by three representative concentration pathways: RCP2.6, RCP4.5, and RCP8.5. The model outputs were statistically downscaled and validated against the historical observation period 1980-2010. We defined days with precipitation less than 1 mm as dry days for both the observation period and the model projections. The model validation showed that all models underestimated the number of dry days. An ensemble of the five models in best agreement with observations was used to assess the changes in the number of future dry days in the Lake Urmia Basin. The entire ensemble showed an increase in the number of dry days for all seasons. The projected changes in winter and spring were larger than for summer and autumn, with all models projecting drier winters and springs in both the near and far future periods. The ensemble mean of annual dry days increases by 6.5% to 7.3% across the concentration pathways RCP2.6, RCP4.5, and RCP8.5.
Applying Ensemble Kalman Filter to Regional Ocean Circulation Model in the East Asian Marginal Sea
NASA Astrophysics Data System (ADS)
Pak, Gyun-Do; Kim, Young Ho; Chang, Kyung-Il
2010-05-01
We successfully apply the ensemble Kalman filter (EnKF) data assimilation scheme to the East Sea Regional Ocean Model (ESROM). The ESROM solves the three-dimensional ocean primitive equations with the hydrostatic and Boussinesq approximations. The domain of the ESROM fully covers the East Sea with grid intervals of approximately 0.1˚. The ESROM has one inflow port, the Korea Strait, and two outflow ports, the Tsugaru and Soya straits. High-resolution bathymetry of 1/60˚ (Choi et al., 2002) is adopted for the model topography. The ESROM is initialized using hydrographic data from the World Ocean Atlas (WOA), and forced by monthly mean surface and open boundary conditions supplied from European Centre for Medium-Range Weather Forecasts data, the WOA, and other sources. The EnKF system is composed of 16 ensemble members, and thousands of observations are assimilated at every assimilation step; its parallel version significantly reduces the required memory and shortens the computational time more than 3-fold compared with the serial version. To prevent the collapse of the ensemble due to rank deficiency, we employ various schemes such as localization and inflation of the background error covariance and perturbation of observations. Sea surface temperature from the Advanced Very High Resolution Radiometer and in-situ temperature profiles from various sources, including Argo floats, have been assimilated into the EnKF system. For the cyclonic circulation in the northern East Sea and the paths of the East Korean Warm Current and the Nearshore Branch, the EnKF system reproduces the mean surface circulation more realistically than the case without data assimilation. Simulated area-averaged vertical temperature profiles also agree well with the Generalized Digital Environmental Model data, which indicates that the EnKF system corrects the warming of subsurface temperature and the erosion of the permanent thermocline that are usually observed in numerical models without data assimilation. We also quantitatively validate the EnKF system by comparing its results with observed temperatures at 100 m over two years in the southwestern East Sea. We find that spatial and temporal correlations are higher and root-mean-square errors are lower in the EnKF system compared with systems without data assimilation.
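For readers unfamiliar with the EnKF analysis step, a textbook stochastic-EnKF update with perturbed observations is sketched below; this is not the ESROM code, and the dimensions, observation operator, and error statistics are illustrative.

```python
import numpy as np

# Textbook stochastic EnKF analysis step with perturbed observations.
rng = np.random.default_rng(7)
n_state, n_obs, n_ens = 50, 5, 16

X = rng.normal(size=(n_state, n_ens))            # forecast ensemble
H = np.zeros((n_obs, n_state))                   # toy observation operator
H[np.arange(n_obs), np.arange(n_obs) * 10] = 1.0 # observe every 10th variable
y = rng.normal(size=n_obs)                       # observation vector
R = 0.5 * np.eye(n_obs)                          # observation error covariance

A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
PfHt = A @ (H @ A).T / (n_ens - 1)               # sample estimate of Pf H^T
K = PfHt @ np.linalg.inv(H @ PfHt + R)           # Kalman gain

# each member is updated against its own perturbed observation vector
Y = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, size=n_ens).T
Xa = X + K @ (Y - H @ X)                         # analysis ensemble
print(Xa.shape)
```

With only 16 members the sample covariance is rank-deficient, which is exactly why the localization, inflation, and observation-perturbation safeguards mentioned in the abstract are needed.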
Relation Between Pore Size and the Compressibility of a Confined Fluid
Gor, Gennady Y.; Siderius, Daniel W.; Rasmussen, Christopher J.; Krekelberg, William P.; Shen, Vincent K.; Bernstein, Noam
2015-01-01
When a fluid is confined to a nanopore, its thermodynamic properties differ from the properties of a bulk fluid, so measuring such properties of the confined fluid can provide information about the pore sizes. Here we report a simple relation between the pore size and the isothermal compressibility of argon confined in such nanopores. Compressibility is calculated from the fluctuations of the number of particles in the grand canonical ensemble using two different simulation techniques: conventional grand-canonical Monte Carlo and grand-canonical ensemble transition-matrix Monte Carlo. Our results provide a theoretical framework for extracting information on the pore sizes of fluid-saturated samples by measuring the compressibility in ultrasonic experiments. PMID:26590541
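The fluctuation formula behind these calculations is compact enough to state and test directly: in the grand canonical ensemble, κ_T = V⟨δN²⟩/(k_B T⟨N⟩²). A worked toy example follows; the particle-number trace is synthetic, standing in for GCMC or transition-matrix Monte Carlo output.

```python
import numpy as np

# Isothermal compressibility from grand-canonical particle-number
# fluctuations: kappa_T = V * <dN^2> / (kB * T * <N>^2).
kB = 1.380649e-23          # J/K
T = 87.3                   # K (argon near its normal boiling point)
V = (10e-9) ** 3           # m^3, a 10 nm cubic box (illustrative)

rng = np.random.default_rng(5)
N_samples = rng.normal(loc=5000.0, scale=40.0, size=200_000)  # fake GCMC trace

N_mean = N_samples.mean()
dN2 = N_samples.var()
kappa_T = V * dN2 / (kB * T * N_mean**2)
print(f"kappa_T ~ {kappa_T:.3e} Pa^-1")
```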
Dynamic principle for ensemble control tools.
Samoletov, A; Vasiev, B
2017-11-28
Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.
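As a concrete point of reference (an illustrative standard scheme, not the authors' proposed principle), a Langevin thermostat integrated with the BAOAB splitting leaves the canonical measure ~exp(-H/kT) invariant; for a harmonic oscillator this can be checked by comparing the sampled ⟨q²⟩ against its canonical value kT.

```python
import numpy as np

# BAOAB Langevin thermostat for a 1-D harmonic oscillator (U = q^2/2).
rng = np.random.default_rng(0)
kT, m, gamma, dt = 1.0, 1.0, 1.0, 0.05
force = lambda q: -q
c1 = np.exp(-gamma * dt)
c2 = np.sqrt(m * kT * (1 - c1**2))      # exact OU noise amplitude

q, p, qs = 0.0, 0.0, []
for _ in range(100_000):
    p += 0.5 * dt * force(q)                  # B: half kick
    q += 0.5 * dt * p / m                     # A: half drift
    p = c1 * p + c2 * rng.standard_normal()   # O: Ornstein-Uhlenbeck
    q += 0.5 * dt * p / m                     # A: half drift
    p += 0.5 * dt * force(q)                  # B: half kick
    qs.append(q)

print("sampled <q^2> =", np.var(qs), "(canonical value: kT =", kT, ")")
```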
ms 2: A molecular simulation tool for thermodynamic properties, release 3.0
NASA Astrophysics Data System (ADS)
Rutkai, Gábor; Köster, Andreas; Guevara-Carrion, Gabriela; Janzen, Tatjana; Schappals, Michael; Glass, Colin W.; Bernreuther, Martin; Wafai, Amer; Stephan, Simon; Kohns, Maximilian; Reiser, Steffen; Deublein, Stephan; Horsch, Martin; Hasse, Hans; Vrabec, Jadran
2017-12-01
A new version release (3.0) of the molecular simulation tool ms 2 (Deublein et al., 2011; Glass et al. 2014) is presented. Version 3.0 of ms 2 features two additional ensembles, i.e. microcanonical (NVE) and isobaric-isoenthalpic (NpH), various Helmholtz energy derivatives in the NVE ensemble, thermodynamic integration as a method for calculating the chemical potential, the osmotic pressure for calculating the activity of solvents, the six Maxwell-Stefan diffusion coefficients of quaternary mixtures, statistics for sampling hydrogen bonds, smooth-particle mesh Ewald summation as well as the ability to carry out molecular dynamics runs for an arbitrary number of state points in a single program execution.
Wang, Jian; Ben, Weiwei; Yang, Min; Zhang, Yu; Qiang, Zhimin
2016-01-01
Swine feedlots are an important source of antibiotics and antibiotic resistance genes (ARGs) released to the environment. This study investigated the dissemination of two classes of commonly used veterinary antibiotics, namely tetracyclines (TCs) and sulfonamides (SAs), and their corresponding ARGs along the waste treatment paths of a concentrated swine feedlot located in Beijing, China. The highest total TC and total SA concentrations detected were 166.7 mg kg(-1) and 64.5 μg kg(-1) in swine manure, and 388.7 and 7.56 μg L(-1) in swine wastewater, respectively. Fourteen tetracycline resistance genes (TRGs) encoding ribosomal protection proteins (RPP), efflux proteins (EFP), and enzymatic inactivation proteins, three sulfonamide resistance genes (SRGs), and two integrase genes were detected along the waste treatment paths, with detection frequencies of 33.3-75.0%. The relative abundances of target ARGs ranged from 2.74×10(-6) to 1.19. The antibiotics and ARGs generally declined along both waste treatment paths, but the reduction was more pronounced along the manure treatment path. The RPP TRGs dominated in the upstream samples and then decreased continuously along both waste treatment paths, whilst the EFP TRGs and SRGs remained relatively stable. Strong correlations between antibiotic concentrations and ARGs were observed in both manure and wastewater samples. In addition, seasonal temperature, integrase genes, and the moisture content and nutrient level of the tested samples could all affect the relative abundances of ARGs along the swine waste treatment paths. This study helps in understanding the evolution and spread of ARGs from swine feedlots to the environment, as well as in assessing the environmental risk arising from swine waste treatment. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Simone, Gabriele; Cordone, Roberto; Serapioni, Raul Paolo; Lecca, Michela
2017-05-01
Retinex theory estimates the human color sensation at any observed point by correcting its color based on the spatial arrangement of the colors in proximate regions. We revise two recent path-based, edge-aware Retinex implementations: Termite Retinex (TR) and Energy-driven Termite Retinex (ETR). As in the original Retinex implementation, TR and ETR scan the neighborhood of any image pixel by paths and rescale their chromatic intensities by intensity levels computed by reworking the colors of the pixels on the paths. Our interest in TR and ETR is due to their unique, content-based scanning scheme, which uses the image edges to define the paths and exploits a swarm intelligence model for guiding the spatial exploration of the image. The exploration scheme of ETR has been shown to be particularly effective: its paths are local minima of an energy functional designed to favor the sampling of image pixels highly relevant to color sensation. Nevertheless, since its computational complexity makes ETR poorly practicable, here we present a light version of it, named Light Energy-driven TR, obtained from ETR by implementing a modified, optimized minimization procedure and by exploiting parallel computing.
Optical Path Switching Based Differential Absorption Radiometry for Substance Detection
NASA Technical Reports Server (NTRS)
Sachse, Glen W. (Inventor)
2000-01-01
A system and method are provided for detecting one or more substances. An optical path switch divides sample path radiation into a time series of alternating first polarized components and second polarized components. The first polarized components are transmitted along a first optical path and the second polarized components along a second optical path. A first gasless optical filter train filters the first polarized components to isolate at least a first wavelength band thereby generating first filtered radiation. A second gasless optical filter train filters the second polarized components to isolate at least a second wavelength band thereby generating second filtered radiation. The first wavelength band and second wavelength band are unique. Further, spectral absorption of a substance of interest is different at the first wavelength band as compared to the second wavelength band. A beam combiner combines the first and second filtered radiation to form a combined beam of radiation. A detector is disposed to monitor magnitude of at least a portion of the combined beam alternately at the first wavelength band and the second wavelength band as an indication of the concentration of the substance in the sample path.
Foundations and latest advances in replica exchange transition interface sampling.
Cabriolu, Raffaela; Skjelbred Refsnes, Kristin M; Bolhuis, Peter G; van Erp, Titus S
2017-10-21
Nearly 20 years ago, transition path sampling (TPS) emerged as an alternative method to free energy based approaches for the study of rare events such as nucleation, protein folding, chemical reactions, and phase transitions. TPS effectively performs Monte Carlo simulations with relatively short molecular dynamics trajectories, with the advantage of not having to alter the actual potential energy surface nor the underlying physical dynamics. Although the TPS approach also introduced a methodology to compute reaction rates, this approach was for a long time considered theoretically attractive, providing the exact same results as extensively long molecular dynamics simulations, but still expensive for most relevant applications. With the increase of computer power and improvements in the algorithmic methodology, quantitative path sampling is finding applications in more and more areas of research. In particular, the transition interface sampling (TIS) and the replica exchange TIS (RETIS) algorithms have, in turn, improved the efficiency of quantitative path sampling significantly, while maintaining the exact nature of the approach. Also, open-source software packages are making these methods, for which implementation is not straightforward, now available for a wider group of users. In addition, a blooming development takes place regarding both applications and algorithmic refinements. Therefore, it is timely to explore the wide panorama of the new developments in this field. This is the aim of this article, which focuses on the most efficient exact path sampling approach, RETIS, as well as its recent applications, extensions, and variations.
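The quantitative core of TIS/RETIS is compact: the rate constant is the flux through the first interface times the product of conditional interface-crossing probabilities. A worked toy example follows, with all numbers invented for illustration.

```python
import numpy as np

# TIS/RETIS rate assembly:
#   k_AB = phi_0 * prod_i P(lambda_{i+1} | lambda_i)
phi_0 = 2.4e-3                              # flux through lambda_0 (invented)
P_cross = [0.31, 0.18, 0.22, 0.35, 0.52]    # P(lambda_{i+1} | lambda_i) (invented)

k_AB = phi_0 * np.prod(P_cross)
print(f"k_AB ~ {k_AB:.3e} per time unit")
```

Each factor is an ordinary probability estimated from short path segments in its own interface ensemble, which is why the product can resolve rates far smaller than anything reachable by brute-force molecular dynamics.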
Nanoscale imaging of clinical specimens using pathology-optimized expansion microscopy
Zhao, Yongxin; Bucur, Octavian; Irshad, Humayun; Chen, Fei; Weins, Astrid; Stancu, Andreea L.; Oh, Eun-Young; DiStasio, Marcello; Torous, Vanda; Glass, Benjamin; Stillman, Isaac E.; Schnitt, Stuart J.; Beck, Andrew H.; Boyden, Edward S.
2017-01-01
Expansion microscopy (ExM), a method for improving the resolution of light microscopy by physically expanding the specimen, has not been applied to clinical tissue samples. Here we report a clinically optimized form of ExM that supports nanoscale imaging of human tissue specimens that have been fixed with formalin, embedded in paraffin, stained with hematoxylin and eosin (H&E), and/or fresh frozen. The method, which we call expansion pathology (ExPath), converts clinical samples into an ExM-compatible state, then applies an ExM protocol with protein anchoring and mechanical homogenization steps optimized for clinical samples. ExPath enables ~70 nm resolution imaging of diverse biomolecules in intact tissues using conventional diffraction-limited microscopes, and standard antibody and fluorescent DNA in situ hybridization reagents. We use ExPath for optical diagnosis of kidney minimal-change disease, which previously required electron microscopy (EM), and demonstrate high-fidelity computational discrimination between early breast neoplastic lesions that to date have challenged human judgment. ExPath may enable the routine use of nanoscale imaging in pathology and clinical research. PMID:28714966
Residue-level global and local ensemble-ensemble comparisons of protein domains.
Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew
2015-09-01
Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. © 2015 The Protein Society.
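A stripped-down sketch of the eeGLOBAL-style comparison is given below with toy coordinates; the real tool also reports closest-approach distances and the eeLOCAL dipeptide overlays, and all array shapes here are invented.

```python
import numpy as np

# Per-atom intra-ensemble spread and inter-ensemble separation after
# a (here assumed already done) superposition on common core atoms.
rng = np.random.default_rng(2)
n_models, n_atoms = 20, 100
ens1 = rng.normal(0.0, 0.3, size=(n_models, n_atoms, 3))  # toy coordinates
ens2 = ens1.mean(axis=0) + rng.normal(0.2, 0.3, size=(n_models, n_atoms, 3))

def intra_variation(ens):
    mean = ens.mean(axis=0)
    return np.linalg.norm(ens - mean, axis=2).mean(axis=0)  # one value per atom

inter = np.linalg.norm(ens1.mean(axis=0) - ens2.mean(axis=0), axis=1)
print(intra_variation(ens1)[:5])
print(inter[:5])
```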
Wei, Kun; Ren, Bingyin
2018-02-13
In a future intelligent factory, a robotic manipulator must work efficiently and safely in a human-robot collaborative and dynamic unstructured environment. Autonomous path planning is the most important issue that must be resolved first in the process of improving robotic manipulator intelligence. Among path-planning methods, the Rapidly Exploring Random Tree (RRT) algorithm, based on random sampling, has been widely applied to dynamic path planning for high-dimensional robotic manipulators, especially in complex environments, because of its probabilistic completeness, good expansion properties, and faster exploration speed compared with other planning methods. However, the existing RRT algorithm has limitations in path planning for a robotic manipulator in a dynamic unstructured environment. Therefore, an autonomous obstacle-avoidance dynamic path-planning method for a robotic manipulator based on an improved RRT algorithm, called Smoothly RRT (S-RRT), is proposed. This method extends nodes in the direction of the target, which dramatically increases the sampling speed and efficiency of RRT. A path optimization strategy based on a maximum-curvature constraint is presented to generate a smooth, continuously curved executable path for a robotic manipulator. Finally, the correctness, effectiveness, and practicability of the proposed method are demonstrated and validated via a MATLAB static simulation and a Robot Operating System (ROS) dynamic simulation environment, as well as a real autonomous obstacle-avoidance experiment in a dynamic unstructured environment for a robotic manipulator. The proposed method not only has great practical engineering significance for a robotic manipulator's obstacle avoidance in an intelligent factory, but also provides theoretical reference value for path planning of other types of robots.
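For orientation, a minimal goal-biased RRT extension loop in a 2-D workspace is sketched below; this is illustrative only, and S-RRT adds target-directed extension and maximum-curvature smoothing on top of this basic scheme.

```python
import numpy as np

# Basic goal-biased RRT in the plane with one disc obstacle.
rng = np.random.default_rng(0)
step, goal_bias = 0.2, 0.1
start, goal = np.array([0.0, 0.0]), np.array([5.0, 5.0])
nodes = [start]

def collision_free(p):
    return np.linalg.norm(p - np.array([2.5, 2.5])) > 0.5  # toy obstacle

for _ in range(2000):
    sample = goal if rng.random() < goal_bias else rng.uniform(0, 6, size=2)
    nearest = min(nodes, key=lambda n: np.linalg.norm(n - sample))
    direction = sample - nearest
    new = nearest + step * direction / (np.linalg.norm(direction) + 1e-12)
    if collision_free(new):
        nodes.append(new)
        if np.linalg.norm(new - goal) < step:
            break
print("tree size:", len(nodes))
```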
HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy.
Hu, Huan; Zhang, Li; Ai, Haixin; Zhang, Hui; Fan, Yetian; Zhao, Qi; Liu, Hongsheng
2018-03-27
LncRNAs play important roles in many biological processes and in disease progression by binding to related proteins. However, experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although a few models have been designed to predict ncRNA-protein interactions, they all have common drawbacks that limit their predictive performance. In this study, we present a model called HLPI-Ensemble, designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF), and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble, respectively. The results of 10-fold cross-validation show that HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96, and 0.96, respectively, on the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with previous models on an external validation dataset. The results show that the false positives (FPs) of the HLPI-Ensemble models are much lower than those of the previous models, and the other evaluation indicators of the HLPI-Ensemble models are also higher. This further shows that the HLPI-Ensemble models are superior in predicting human lncRNA-protein interactions compared with previous models. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/ .
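A hedged sketch of the ensemble strategy follows, using scikit-learn stand-ins: soft voting over SVM, random forest, and gradient boosting on synthetic features. The paper builds separate SVM-, RF-, and XGB-based ensembles on sequence-derived lncRNA/protein features, so this is an illustration of the general pattern only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for lncRNA-protein pair feature vectors.
X, y = make_classification(n_samples=600, n_features=40, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across learners
)
print("CV AUC:", cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())
```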
A simple analytical model for dynamics of time-varying target leverage ratios
NASA Astrophysics Data System (ADS)
Lo, C. F.; Hui, C. H.
2012-03-01
In this paper we formulate a simple theoretical model for the dynamics of the time-varying target leverage ratio of a firm, under assumptions based upon empirical observations. In our theoretical model, the time evolution of the target leverage ratio of a firm can be derived self-consistently from a set of coupled Itô stochastic differential equations governing the leverage ratios of an ensemble of firms, using the nonlinear Fokker-Planck equation approach. The theoretically derived time paths of the target leverage ratio bear great resemblance to those used in the time-dependent stationary-leverage (TDSL) model [Hui et al., Int. Rev. Financ. Analy. 15, 220 (2006)]. Thus, our simple model is able to provide a theoretical foundation for the selected time paths of the target leverage ratio in the TDSL model. We also examine how the pace of adjustment of a firm's target ratio, the volatility of the leverage ratio, and the current leverage ratio affect the dynamics of the time-varying target leverage ratio. Hence, with the proposed dynamics of the time-dependent target leverage ratio, the TDSL model can be readily applied to generate default probabilities of individual firms and to assess their default risk.
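A generic stand-in for such dynamics (not the paper's exact equations) is a mean-reverting Itô process with a time-varying target, simulated here for an ensemble of firms by Euler-Maruyama; the adjustment pace kappa, volatility sigma, and target path theta(t) are all invented for illustration.

```python
import numpy as np

# Euler-Maruyama for dL = kappa*(theta(t) - L) dt + sigma*L dW,
# applied to an ensemble of firms.
rng = np.random.default_rng(11)
kappa, sigma, dt = 0.8, 0.15, 1 / 252
n_steps, n_firms = 2520, 500                               # ~10 years, daily
theta = lambda t: 0.4 + 0.1 * np.sin(2 * np.pi * t / 10)   # made-up target path

L = np.full(n_firms, 0.6)   # initial leverage ratios
for i in range(n_steps):
    t = i * dt
    dW = rng.standard_normal(n_firms) * np.sqrt(dt)
    L += kappa * (theta(t) - L) * dt + sigma * L * dW

print("ensemble mean leverage after 10y:", L.mean())
```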
Fielding, M. D.; Chiu, J. C.; Hogan, R. J.; ...
2015-07-02
Active remote sensing of marine boundary-layer clouds is challenging as drizzle drops often dominate the observed radar reflectivity. We present a new method to simultaneously retrieve cloud and drizzle vertical profiles in drizzling boundary-layer clouds using surface-based observations of radar reflectivity, lidar attenuated backscatter, and zenith radiances under conditions when precipitation does not reach the surface. Specifically, the vertical structure of droplet size and water content of both cloud and drizzle is characterised throughout the cloud. An ensemble optimal estimation approach provides full error statistics given the uncertainty in the observations. To evaluate the new method, we first perform retrievals using synthetic measurements from large-eddy simulation snapshots of cumulus under stratocumulus, where cloud water path is retrieved with an error of 31 g m^-2. The method also performs well in non-drizzling clouds, where no assumption of the cloud profile is required. We then apply the method to observations of marine stratocumulus obtained during the Atmospheric Radiation Measurement MAGIC deployment in the Northeast Pacific. Here, retrieved cloud water path agrees well with independent three-channel microwave radiometer retrievals, with a root mean square difference of 10-20 g m^-2.
Spreading paths in partially observed social networks
NASA Astrophysics Data System (ADS)
Onnela, Jukka-Pekka; Christakis, Nicholas A.
2012-03-01
Understanding how and how far information, behaviors, or pathogens spread in social networks is an important problem, having implications for both predicting the size of epidemics, as well as for planning effective interventions. There are, however, two main challenges for inferring spreading paths in real-world networks. One is the practical difficulty of observing a dynamic process on a network, and the other is the typical constraint of only partially observing a network. Using static, structurally realistic social networks as platforms for simulations, we juxtapose three distinct paths: (1) the stochastic path taken by a simulated spreading process from source to target; (2) the topologically shortest path in the fully observed network, and hence the single most likely stochastic path, between the two nodes; and (3) the topologically shortest path in a partially observed network. In a sampled network, how closely does the partially observed shortest path (3) emulate the unobserved spreading path (1)? Although partial observation inflates the length of the shortest path, the stochastic nature of the spreading process also frequently derails the dynamic path from the shortest path. We find that the partially observed shortest path does not necessarily give an inflated estimate of the length of the process path; in fact, partial observation may, counterintuitively, make the path seem shorter than it actually is.
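A hedged sketch of the three-path comparison described above; the network model, infection probability beta, and the 70% node-sampling rate are assumptions for illustration, not the study's settings:

```python
# (1) stochastic spreading path, (2) shortest path in the full network,
# (3) shortest path in a node-sampled (partially observed) network.
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(1000, 3)               # stand-in social network
source, target = 0, 500

def si_spread_path(G, source, target, beta=0.3):
    """Hop length of the SI transmission path from source to target."""
    parent, frontier = {source: None}, {source}
    while frontier and target not in parent:
        new = {}
        for u in frontier:
            for v in G[u]:
                if v not in parent and v not in new and random.random() < beta:
                    new[v] = u                       # v infected via u
        parent.update(new)
        frontier = set(new)
    if target not in parent:
        return None                                  # spread died out
    hops, node = 0, target
    while parent[node] is not None:
        node, hops = parent[node], hops + 1
    return hops

kept = {source, target} | {v for v in G if random.random() < 0.7}
H = G.subgraph(kept)                                 # partially observed network
print("(1) spreading path:", si_spread_path(G, source, target))
print("(2) full shortest path:", nx.shortest_path_length(G, source, target))
print("(3) observed shortest path:",
      nx.shortest_path_length(H, source, target)
      if nx.has_path(H, source, target) else None)
```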
NASA Astrophysics Data System (ADS)
Shkolnik, Igor; Pavlova, Tatiana; Efimov, Sergey; Zhuravlev, Sergey
2018-01-01
A climate change simulation based on a 30-member ensemble of the Voeikov Main Geophysical Observatory RCM (25 km resolution) for northern Eurasia is used to drive the hydrological model CaMa-Flood. Using this modeling framework, we evaluate the uncertainties in the projected peak river discharge and flood hazard for 2050-2059 relative to 1990-1999 under the IPCC RCP8.5 scenario. The large ensemble size, along with the reasonably high modeling resolution, allows one to sample natural climate variability efficiently and increases our ability to predict future changes in hydrological extremes. It is shown that the annual maximum river discharge can almost double by the mid-21st century at the outlets of the major Siberian rivers. In the western regions there is only a weak signal in river discharge and flood hazard, hardly discernible above climate variability. The annual maximum flood area is projected to increase across Siberia, mostly by 2-5% relative to the baseline period. The contribution of natural climate variability at different temporal scales to the uncertainty of the ensemble prediction is discussed. The analysis shows that considerable changes in the extreme river discharge probability are expected at the locations of key hydropower facilities. This suggests that extensive impact studies are required to develop recommendations for maintaining regional energy security.
A Bayesian ensemble data assimilation to constrain model parameters and land-use carbon emissions
NASA Astrophysics Data System (ADS)
Lienert, Sebastian; Joos, Fortunat
2018-05-01
A dynamic global vegetation model (DGVM) is applied in a probabilistic framework and benchmarking system to constrain uncertain model parameters by observations and to quantify carbon emissions from land-use and land-cover change (LULCC). Processes featured in DGVMs involve parameters that are prone to substantial uncertainty. To cope with these uncertainties, Latin hypercube sampling (LHS) is used to create a 1000-member perturbed parameter ensemble, which is then evaluated against a diverse set of global and spatiotemporally resolved observational constraints. We discuss the performance of the constrained ensemble and use it to formulate a new best-guess version of the model (LPX-Bern v1.4). The observationally constrained ensemble is used to investigate historical emissions due to LULCC (ELUC) and their sensitivity to model parametrization. We find a global ELUC estimate of 158 (108, 211) PgC (median and 90% confidence interval) between 1800 and 2016. We compare ELUC to other estimates both globally and regionally. Spatial patterns are investigated, and ELUC estimates for the 10 countries contributing most to the flux over the historical period are reported. We consider model versions with and without additional land-use processes (shifting cultivation and wood harvest) and find that the difference in global ELUC is of the same order of magnitude as the parameter-induced uncertainty and in some cases could even be offset by an appropriate parameter choice.
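A minimal sketch of building a perturbed-parameter ensemble with Latin hypercube sampling, as in the 1000-member setup above; the parameter names and bounds here are hypothetical placeholders, not the model's actual parameters:

```python
# Draw a 1000-member LHS design and scale it to parameter ranges.
from scipy.stats import qmc

param_bounds = {              # hypothetical DGVM parameters: (low, high)
    "q10_resp": (1.2, 3.0),
    "sla_scale": (0.5, 2.0),
    "fire_thresh": (0.1, 0.9),
}
lo = [b[0] for b in param_bounds.values()]
hi = [b[1] for b in param_bounds.values()]

sampler = qmc.LatinHypercube(d=len(param_bounds), seed=42)
unit_samples = sampler.random(n=1000)           # 1000 members in [0, 1)^d
ensemble = qmc.scale(unit_samples, lo, hi)      # map to parameter ranges
print(ensemble.shape)                           # (1000, 3)
```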
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume; Koster, Randal D. (Editor)
2014-01-01
An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.
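A hedged sketch of the FAST idea as described above: treat a moving window of states along one trajectory as an ensemble and use its sample covariance as a flow-dependent background-error covariance. Shapes are toy-sized placeholders; the real method operates on full ocean-model state vectors:

```python
import numpy as np

def fast_covariance(trajectory: np.ndarray, window: int) -> np.ndarray:
    """trajectory: (n_times, n_state) array of model states."""
    recent = trajectory[-window:]                # moving window of states
    anomalies = recent - recent.mean(axis=0)     # deviations from window mean
    return anomalies.T @ anomalies / (window - 1)

traj = np.cumsum(np.random.randn(500, 10), axis=0)  # placeholder trajectory
B = fast_covariance(traj, window=50)
print(B.shape)  # (10, 10) background-error covariance estimate
```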
Ensemble Kalman Filter versus Ensemble Smoother for Data Assimilation in Groundwater Modeling
NASA Astrophysics Data System (ADS)
Li, L.; Cao, Z.; Zhou, H.
2017-12-01
Groundwater modeling calls for an effective and robust method of integration to fill the gap between the model and the data. The Ensemble Kalman Filter (EnKF), a real-time data assimilation method, has been increasingly applied in multiple disciplines such as petroleum engineering and hydrogeology. In this approach, groundwater models are sequentially updated using measured data such as hydraulic head and concentration. As an alternative to the EnKF, the Ensemble Smoother (ES) was proposed; it updates the models using all the data at once and therefore has a much lower computational cost. To further improve performance, an iterative ES was proposed that repeatedly updates the models while assimilating all measurements together. In this work, we compare the performance of the EnKF, the ES, and the iterative ES on a synthetic groundwater modeling example. Hydraulic head data modeled on the basis of a reference conductivity field are used to inversely estimate conductivities at unsampled locations. Results are evaluated in terms of the characterization of conductivity and of groundwater flow and solute transport predictions. It is concluded that: (1) the iterative ES achieves results comparable with the EnKF at a lower computational cost; and (2) the iterative ES outperforms the ES thanks to its repeated updating. These findings suggest that the iterative ES deserves more attention for data assimilation in groundwater modeling.
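For readers unfamiliar with the sequential update the abstract refers to, here is a minimal sketch of a stochastic (perturbed-observation) EnKF analysis step; the linear observation operator and toy shapes are assumptions for brevity:

```python
import numpy as np

def enkf_update(ensemble, H, y, obs_err_std, rng):
    """ensemble: (n_members, n_state); H: (n_obs, n_state); y: (n_obs,)."""
    n_members = ensemble.shape[0]
    A = ensemble - ensemble.mean(axis=0)           # state anomalies
    HX = ensemble @ H.T                            # forecast observations
    HA = HX - HX.mean(axis=0)
    R = np.eye(len(y)) * obs_err_std**2
    # Kalman gain from ensemble-estimated covariances: Pf H^T (H Pf H^T + R)^-1
    K = (A.T @ HA / (n_members - 1)) @ np.linalg.inv(
        HA.T @ HA / (n_members - 1) + R)
    y_pert = y + rng.normal(0, obs_err_std, (n_members, len(y)))
    return ensemble + (y_pert - HX) @ K.T          # analysis ensemble

rng = np.random.default_rng(1)
ens = rng.normal(size=(100, 20))                   # placeholder prior ensemble
H = np.zeros((3, 20)); H[[0, 1, 2], [2, 7, 15]] = 1.0  # observe 3 components
ens_a = enkf_update(ens, H, y=np.array([0.5, -0.2, 1.0]),
                    obs_err_std=0.1, rng=rng)
```

An ES-style update differs mainly in applying one such step to the whole space-time state using all observations together, rather than sequentially.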
NASA Astrophysics Data System (ADS)
Pollard, D.; Chang, W.; Haran, M.; Applegate, P.; DeConto, R.
2015-11-01
A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ~ 20 000 years. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree quite well with the more advanced techniques, but only for a large ensemble with full factorial parameter sampling. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds. Each run is extended 5000 years into the "future" with idealized ramped climate warming. In the majority of runs with reasonable scores, this produces grounding-line retreat deep into the West Antarctic interior, and the analysis provides sea-level-rise envelopes with well defined parametric uncertainty bounds.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Angerer, Andreas, E-mail: andreas.angerer@tuwien.ac.at; Astner, Thomas; Wirtitsch, Daniel
We design and implement 3D lumped-element microwave cavities that spatially focus magnetic fields into a small mode volume. They allow coherent and uniform coupling to electron spins hosted by nitrogen-vacancy centers in diamond. We achieve large homogeneous single-spin coupling rates, with an enhancement of more than one order of magnitude compared with standard 3D cavities with a fundamental resonance at 3 GHz. Finite-element simulations confirm that the magnetic field distribution is homogeneous throughout the entire sample volume, with a root mean square deviation of 1.54%. With a sample containing 10^17 nitrogen-vacancy electron spins, we achieve a collective coupling strength of Ω = 12 MHz and a cooperativity factor C = 27, clearly entering the strong coupling regime. This allows us to interface a macroscopic spin ensemble with microwave circuits, and the homogeneous Rabi frequency paves the way to manipulating the full ensemble population coherently.
Active relearning for robust supervised classification of pulmonary emphysema
NASA Astrophysics Data System (ADS)
Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.
2012-03-01
Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, the inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Toward optimizing emphysema classification, we introduce a physician-in-the-loop feedback approach to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pairwise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, conflicts between the ensemble labels and the expert labels were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded a 15% increase in classification accuracy and a 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics, where the average accuracy improvement rose to 21%. The cooperative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.
Peptidic Macrocycles - Conformational Sampling and Thermodynamic Characterization.
Kamenik, Anna S; Lessel, Uta; Fuchs, Julian E; Fox, Thomas; Liedl, Klaus R
2018-05-29
Macrocycles are of considerable interest as highly specific drug candidates, yet they challenge standard conformer generators with their large number of rotatable bonds and conformational restrictions. Here, we present a molecular dynamics-based routine that bypasses current limitations in conformational sampling and extensively profiles the free energy landscape of peptidic macrocycles in solution. We perform accelerated molecular dynamics simulations to capture a diverse conformational ensemble. By applying an energetic cutoff, followed by geometric clustering, we demonstrate the striking robustness and efficiency of the approach in identifying highly populated conformational states of cyclic peptides. The resulting structural and thermodynamic information is benchmarked against interproton distances from NMR experiments and conformational states identified by X-ray crystallography. Using three different model systems of varying size and flexibility, we show that the method reliably reproduces experimentally determined structural ensembles and is capable of identifying key conformational states that include the bioactive conformation. Thus, the described approach is a robust method to generate conformations of peptidic macrocycles and holds promise for structure-based drug design.
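A hedged sketch of the post-processing stage described above: discard high-energy frames, then geometrically cluster the remainder. The feature extraction, the 5 kcal/mol cutoff, and the clustering threshold are illustrative assumptions, not the published protocol's exact settings:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

energies = np.random.normal(0.0, 2.0, 5000)        # placeholder frame energies
features = np.random.rand(5000, 12)                 # e.g. backbone dihedrals

keep = energies <= energies.min() + 5.0             # assumed energetic cutoff
low_energy = features[keep]

Z = linkage(pdist(low_energy), method="average")    # hierarchical clustering
labels = fcluster(Z, t=1.0, criterion="distance")   # geometric clusters
print("clusters found:", labels.max())
```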
Shear-stress fluctuations and relaxation in polymer glasses
NASA Astrophysics Data System (ADS)
Kriuchevskyi, I.; Wittmer, J. P.; Meyer, H.; Benzerara, O.; Baschnagel, J.
2018-01-01
We investigate by means of molecular dynamics simulation a coarse-grained polymer glass model, focusing on (quasistatic and dynamical) shear-stress fluctuations as a function of temperature T and sampling time Δt. The linear response is characterized using (ensemble-averaged) expectation values of the contributions (time-averaged for each shear plane) to the stress-fluctuation relation μ_sf for the shear modulus and the shear-stress relaxation modulus G(t). Using 100 independent configurations, we pay attention to the respective standard deviations. While the ensemble-averaged modulus μ_sf(T) decreases continuously with increasing T for all Δt sampled, its standard deviation δμ_sf(T) is nonmonotonic, with a striking peak at the glass transition. The question of whether the shear modulus is continuous or has a jump singularity at the glass transition is thus ill posed. Confirming the effective time-translational invariance of our systems, the Δt dependence of μ_sf and related quantities can be understood using a weighted integral over G(t).
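For reference, the stress-fluctuation relation mentioned above is commonly written as follows (our hedged paraphrase of the standard form; the paper's exact notation may differ):

$$
\mu_{\mathrm{sf}} \;=\; \mu_{\mathrm{A}} \;-\; \beta V \left(\langle \hat{\tau}^{2}\rangle - \langle \hat{\tau}\rangle^{2}\right),
$$

with $\hat{\tau}$ the instantaneous shear stress, $\mu_{\mathrm{A}}$ the affine (Born) contribution, $V$ the system volume, and $\beta = 1/k_{\mathrm{B}}T$; the fluctuation term subtracts the relaxation of the affine response.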
The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.
Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo
2018-05-17
Gene expression profiles are characterized by high dimensionality, small sample size, and continuous-valued features, which makes the classification of tumor samples from gene expression data a great challenge. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. First, multiple filters are applied to the microarray data to obtain several pre-selected feature subsets with different classification abilities. The top N genes with the highest rank in each subset are merged to form a new data set. Second, a cross-entropy algorithm is used to remove redundant features from this set. Finally, a wrapper method based on forward feature selection is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and achieves higher classification accuracy with fewer characteristic genes.
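A minimal sketch of the multi-filter pooling and wrapper stages; the choice of filters, the value of N, and the kNN wrapper classifier are illustrative assumptions, not CEMFE's exact components:

```python
# Rank genes with several filter criteria, pool the top-N of each,
# then run forward (wrapper) selection on the pooled candidates.
import numpy as np
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       f_classif, mutual_info_classif)
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(60, 2000)                    # placeholder microarray data
y = np.random.randint(0, 2, 60)

top_n, pooled = 50, set()
for score in (f_classif, mutual_info_classif):  # two of the "multiple filters"
    skb = SelectKBest(score_func=score, k=top_n).fit(X, y)
    pooled |= set(np.flatnonzero(skb.get_support()))
pooled = sorted(pooled)

sfs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=10,
                                direction="forward", cv=3)
sfs.fit(X[:, pooled], y)
best_genes = np.array(pooled)[sfs.get_support()]
print("selected genes:", best_genes)
```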
Mass spectrometer with electron source for reducing space charge effects in sample beam
Houk, Robert S.; Praphairaksit, Narong
2003-10-14
A mass spectrometer includes: an ion source which generates a beam including positive ions; a sampling interface which extracts a portion of the beam from the ion source to form a sample beam that travels along a path and has an excess of positive ions over at least part of the path, thereby causing space charge effects in the sample beam; an electron source which adds electrons to the sample beam to reduce the space charge repulsion between the positive ions, thereby producing a sample beam having reduced space charge effects; and a mass analyzer which analyzes the sample beam having reduced space charge effects.
Ideas for a pattern-oriented approach towards a VERA analysis ensemble
NASA Astrophysics Data System (ADS)
Gorgas, T.; Dorninger, M.
2010-09-01
For many applications in meteorology, and especially for verification purposes, it is important to have information about the uncertainties of observation and analysis data. A high quality of these "reference data" is an absolute necessity, as the uncertainties are reflected in the verification measures. The VERA (Vienna Enhanced Resolution Analysis) scheme includes a sophisticated quality control tool which corrects observational data and provides an estimate of the observation uncertainty; it is crucial for meteorologically and physically reliable analysis fields. VERA is based on a variational principle and does not need any first-guess fields. It is therefore independent of NWP models and can also be used as an unbiased reference for real-time model verification. For downscaling purposes, VERA uses a priori knowledge of small-scale physical processes over complex terrain, the so-called "fingerprint technique", which transfers information from data-rich to data-sparse regions. The enhanced joint D-PHASE and COPS data set forms the data base for the analysis ensemble study. For the WWRP projects D-PHASE and COPS, a joint activity was started to collect GTS and non-GTS data from the national and regional meteorological services in Central Europe for 2007; data from more than 11,000 stations are available for high-resolution analyses. The use of random numbers as perturbations for ensemble experiments is a common approach in meteorology. In most implementations, such as NWP-model ensemble systems, the focus lies on error growth and propagation on the spatial and temporal scales. When defining errors in analysis fields, we have to consider that analyses are not time dependent and that no perturbation method aimed at temporal evolution is possible. Further, the method applied should respect the two major sources of analysis error: observation errors AND analysis or interpolation errors. With the concept of an analysis ensemble we hope to get a more detailed view of both sources of analysis error. For the computation of the VERA ensemble members, a sample of Gaussian random perturbations is produced for each station and parameter. The spread of the perturbations is based on the correction proposals of the VERA QC scheme, which provides "natural" limits for the ensemble. In order to put more emphasis on the weather situation, we aim to integrate the main synoptic field structures as weighting factors for the perturbations. Two well-established approaches are used to define these main field structures: principal component analysis and a 2D discrete wavelet transform. Test results concerning the implementation of this pattern-supported analysis ensemble system, together with a comparison of the two approaches, are given in the presentation.
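A hedged sketch of the station-perturbation idea: Gaussian noise scaled by the QC correction magnitude, then weighted by a leading spatial pattern (here the first principal component as a stand-in for the synoptic structure). All data and scalings are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n_stations, n_members = 300, 50
obs = rng.normal(10.0, 3.0, n_stations)          # observed parameter values
qc_corr = rng.normal(0.0, 0.5, n_stations)       # QC correction proposals

# Leading pattern from a PCA of historical analysis fields (placeholder)
fields = rng.normal(size=(100, n_stations))
_, _, Vt = np.linalg.svd(fields - fields.mean(axis=0), full_matrices=False)
weight = np.abs(Vt[0]) / np.abs(Vt[0]).max()     # emphasize synoptic structure

members = obs + weight * np.abs(qc_corr) * rng.normal(
    size=(n_members, n_stations))                # perturbed ensemble members
print(members.shape)                             # (50, 300)
```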
NASA Astrophysics Data System (ADS)
Zuehlsdorff, T. J.; Isborn, C. M.
2018-01-01
The correct treatment of vibronic effects is vital for the modeling of absorption spectra of many solvated dyes. Vibronic spectra for small dyes in solution can be easily computed within the Franck-Condon approximation using an implicit solvent model. However, implicit solvent models neglect specific solute-solvent interactions on the electronic excited state. On the other hand, a straightforward way to account for solute-solvent interactions and temperature-dependent broadening is by computing vertical excitation energies obtained from an ensemble of solute-solvent conformations. Ensemble approaches usually do not account for vibronic transitions and thus often produce spectral shapes in poor agreement with experiment. We address these shortcomings by combining zero-temperature vibronic fine structure with vertical excitations computed for a room-temperature ensemble of solute-solvent configurations. In this combined approach, all temperature-dependent broadening is treated classically through the sampling of configurations and quantum mechanical vibronic contributions are included as a zero-temperature correction to each vertical transition. In our calculation of the vertical excitations, significant regions of the solvent environment are treated fully quantum mechanically to account for solute-solvent polarization and charge-transfer. For the Franck-Condon calculations, a small amount of frozen explicit solvent is considered in order to capture solvent effects on the vibronic shape function. We test the proposed method by comparing calculated and experimental absorption spectra of Nile red and the green fluorescent protein chromophore in polar and non-polar solvents. For systems with strong solute-solvent interactions, the combined approach yields significant improvements over the ensemble approach. For systems with weak to moderate solute-solvent interactions, both the high-energy vibronic tail and the width of the spectra are in excellent agreement with experiments.
Deep multi-spectral ensemble learning for electronic cleansing in dual-energy CT colonography
NASA Astrophysics Data System (ADS)
Tachibana, Rie; Näppi, Janne J.; Hironaka, Toru; Kim, Se Hyung; Yoshida, Hiroyuki
2017-03-01
We developed a novel electronic cleansing (EC) method for dual-energy CT colonography (DE-CTC) based on an ensemble deep convolution neural network (DCNN) and multi-spectral multi-slice image patches. In the method, an ensemble DCNN is used to classify each voxel of a DE-CTC image volume into five classes: luminal air, soft tissue, tagged fecal materials, and partial-volume boundaries between air and tagging and those between soft tissue and tagging. Each DCNN acts as a voxel classifier, where an input image patch centered at the voxel is generated as input to the DCNNs. An image patch has three channels that are mapped from a region-of-interest containing the image plane of the voxel and the two adjacent image planes. Six different types of spectral input image datasets were derived using two dual-energy CT images, two virtual monochromatic images, and two material images. An ensemble DCNN was constructed by use of a meta-classifier that combines the output of multiple DCNNs, each of which was trained with a different type of multi-spectral image patches. The electronically cleansed CTC images were calculated by removal of regions classified as other than soft tissue, followed by a colon surface reconstruction. For pilot evaluation, 359 volumes of interest (VOIs) representing sources of subtraction artifacts observed in current EC schemes were sampled from 30 clinical CTC cases. Preliminary results showed that the ensemble DCNN can yield high accuracy in labeling of the VOIs, indicating that deep learning of multi-spectral EC with multi-slice imaging could accurately remove residual fecal materials from CTC images without generating major EC artifacts.
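A hedged sketch of the meta-classifier ("stacking") stage described above: each base network, trained on a different spectral image type, emits per-voxel class probabilities, and a simple learner combines them. The base DCNN outputs are mocked with random arrays, and the logistic-regression combiner is our stand-in, not necessarily the paper's meta-classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

n_voxels, n_classes, n_base = 10000, 5, 6
rng = np.random.default_rng(3)

# Mocked (n_base, n_voxels, n_classes) probability maps from the six DCNNs
base_probs = rng.dirichlet(np.ones(n_classes), size=(n_base, n_voxels))
labels = rng.integers(0, n_classes, n_voxels)      # placeholder ground truth

stacked = base_probs.transpose(1, 0, 2).reshape(n_voxels, n_base * n_classes)
meta = LogisticRegression(max_iter=1000)
meta.fit(stacked, labels)
voxel_class = meta.predict(stacked)                # final five-class labeling
```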
Online breakage detection of multitooth tools using classifier ensembles for imbalanced data
NASA Astrophysics Data System (ADS)
Bustillo, Andrés; Rodríguez, Juan J.
2014-12-01
Cutting tool breakage detection is an important task, due to its economic impact on mass production lines in the automobile industry. This task presents a central limitation: real data-sets are extremely imbalanced, because breakage occurs in very few cases compared with normal operation of the cutting process. In this paper, we present an analysis of different data-mining techniques applied to the detection of insert breakage in multitooth tools. The analysis uses only one experimental variable: the electrical power consumption of the tool drive. This restriction profiles real industrial conditions more accurately than other physical variables, such as acoustic or vibration signals, which are not so easily measured. Many efforts have been made to design a method able to identify breakages with a high degree of reliability within a short period of time. The solution is based on classifier ensembles for imbalanced data-sets. Classifier ensembles are combinations of classifiers, which in many situations are more accurate than individual classifiers. Six different base classifiers are tested: Decision Trees, Rules, Naïve Bayes, Nearest Neighbour, Multilayer Perceptrons and Logistic Regression. Three different balancing strategies are tested with each of the classifier ensembles and compared with performance on the original data-set: Synthetic Minority Over-Sampling Technique (SMOTE), undersampling, and a combination of SMOTE and undersampling. To identify the most suitable data-mining solution, Receiver Operating Characteristic (ROC) and recall-precision graphs are generated and discussed. Logistic regression ensembles on the data-set balanced with the combination of SMOTE and undersampling turned out to be the most suitable technique. Finally, a comparison using industrial performance measures is presented, which concludes that this technique is also better suited to this industrial problem than the other techniques reported in the literature.
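A minimal sketch of the best-performing combination named above, SMOTE plus undersampling feeding a bagged logistic-regression ensemble; the sampling ratios, ensemble size, and placeholder data are assumptions, not the paper's tuned configuration:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder power-consumption features; breakage (1) is the rare class
X = np.random.rand(2000, 8)
y = np.r_[np.zeros(1960, dtype=int), np.ones(40, dtype=int)]

clf = Pipeline([
    ("smote", SMOTE(sampling_strategy=0.5)),               # oversample minority
    ("under", RandomUnderSampler(sampling_strategy=1.0)),  # trim majority
    ("ensemble", BaggingClassifier(LogisticRegression(max_iter=1000),
                                   n_estimators=25)),
])
clf.fit(X, y)
print(clf.predict(X[:5]))
```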
Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation
NASA Astrophysics Data System (ADS)
Vašát, Radim; Kodešová, Radka; Borůvka, Luboš
2017-07-01
A myriad of signal pre-processing strategies and multivariate calibration techniques have been explored over the last few decades in attempts to improve the spectroscopic prediction of soil organic carbon (SOC), so devising a novel, more powerful, and more accurate predictive approach is a challenging task. One possibility is to combine several individual predictions into a single final one, following ensemble learning theory. Because this approach performs best when it combines inherently different predictive algorithms calibrated with structurally different predictor variables, we tested two kinds of predictors: 1) reflectance values (or transforms thereof) at each wavelength, and 2) absorption feature parameters. Accordingly, we applied four different calibration techniques, two per type of predictor: a) partial least squares regression and support vector machines for type 1, and b) multiple linear regression and random forest for type 2. The weights assigned to the individual predictions within the ensemble model (constructed as a weighted average) were determined by an automated procedure that selected the best solution among all possibilities. The approach was tested on soil samples taken from the surface horizon at four sites differing in the prevailing soil units. Employing the ensemble predictive model improved the prediction accuracy of SOC at all four sites. The coefficient of determination in cross-validation (R²cv) increased from 0.849, 0.611, 0.811 and 0.644 (the best individual predictions) to 0.864, 0.650, 0.824 and 0.698 for Sites 1, 2, 3 and 4, respectively. In general, the ensemble model reduced the largest deviations of predicted versus observed values relative to the individual predictions, so the correlation cloud became thinner, as desired.
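A hedged sketch of the weighted-average ensemble: search weight combinations on a simplex grid and keep the one minimizing cross-validated RMSE. The four base predictions are mocked, and the grid search is our stand-in for the paper's unspecified automated procedure:

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(2.0, 0.5, 120)                       # observed SOC (placeholder)
preds = y + rng.normal(0, 0.25, (4, 120))           # four base-model CV predictions

best = (None, np.inf)
steps = np.linspace(0, 1, 21)                       # weight grid, 0.05 resolution
for w in itertools.product(steps, repeat=4):
    if abs(sum(w) - 1.0) > 1e-9:
        continue                                    # keep only convex combinations
    rmse = np.sqrt(np.mean((y - np.dot(w, preds)) ** 2))
    if rmse < best[1]:
        best = (w, rmse)
print("best weights:", best[0], "RMSE:", round(best[1], 4))
```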
Texture developed during deformation of Transformation Induced Plasticity (TRIP) steels
NASA Astrophysics Data System (ADS)
Bhargava, M.; Shanta, C.; Asim, T.; Sushil, M.
2015-04-01
The automotive industry is currently focusing on advanced high strength steels (AHSS) for closure applications, owing to their high strength and formability. Among the AHSS, Transformation Induced Plasticity (TRIP) steel is a promising material for this application. The present work focuses on microstructure development during deformation of TRIP steel sheets. To mimic the complex strain-path conditions encountered when forming an automotive body, Limit Dome Height (LDH) tests were conducted, and samples were deformed in a servo-hydraulic press to obtain different strain paths. FEM simulations were performed to predict the strain-path diagrams and were compared with the experimental results; a significant difference was found, because the existing material models are not applicable to TRIP steels. Micro-texture studies were performed on the samples using EBSD and XRD techniques. It was observed that austenite transformed to martensite and that the texture developed during deformation had a strong impact on the limit strain and strain path.
Unsupervised Ensemble Anomaly Detection Using Time-Periodic Packet Sampling
NASA Astrophysics Data System (ADS)
Uchida, Masato; Nawata, Shuichi; Gu, Yu; Tsuru, Masato; Oie, Yuji
We propose an anomaly detection method for finding patterns in network traffic that do not conform to legitimate (i.e., normal) behavior. The proposed method trains a baseline model describing the normal behavior of network traffic without using manually labeled traffic data; the trained baseline model is then used as the basis for comparison with the audit network traffic. This anomaly detection works in an unsupervised manner through the use of time-periodic packet sampling, which is used in a manner that differs from its intended purpose: the lossy nature of packet sampling is exploited to extract normal packets from the unlabeled original traffic data. Evaluation using actual traffic traces showed that the proposed method detects anomalies in TCP SYN packets with false positive and false negative rates comparable to those of a conventional method that uses manually labeled traffic data to train the baseline model. Performance variation due to the probabilistic nature of sampled traffic data is mitigated by ensemble anomaly detection, which collectively exploits multiple baseline models in parallel. Alarm sensitivity can be adjusted for the intended use through maximum- and minimum-based anomaly detection, which effectively exploits the performance variation among the multiple baseline models. Testing with actual traffic traces showed that the proposed anomaly detection method performs as well as one using manually labeled traffic data and better than one using randomly sampled (unlabeled) traffic data.
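A hedged sketch of the ensemble decision stage: several baseline models, each trained on an independently sampled view of the traffic, score new traffic, and max/min aggregation trades sensitivity against false alarms. The per-feature Gaussian baselines and the threshold of 3 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(11)
baselines = []
for _ in range(8):                                   # 8 sampled training views
    train = rng.normal(100.0, 10.0, (500, 4))        # placeholder SYN-rate features
    baselines.append((train.mean(axis=0), train.std(axis=0)))

def scores(x):
    """Anomaly score of observation x under each baseline model."""
    return np.array([np.max(np.abs((x - mu) / sd)) for mu, sd in baselines])

x_new = np.array([103.0, 98.0, 160.0, 101.0])        # one audit observation
s = scores(x_new)
print("max-based (sensitive) alarm:", s.max() > 3.0)
print("min-based (conservative) alarm:", s.min() > 3.0)
```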
Lu, Qing; Kim, Jaegil; Straub, John E
2013-03-14
The generalized Replica Exchange Method (gREM) is extended into the isobaric-isothermal ensemble, and applied to simulate a vapor-liquid phase transition in Lennard-Jones fluids. Merging an optimally designed generalized ensemble sampling with replica exchange, gREM is particularly well suited for the effective simulation of first-order phase transitions characterized by "backbending" in the statistical temperature. While the metastable and unstable states in the vicinity of the first-order phase transition are masked by the enthalpy gap in temperature replica exchange method simulations, they are transformed into stable states through the parameterized effective sampling weights in gREM simulations, and join vapor and liquid phases with a succession of unimodal enthalpy distributions. The enhanced sampling across metastable and unstable states is achieved without the need to identify a "good" order parameter for biased sampling. We performed gREM simulations at various pressures below and near the critical pressure to examine the change in behavior of the vapor-liquid phase transition at different pressures. We observed a crossover from the first-order phase transition at low pressure, characterized by the backbending in the statistical temperature and the "kink" in the Gibbs free energy, to a continuous second-order phase transition near the critical pressure. The controlling mechanisms of nucleation and continuous phase transition are evident and the coexistence properties and phase diagram are found in agreement with literature results.
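For context, the sampling-weight construction that gREM-type methods rely on can be sketched as follows (our hedged paraphrase of the generalized-ensemble literature; the paper's exact parameterization may differ):

$$
W_{\alpha}(H) \;\propto\; \exp\!\left[-\int^{H}\frac{dH'}{T_{\alpha}(H')}\right],
\qquad
T_{\alpha}(H) = T_{0} + \gamma\,(H - H_{0}),
$$

i.e. each replica samples with an effective temperature that is a prescribed (here linear) function of the enthalpy $H$, chosen so that the resulting enthalpy distribution stays unimodal even where the statistical temperature of the system backbends.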
Neutron capture studies with a short flight path
NASA Astrophysics Data System (ADS)
Walter, Stephan; Heil, Michael; Käppeler, Franz; Plag, Ralf; Reifarth, René
The time-of-flight (TOF) method is an important tool for the experimental determination of neutron capture cross sections, which are needed for s-process nucleosynthesis in general and for analyses of branchings in the s-process reaction path in particular. So far, sample masses of at least several milligrams have been required to compensate for limitations in the currently available neutron fluxes. This constraint leads to unacceptable backgrounds for most of the relevant unstable branch-point nuclei, due to the decay activity of the sample. A possible solution has been proposed by the NCAP project at the University of Frankfurt. A first step in this direction is reported here, which aims at enhancing the sensitivity of the Karlsruhe TOF array by reducing the neutron flight path to only a few centimeters. Although sample masses in the microgram regime become usable with this approach, the increase in neutron flux is paid for by a higher background from the prompt flash related to neutron production. Test measurements with Au samples are reported.
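For reference, the nonrelativistic kinematics behind the TOF method:

$$
E_{n} = \tfrac{1}{2} m_{n} v^{2} = \tfrac{1}{2} m_{n} \left(\frac{L}{t}\right)^{2},
$$

so the flight time over a path $L$ at fixed neutron energy scales linearly with $L$, while for a compact source the flux at the sample grows roughly as $1/L^{2}$; shortening the flight path therefore buys sensitivity at the cost of TOF energy resolution.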
Method and apparatus for probing relative volume fractions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jandrasits, W.G.; Kikta, T.J.
1996-12-31
A relative volume fraction probe, particularly for use in a multiphase fluid system, includes two parallel conductive paths defining between them a sample zone within the system. A generating unit generates time-varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time-varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts this area from the area produced when the sample zone consists entirely of material of a first fluid phase, and divides the difference by the difference between the area produced when the sample zone consists entirely of material of the first fluid phase and the area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction.
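In symbols, with $A$ the measured area under the impedance-distance curve and $A_{1}$, $A_{2}$ the calibration areas recorded when the sample zone contains only phase 1 or only phase 2 (our paraphrase of the procedure above):

$$
\phi_{2} = \frac{A_{1} - A}{A_{1} - A_{2}},
$$

which equals 0 when the zone is entirely phase 1 ($A = A_{1}$) and 1 when it is entirely phase 2 ($A = A_{2}$).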
CFO compensation method using optical feedback path for coherent optical OFDM system
NASA Astrophysics Data System (ADS)
Moon, Sang-Rok; Hwang, In-Ki; Kang, Hun-Sik; Chang, Sun Hyok; Lee, Seung-Woo; Lee, Joon Ki
2017-07-01
We investigate the feasibility of a carrier frequency offset (CFO) compensation method using an optical feedback path for coherent optical orthogonal frequency division multiplexing (CO-OFDM) systems. Recently proposed CFO compensation algorithms provide a wide CFO estimation range in the electrical domain, but their practical compensation range is limited by the sampling rate of the analog-to-digital converter (ADC). This limitation has drawn little attention because, in wireless OFDM systems, the ADC sampling rate was high enough compared with the data bandwidth and the CFO. In CO-OFDM the limitation becomes visible because of the increased data bandwidth, laser instability (i.e., large CFO), and ADC sampling rates kept low for reasons of cost. To solve this problem and extend the practical CFO compensation range, we propose a CFO compensation method with an optical feedback path. By adding a simple wavelength control for the local oscillator, the practical CFO compensation range can be extended to the full sampling frequency range. The feasibility of the proposed method is investigated experimentally.
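As background for the range limitation (a standard sampling argument, not taken from the paper): after the ADC samples at rate $f_{s}$, a residual carrier frequency offset is observable only modulo $f_{s}$, so a purely digital estimator can unambiguously resolve offsets of at most

$$
\lvert \Delta f \rvert \;\le\; \frac{f_{s}}{2},
$$

whereas steering the local-oscillator wavelength through the optical feedback path removes the bulk of the offset before sampling, which is why the usable range can extend to the full sampling frequency range.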